Yet Another Python Multimethod Decorator, Part 7: What You’re Trying To Do Is Wrong

One of the recurring difficulties with multimethods is the concept of method resolution order. If you have multiple methods with the same name but different parameter types in scope, and you call the method with types such that more than one method which can accept them, how do you decide which one to dispatch to?

Our dispatcher implementation doesn’t even try to determine which the most appropriate method to dispatch to is – it tries each case in declaration order and just picks the first match. Partly because we hadn’t got around to addressing that issue yet, but mostly because it’s not a straightforward thing to do. It’s one of those problems where it seems you’re being perfectly reasonable until you hit a brick wall just as you think you’re almost done, only to realise you were never being reasonable at all.

Resolution order with single dispatch

Let’s take the example where we have types A and B such that B is a subtype of A, and we have the following code snippet:

function foo(A arg) { ??? }  // function 1
function foo(B arg) { ??? }  // function 2

foo(new B());

It seems reasonable to expect that we would call function 2, as it’s more specific than function 1 – this is the sort of case it exists for. There are all sorts of reasons why we want this to be the case.

It might be that Bs foo differently to As, so we need this to fix its behaviour. Generally here we’ve broken something in our model, though: we have a function which says it can foo As, and a B is an A, and yet we can’t foo it like we can the other As – so is B really a subtype of A, is it really substitutable here?

It might be that B can be foo-d in the same way an A can, but there are valid reasons to foo it differently. Say A is List, and B is ArrayList, and foo is asArray(). Generally we can turn a list into an array by allocating an array the length of the list and copying the list into it element-by-element. For an ArrayList, though, we can just return a reference to its underlying array (or if we’re worried about aliasing, use a block memory copy). Both implementations are valid, but one is more efficient.

This is similar to the situation we get in method dispatch (often referred to in OOP languages as single dispatch). The way this generally works is by having a bunch of implementations of a given virtual method in a table somewhere, and looking up the implementation based on the runtime class of the object the method’s being invoked on. Arguably, methods on objects are just a special case of functions, taking the object itself as another parameter (often referred to as the implicit parameter).

Method resolution order for method dispatch is simple: if it’s defined in the class, use that, otherwise use the implementation from the parent, and so on using successively more general types until you find an implementation. It’s essential here we take the most specific method, otherwise overriding methods doesn’t work.

Resolution order with multiple dispatch

The problem comes when we step up to multiple dispatch: let’s take the same types A and B as above, and then we have the following code snippet:

function foo(A arg, B arg) { ??? } // function 1
function foo(B arg, A arg) { ??? } // function 2

foo(new B(), new B());

What do we expect to be called here? Both appear to be valid, and both appear to be equally specific. What can we do here?

We could pick arbitrarily. Say, based on the order of declaration. This seems bad: what if we import these from other modules, do we want re-ordering the imports to change our behaviour? But we could do that.

We could make one more specific than the other by saying the first parameter is more important than the others, and should match that as specifically as possible before trying to match the later parameters. This has the advantage of always finding a match, in an unambiguous manner. It does hoever seem a little arbitrary – why should the first parameter be more important? It’s also got some undesirable side-effects – refactoring the order of parameters in our various multimethod implementations changes which one is called.

We could disallow situations like this, by requiring an unambiguous implementation. In other words, if for a given set of parameters, there are multiple valid matches, and across those matches there isn’t one which has the most specific match for all the parameters, then it’s an ambiguous call, and it’s disallowed. The problem with this is: how do we resolve it?

The above situation actually happens in Java’s static dispatch system: take the following code:

public class Dispatch {
	static class A { }
	static class B extends A { }

	public static void foo(A a, B b) { System.out.println("An A and a B"); }
	public static void foo(B b, A a) { System.out.println("A B and an A"); }
	public static void main(String... args) { foo(new B(), new B()); }
}

This won’t compile – it’ll complain about an ambiguous method call. There is a resolution:

public class Dispatch {
	static class A { }
	static class B extends A { }

	public static void foo(A a, B b) { System.out.println("An A and a B"); }
	public static void foo(B b, A a) { System.out.println("A B and an A"); }
	public static void main(String... args) 
	{ 
		foo((A)new B(), new B());
		foo(new B(), (A)new B()); 
	}
}

This only works, though, because Java’s resolving the call statically, so we can change the type it thinks a B is with a cast. We can’t do that with dynamic dispatch – we can’t just make a B not a B to direct it towards the right invocation. There’s nothing else we can do in the general case, either, without jumping to a level beyond the interface we’ve got here. If foo‘s a multimethod we imported from a module, then we’re stuck: there is no way of resolving this.

Well, that’s not true: we could add syntax to provide dispatch hints, something like this:

function foo(A arg, B arg) { ??? } // function 1
function foo(B arg, A arg) { ??? } // function 2

foo<A, B>(new B(), new B());   // calls function 1
foo<B, A>(new B(), new B());   // calls function 2

That would work, and it’s at least not obscure which implementation is being called: it does, nonetheless, seem pretty horrible – we’re basically turning dynamic dispatch off because it doesn’t work, and it requires we know exactly what cases are implemented so we can specify the one we want to call.

So, that’s three decidedly unsatisfying approaches. At this point I’m out of ideas, although of course there may be some other approach which isn’t listed above, but it looks like dispatch resolution for multiple dispatch in the presence of subtyping just doesn’t always work.

It’s worth pointing out that without subtyping, none of these issues exist. Just an aside.

It’s worse than that, though, because this here:

The problem comes when we step up to multiple dispatch

was a lie.

Resolution order with single dispatch, revisited

Now let’s look at the case of the type C, which is a subtype of both A and B.

function foo(A arg) { ??? }    // function 1
function foo(B arg) { ??? }    // function 2

foo(new C());

What happens here? We’ve basically got exactly the same problem as multiple dispatch, only with a single argument. This exact case is easily replicatable in Java:

public class Dispatch {
    interface A { }
    interface B { }
    static class C implements A, B { }

    public static void foo(A a) { System.out.println("An A"); }
    public static void foo(B b) { System.out.println("A B"); }
    public static void main(String... args) { foo(new C()); }
}

This can be fixed the same way as before: casting to hint type, and the same objections about why it doesn’t work for dynamic dispatch as before also apply.

The long and short of it appears to be: subtyping and resolving ambiguities in dispatching on types based on specificity are incompatible. Not in every case – not even in the majority of cases, but often enough to show we’re on shaky ground. Various limitations (for example: single dispatch is fine as long as subtyping is strictly hierarchical with single parentage) and workarounds (such as hinting towards the case we want) apply, but we’re clearly patching a leaky model now.

That’s not good news: it not only undermines multiple dynamic dispatch, it also undermines multiple static dispatch, aka overloading (although we can get around that via weakening type annotations – that said, static dispatch on types has a whole boatload of other problems) and single dynamic dispatch. Those are basically all our mechanisms for ad-hoc polymorphism.

There’s good news, though, on two counts. Firstly, nothing above actually undermines method invocation, because:

Method invocation secretly isn’t actually dispatch.

Let’s start by talking about what dispatch actually is. Dispatch is when you have a number of functions in scope, and you choose one of them based on whatever criteria.

That’s often how method dispatch is implemented. But, really, with a sensible view of how objects behave, that’s just an under-the-hood detail. What an object is – how an object should be conceptually modelled – is a record of closures, closing over a common environment. When you call a method, don’t think of it as taking the method name and looking up the implementation for the object type: think of it as taking a value out of a map.

Sure, you might define a class as inheriting from this class and whatever. How that map is built may be based on all sorts of arcane rules (just check out Ruby’s method resolution rules sometime, just for a laugh), but as far as its users are concerned, though, an object is flat, a projection of its layers of structure into a single aggregation of methods. With Javascript’s prototypical object model, this is exactly how it’s implemented.

We’re not choosing one function from many in scope, based on types. We’re choosing one function from one in scope – only, we’re loading that scope dynamically.

So. We may have undermined resolving dispatch on type combined with subtyping, but that doesn’t mean method calls are problematic. That’s good. But I thought we were talking about multiple dispatch?

Resolution order with multiple dispatch, revisited

Is there anything we can do about sorting out method resolution? Well, yes, there is actually. We don’t let the language attempt to divine the most appropriate implementation to dispatch to: we tell it in what order we want to consider the cases, based on the declaration order of those cases.

This isn’t as powerful an approach as “idealised” multiple dispatch, intended to work just like overloading tends to work, just dynamically. You can’t pick between “multiple implementations in scope” where the origin scope of those implementations are different: it requires we define all the cases in the same location. So we can’t add a case to a multimethod we imported by just declaring it, nor can we combine two multimethods by importing them from different modules. The good news is that both of those are terrible ideas, so really, we didn’t lose – we won.

We gain readability: all the cases are arranged together in the order we expect them to be used, special cases at the top, base cases at the bottom. We retain extensibility: we can always take a multimethod and wrap it with another multimethod which adds the new cases, then defers to the old one. And we protect against logic being changed inadvertently by introducing a new case in a different scope, causing a new kind of spooky, scary action at a distance.

Plus, this model allows us to address cases where the entire concept of “most specific” makes no sense at all, such as the arbitrary predicate on runtime value.

And the best part? By a happy coincidence, this is the approach we were making do with while we were deferring thinking about the problem.

That’s about all I have to say on the subject. If you want to take a look at the finished dispatch library, it’s gratifyingly tiny and available on Github.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s