Sunday, July 17, 2011

When object orientation is the wrong answer

One of the major refactorings I did this week is to completely replace the implementation of the "isSubtype" test.

The "is subtype" test is a binary relation between two types. There's even an operator for it in Tart: "<:" which is the same one used by Scala. So the expression "A <: B" is true if A is either the same as B or is a subtype of B. This particular test gets called in a lot of places - for example, deciding which of two functions is a closer match involves doing subtype tests on all of the arguments.

The way I originally implemented this in the compiler was by following the standard object-oriented approach: class Type, which is the base class of all types, has a pure virtual method "isSubtypeOf(const Type * base) const". Each subclass of Type (PrimitiveType, FunctionType, CompositeType, etc) overrides this and provides an implementation that is specific to that type.

However, the subtype relation isn't quite as cut and dried as all that. For example, what if the type being tested is a type alias that points to some other type? Well, you have to dereference the alias before you can do the subtype test. OK so we do that. But now, what if the base type is also an alias? Well we have to dereference that too. So now every type has to know about aliases - either that, or every place that calls isSubtypeOf has to dereference the types first.

It gets more complicated: What if the base type is a protocol? Any type can be a subtype of a protocol, because a protocol is simply a constraint - it says "this type has a method with such-and-such name and such-and-such signature". So for example, even primitive types have a "toString" method, which means that primitive types are technically considered subtypes of the "HasToString" protocol. OK so now every type has to know about protocols.

What about ambiguous types? These are temporary types, created during the type inference pass, that represent a "cloud" of related types which haven't been fully resolved yet. So for example, if we call "min(a, b)" the return value might be an int, or a float, or even a String depending on which overload of "min" eventually gets chosen. Each ambiguous type has a generator which can iterate through all of the possible types, and that generator will return fewer and fewer results as we tighten up the constraints looking for a singular solution.

The "isSubclassOf" test for ambiguous types requires that all of the "possible" types must each pass the subclass test, if any of them fails then the whole thing does.

There are also "type assignments" such as "%T=int" - meaning that the type variable T is assigned the value "int" within the current environment. Now, what happens if you have a template instance, where one or more of the type parameters is a type assignment or some other funky type? For example: given that ArrayList[int] is a subtype of List[int], is ArrayList[%T=int] also subtype of List[int]? The answer is, it depends on the environment - so now isSubtypeOf() needs to know about environments.

As you can imagine, with all these complexities the code is starting to look pretty nasty. Time for a redo.

In the new scheme, isSubtype(a, b) is now just a regular function - not even a class method. It's a fairly large function, since it has to handle all of the logic I've outlined above. However, it's not as large as you might think. Mainly what it does in complex cases is simplify the case by one step and then call itself recursively with the simplified arguments. Thus, if either of the two arguments are an ambiguous type, it generates the set of possibilities, and then calls itself for each possibility.

The biggest benefit is that all of the logic for the subtype relation is all located in a single file, which is much easier to understand than having it scattered across two dozen source files. Although some types have different logic, there's a lot of commonality too - for example, there's a whole set of types which have no concept of inheritance of subtyping - booleans and machine address types are an example - they both handle the "isSubtype" test the same way - by delegating to "isEqual".

1 comment:

  1. Having all of the logic for the subtype relation in one place is really a good achievement and it will be very handy while maintaining code.

    Javin
    Why multiple inheritance is not supported in Java

    ReplyDelete