Machine Words: Type inference and type constraints

Some notes about the way Tart's type inference solver works.

The type solver takes as its input an expression tree, and returns an updated expression tree. The input tree has ambiguous or unspecified types for the expression nodes in the tree, while the output tree has fully-specified types for every expression node. If the input tree is already fully specified, or the solver fails to find a solution, then the original tree is returned unchanged.

The first phase is the "gather constraints" pass. The solver walks the entire tree looking for "constraint sites", that is, nodes in the tree that implicitly constrain the types of their child or parent nodes. The simplest example is a call to a non-overloaded function: Each sub-tree representing an argument is constrained to be the same type as the corresponding function parameter.

In more complex cases, a function may have multiple overloads, or may be a template where the type of the argument is a template parameter. For cases like this, we create a "type constraint object", which is an subclass of Type. A type constraint represents all of the possible types that could exist at that location.

For example, suppose we call "foo(a, b)", and there are two overloads of foo: "foo(int, int)" and "foo(float, float)". Since we don't know yet whether parameter 'a' is an int or a float, we instead assign to 'a' a type constraint representing the set {int, float}.

However, a type constraint is not just a passive collection of types. The constraint remains connected to the original site - so for example, if we decide later to discard one of the two overloads of 'foo', the constraint will automatically update itself to reflect the new set of types that could exist at that point.

Type constraints have many of the same methods that types do. For example, types have a method called canConvert(), which attempts to convert an expression to that type, and returns an integer indicating the compatibility level for that conversion - whether it is possible, whether it would cause a compiler warning, etc. For type constraints, calling canConvert() simply calls canConvert() to each of the types in the set, and returns the worst compatibility level found. Similarly, type constraints have isSubclass(), includes(), isEqual() and other type comparison methods.

There are several different kinds of type constraints. The three most important are:

"parameter-of": Given a call site, return all of the possible types for the Nth parameter of the function being called.
"result-of": Given a call site, return all of the possible types for the return value of the function being called.
"type-parameter-of": Given a set of types, some of which may be parameterized types, return the set of their type parameters. So for the input set {Address[int], Address[float]}, it returns the set {int, float}.

Once the type constraints are in place, the solver starts the next phase, which is unification. This looks for any unbound template parameters in the expression, and binds them to the type of the corresponding sub-node. So if the template function foo(%T) is called with 'int', it binds %T=int. In many cases, the template parameter will be bound to a type constraint.

After unification comes pruning. This is essentially an optimization problem: For each call site, we want to reduce the number of choices available, but only in a way that increases the total compatibility ranking for the entire expression.

There are three pruning stages. Each stage makes use of a backtracking "cull" function. The "cull" function eliminates from a call site one or more choices (i.e. overloaded methods) for that site, but does it in a way that can be backtracked. (It basically just saves an integer backtracking level with each choice.)

Cull by ranking: We cull all methods that have a conversion compatibility level less than some threshold. So for example, we remove from consideration all choices that would produce a compiler warning.
Cull by specificity: For any given site, we compare the choices, and if one choice is strictly more general than another, we eliminate the more general one. So given foo(String) and foo(Object), both of which have the same compatibility ranking, we eliminate foo(Object) from consideration.
Cull by elimination. If neither of the two culling phases produces a solution, we do a brute force search over all possible permutations.

In each culling phase, we keep track of whether the particular combination of cullings is overconstrained or underconstrained. If it's overconstrained (meaning that there are one or more sites with no choices left), then we backtrack. If the solution is underconstrained, meaning that there are some sites with more than one choice remaining, then we proceed to the next phase and tighten the constraints even further.

The solver finishes when there is only one choice remaning at each constraint site.

Machine Words

Tuesday, January 25, 2011

Type inference and type constraints

No comments:

Post a Comment