Monday, December 6, 2010

Should parameter names be part of the function type?

Here's an issue which I have been ignoring for a long time, but which is starting to become a pain point.

Tart supports keyword arguments in a manner similar to Python: When calling a function, any parameter can be referred to by name or by position. Parameters can also have default values, so you can have a function with a large number of parameters having default values, and use parameter names to pick out just the ones you want to have non-default values. (Whether this is a good API design choice is an open question.)

However, under the hood Tart does this very differently from the way Python does it. In Python, the matching of argument keywords to parameter positions is done at the time the call is dispatched. Since creating a dictionary is cheap in Python (in a relative sense, at least), the named parameters are bundled into a dictionary and the contents of this dictionary is matched against the parameter names of the called function.

In Tart, because it's a statically typed language, the matching of keyword arguments to parameter positions is done at compile time. (One of Tart's design principles is "if you can't figure out how to make it fast, it doesn't go in - meaning that no language feature is allowed if ordinary use of that feature would create a performance bottleneck.) Tart doesn't create a dictionary of keyword arguments - instead, the compiler internally creates a map of keyword arguments to parameter positions, and uses this information to re-arrange the order of arguments of the call. Any arguments not explicitly mentioned are filled in with their default values. In other words, all of the keyword stuff is just syntactic sugar on top of a C-style calling convention. Note that this also means that the Python feature of unpacking a dictionary via **args isn't supported.

In order for this to work, the names of the function parameters (and any other trait that would change how a function gets called) need to be part of the function's type. That means that f(a:int) and f(b:int) are actually different types.

Where this gets weird is when you start having things like pointers to functions. Imagine the following:

def f(a:int, b:int) { ... }

let f2 = f;
let f3:fn (b:int, a:int) = f;

Here we've assigned the original function 'f' to two function variables, one which has the same type as 'f', and one which defines a different function signature with the order of parameters reversed. Note that even though f3 is a different type than f, the conversion is allowed - the compiler doesn't consider the types to be incompatible just because the parameters have different names. Essentially, the names of the parameter are what I call a "type hint" - meaning that they make the type distinct, but it's a distinction without a difference as far as the type conversion checker is concerned. (The reason for this is simple: If we didn't allow function types with different parameter names to be interconvertible, then every function pointer would have to have the exact same spelling of the parameter names as the function that was assigned to it, which would make life hell for programmers.)

Thus, in the above example, if we call f3(a=1, b=2) it's the same as if we called f(a=2, b=1). Trippy, dude!

Where this starts to get painful is when we try to represent this information in the reflection metadata. As you know, one of my challenges has been to avoid a combinatorial explosion of metadata that gets generated along with a class or function. One way to do this is by sharing definitions whenever possible, so that two functions with the same type can point to the same type definition. Thus, in an ideal world, all functions with the type signature f(:int) could point to the same definition. However, if we include the parameter names along with the type, then the amount of sharing diminishes greatly, since it's unlikely that two functions will have both the same parameter types and parameter names.

Now, all this being said there's an alternative strategy I could have chosen, which is not to make the parameter names part of the type. In this strategy, when a function is called, the parameter assignment logic looks beneath the type to the actual function declaration being called. In the case of function pointers, where there is no underlying function declaration, keyword arguments simply wouldn't be allowed. I suspect that this would cause even more confusion than the case outlined above. However, it would have greatly simplified the logic of function types.

Actually, this is not strictly speaking true: every function type ultimately derives from a declaration somewhere - even a function value which is the result of some complex expression still has a type declaration which ultimately leads back to some variable or function declaration. The reason this is true is because even though functions can be created dynamically, function types can't be. So it might be possible for the compiler to examine an expression of function type, and by diving deep enough in the expression it could find the original function declaration. I'm not sure this strategy is any different than having the information be encoded in the type, nor am I certain I could make the compiler smart enough to figure this out in all cases.