Machine Words: June 2009

Tuesday, June 9, 2009

Tart: Integer constants and unification

Lots of work last night on integer constants. In Tart, an integer constant is "unsized" until you actually attempt to assign it to a variable or pass it to a function. Thus, a statement like the following produces no warning:

var x:byte = 10000001 - 10000000;

In the above example, the subtraction operation occurs before the type case, so what you are really doing is assigning the value '1' to the variable. If you really wanted to produce a warning, you could have said:

var x:byte = int(10000001) - 10000000;

Now, where this gets interesting is when we attempt to pass an integer constant to a generic function, such as max, which has the type signature max<[%T]>(a:T, b:T) -> T.

var x = max(1, 100); // Both arguments are unsized
var x = max(byte(1), 100); // Only one argument is unsized
var x = max(ubyte(1), 100); // Only one argument is unsized - and it is unsigned.

So what does type 'T' get bound to? In the first case, we try to find the smallest integer type that will hold both values, which in this case is a byte. The second case works the same way, except that one of the types is explicit - again, we combine the two types to find which one is inclusive of the other. If there's no specificity order between the two, then a warning is emitted.

For the third case, one of the types is unsigned - so when we calculate the smallest integer type that can contain the second argument, we also pass in a flag telling it to use unsigned integer types. If the integer constant is negative, then a warning will result.

From a compilation perspective, this is all a bit tricky because of the fact that the type of an integer constant is dependent on the value, which means that you can't evaluate a type expression purely on the basis of type. Fortunately, the damage is contained by the fact that most operations on unsized ints cause them to become sized, so the majority of type calculations are not affected by this.

One area that bears closer scrutiny is the assumption that the combination of two types yields the type that is more general. This is in fact not the case when combining a parameter type with a return type. Consider the following example:

var x:uint = max(1, 2); // OK
var x:uint = max(1, 200000000); // Truncation warning

While the parameter types try to widen the definition of T, the return type tries to narrows it. (This is because return types are covariant, and parameters are contravariant.) In the first example, it knows that the expected return type is unsigned, so both of the parameters must also be converted to unsigned types. In the second example, the widening and narrowing results in T being overconstrained, leading to a truncation warning.

Monday, June 1, 2009

Tart: try/catch/finally

I just got "finally" working for simple cases. It doesn't yet handle nested finally clauses.

Cleanup handlers like finally are a challenge because the flow of control can go to different places on exit from the finally block. For example, a try block could exit with a return, a throw, a break, a continue, or just fall off the end, and each of these could potentially go to a different place depending on which one cause the exit from the try block.

One approach is to simply clone the try block at each exit point. I decided not to go this approach because of concerns about code bloat. (Although I may later add a heuristic that decides whether inlining the finally block is more efficient.)

The approach that I took was to simulate "local procedures". A local procedure is like a subroutine within a function, which you can call like a function - it resumes execution at the instruction after the call when it finishes. Now, there's no such instruction in LLVM IR (And I know why - the 'return' instruction would play merry hell with the optimizer, since it's effectively a branch to a variable address). So even though most processors have a "jump to subroutine" instruction, there's no way to represent that in LLVM language.

Instead, what I did is to write a pass that takes a list of basic blocks containing local procedure calls, and convert it into a state machine. Each local procedure call is converted into a regular branch, preceded by an instruction that sets a state variable. The end of each local procedure is converted into a conditional branch or switch which jumps back to the call site, depending on the setting of the state variable. The transform is smart enough to remove branches-to-branches, and to merge states which end up branching to the same place. (This is important for catch blocks, since normally all catch blocks end up at the same place after calling finally.)

During CFG analysis, finally blocks are stored on a stack of "cleanup" handlers for the current block. Note that this same stack will be used for the "with" statement, or any other future statement that defines a sequence of instructions that needs to be executed before exiting the block.

Which leads me to the next question, which is "how should 'with' statements work in Tart?" C# has the "using" statement, however it is much simpler than Python's "with". The using statement does one and only one thing, which is to call "dispose" on the object when the block exits (the object must implement the IDisposable interface.)

Python's "with" statement, on the other hand, does a lot more:

it has the enter and exit methods.
the expression passed to the with statement is a factory for the context object, not the context object itself.
if an exception occurs, it can pass it to the exit function.
the exit function can cancel the exception.

I don't know which of these features are essential. I'm particularly leery of the last one, the ability to cancel an exception dynamically - it doesn't fit very well with the model of statically-compiled catch blocks.

Another difference is that a Tart exception has less information in it than a Python exception - this is just a limitation of native exceptions. For example, my ability to produce a stack trace is very limited - I can do it, but only in a specially constructed exception handler that intercepts the exception in the search phase. In a regular function that uses try/catch, by the time the catch block is invoked, it's too late to get a stack trace, the stack is already gone. (The other alternative would be to *always* record a stack trace, but this would make all exceptions very expensive.)

Also, a stack backtrace isn't going to be terribly useful in an executable that has debugging symbols stripped out. This is a non-issue in an interpreted language which always keeps the symbol names around.