Saturday, November 13, 2010

Coalescence

I spent the last several days experimenting with function-merging in the compiler, and the results are encouraging but not spectacular.

Take for example the following function in class ArrayIterator:

def next() -> ElementType or void {
  if (self.index < self.array._size) {
    return self.array._data[self.index++];
  } else {
    return;
  }
}

The type 'ElementType' is a template parameter. The interesting thing about this function is that it really doesn't care what ElementType is - it doesn't call any of it's methods or examine it's variables. Really, the only thing it cares about at all is it's size. For most arrays, that size will be the size of a pointer.

There's no reason, then, why ArrayIterator[String].next() and ArrayIterator[Object].next() can't be the same machine-code function. What I've managed to do is create a FunctionMergePass class that compares two functions and determines whether or not they can be merged. Currently I do not attempt to merge every function automatically, instead the merging behavior is manually triggered by the presence of the @Coalesce attribute on the class.

The pass works by walking the code-flow graph of both functions in parallel, and checks each pair of expression nodes to see whether or not they are equivalent in terms of their effect. If the merge pass finds any discrepancies between the two, the merge operation is canceled, and the function marked as non-mergeable.

Most expression types are trivial to compare, although some are more interesting. In the above code example, there's an implicit cast from 'ElementType' to 'ElementType or void' in the first return statement - in other words, it's constructing a union type and returning it. The merge pass in this case checks to see if the two union types are equivalent, which they are - 'Object or void' produces the exact same binary structure as 'String or void'.

The most common cause of merge failure is due to the function calling another non-mergeable function. Take for example this method in the template ArrayList[ElementType]:

def insert(index:int, e:ElementType) {
  Preconditions.checkIndex(index >= 0);
  Preconditions.checkIndex(index <= _size);
  grow(1);
  ElementType[].copyElements(_data, index + 1, _data, index, _size - index);
  _data[index] = e;
}

Everything in this function is mergeable except for the call to 'grow'. The reason 'grow' isn't mergeable is that it sometimes allocates a new Array instance, and calls to 'new' are generally unmergeable because the exact type of the object being created needs to be known. (There's a loophole around this which I have thought of but have not implemented - if the object being allocated is the same type as 'self', then we can just use the type information from 'self' instead of referring to the type information via a static constant.)

However, there is a way we could make the call to 'grow' mergeable - at a slight performance penalty. The answer is to make the call virtual. Right now it isn't virtual, because the "ArrayList" class is marked "final", which means that the compiler is smart enough to call 'grow' directly instead of dispatching through a virtual function table. (For those of you unfamiliar with Java, marking a class 'final' means that it cannot be subclassed, which allows the compiler to make significant optimizations.)

If the call to 'grow' were made virtual, then even if we merge ArrayList[String].insert() with ArrayList[Object].insert(), the correct version of 'grow' will be called in either case because it depends on the type of 'self'.

The question is whether or not this is worth doing, which comes down the question of what's more important, speed or size? At the moment, enabling the function merging behavior on the SwitchStmtTest unit test results in a 7K reduction in the size of the executable file. 7K out of 350K is not all that much.

No comments:

Post a Comment