More progress on tuples

Tuesday, December 22, 2009
I've made a lot of progress on tuple types in the last several days, including the "unpacked" or "destructuring" assignment:

def returnTupleLarge() -> (int, String, int, String) {
  return 3, "Hello", 2, "World";

var d1, e1, d2, e2 = returnTupleLarge();

Unfortunately the "Python swap idiom" doesn't quite work yet:

a, b = b, a;

For some reason the code generator isn't generating the store to 'b'. Still tracking this one down.

In order to fully support tuple types I had to make some significant changes to the way that the code generator treats l-values. You see, LLVM currently has a limitation on the way that it supports first-class aggregates: Only structs that can fit into two registers can be passed by value - if the struct is larger than that, it has to be passed by reference. Eventually LLVM will support abitrary-sized parameters and return types, but that support is not there yet.

So the first problem is how to determine whether a struct will fit into two registers. Since my front-end is target-independent, I don't know how big a register is. For the moment I am using a somewhat dodgy heuristic for this.

The second issue is that we now have two kinds of structs as far as the codegen is concerned: "small" and "large". Similarly, we also have small and large tuples, since a tuple is very much like a struct. However, there is one important difference, which is that structs must always be addressable. The reason for this is the handling of 'self'. When you define a method on a struct, the 'self' argument has to be a pointer to the struct, not the value of the struct - otherwise, it would be impossible to write setter methods, for example.

Tuples, on the other hand, don't have methods, so they don't have to be l-values since there's never a 'self' pointer to worry about. However, if a tuple contains a struct, then that struct has to be addressable, which means that the tuple has to be treated like a struct.

I created a spreadsheet of all the combinations of Tart types and how they are represented. The columns represented various types - primitive, class, etc. The rows represented various forms of use - parameter, return val, variable, constant, intermediate value, member field, and so on. Using this, I was able to boil down all of the runtime representations into just 5 categories, which for lack of a better term I am calling "Shapes":

Shape_Primitive,          // float, int, pointer, etc.
Shape_Small_RValue,       // Small tuples
Shape_Small_LValue,       // Small structs
Shape_Large_Value,        // Large tuples or structs
Shape_Reference,          // Class or interface

Primitive types are always passed by value in a single register. Small_RValue shapes are also passed by value, but may require more than one register. Neither type is guaranteed to have a memory address, since the value may be contained in registers.

Small_LValue is passed and returned by value, but internally it is stored in memory and is guaranteed to have an address.

Large_Value is passed by reference. When used as a parameter, a local allocation is created and the value copied into it. For function return values, the compiler generates a 'struct return' parameter which holds a pointer to a buffer which the caller provides. The return value is copied into this buffer before exiting the function. Similarly, for variable assignments, the contents of the local allocation is copied from one buffer to another.

Reference types are always passed as a single pointer value. Assigning to a reference type merely assigns to the pointer, it does not affect the contents of the memory being pointed to.

The "shape" is determined for each type based on the contents of the type. For example, the shape of a union type may be different depending on what the types within the union are. So if a union contains a Large_Value shaped type, then the union must also be Large_Value as well.

With each type now able to report its shape, I can define the various operations on types - such as load, store, return, argument, copy, and so on - in terms of these shapes. This means that rather than having to have a bunch of special cases for each particular type, the resulting code generator can be considerably simplified.


Post a Comment