Machine Words: September 2010

Thursday, September 16, 2010

More on TraceTables

I realized that my object-tracing scheme, as outlined in the last posting, was unable to handle an important case: unions. Unions can contain both pointer and non-pointer types, and since I'm planning on building an accurate rather than conservative collector, that means that I'll need to examine the type of data stored in the union in order to determine whether to trace it. Otherwise, you might end up with the case where you store an integer value in the union, but the garbage collector attempts to interpret it as a pointer.

The internal structure of a union depends on whether or not it contains any value types. If a union contains only reference types (i.e. pointers to objects), then the union is simply a pointer. There's no need for anything else because each object type is self-identifying. Tracing this type of union is simple, just treat it like a regular pointer.

If a union contains any value types, however, then the union also contains a "discriminator" field, which is a small integer field that contains a code indicating what type of data is currently stored in the union. The tracer needs to examine the discriminator field in order to determine whether or not to trace the data stored in the union. Doing this will require the compiler to generate a special tracing function for the union. While it might be possible to generate data tables that tell the garbage collector where to look for the discriminator field, interpreting that data structure would make the garbage collector extremely complex. So a function it is.

At the same time, however, I still want most of the tracing to be data-driven rather than code-driven. The reason for this is efficiency and separation of concerns. The part of the garbage collector that handles object tracing is divided into two subcomponents: There is the part that understands the memory layout of all of the types generated by the compiler, and there is the part that implements the various tracing passes needed by the collection algorithm. Because there are potentially millions of pointers that need to be traced, the communication between these two subsystems needs to be very high bandwidth - we want to avoid having to do an indirect function call for each pointer traced. A data-driven strategy allows the the description of the memory layout for the whole object to be passed to the tracing algorithm as a single call, rather than a call per pointer field.

The new tracing strategy is a hybrid of the two methods. For object types which have a fixed, unvarying layout and interpretation, a purely data-driven approach is used. For fields which are variable length (such as FlexArrays), or which have a changing interpretation based on their contents (such as unions), tracing functions are used to augment the purely data-driven scheme.

Let's define some terms:

TraceAction - an object representing a specific tracing pass, which operates on pointer fields of an object. There are two methods for invoking the action: Either by applying it to a pointer field directly, or by passing it a base pointer and a list of field offsets.

Field Offset Table - A table of offsets into an object, specifying the location of pointer fields within that object.

TraceMethod - a method on the object being traced which takes a TraceAction as an argument, and applies the action to the various pointer fields within the object. TraceMethods are used for tracing object fields that are too complex to be described by a field offset table.

TraceDescriptor - an object representing a strategy for tracing a specific object type, although not necessarily all of the pointers within that type. There are two types of descriptors, field offset descriptors, which contain a pointer to a field offset table, and trace method descriptors, which contain a pointer to a trace method. Both types of descriptor have the same size and layout, and can be arranged together in a flat array.

TraceDescriptorArray - an array of TraceDescriptors sufficient to trace all of the fields within an object. For most types, the descriptor array will consist of a single field offset descriptor. For objects which contain flex arrays or unions, additional trace method descriptors will be in the array.

There are three places where TraceDescriptorArrays are referenced:

For classes, the TIB (TypeInfoBlock) of the class contains a pointer to the descriptor array.
For structs, tuples, unions, and other data types that are statically typed, the descriptor array is a static constant which can be accessed by name.
For stack frames, the descriptor array is accessed via the sync point table - this is a table containing all of the program counter addresses where a garbage collection could potentially take place. The idea is that when a collection occurs, you walk up the stack looking at all of the return addresses for each call frame. You then look up the address in the sync point table, which yields a pointer to the type descriptor array for that stack frame.

Here's what a field offset descriptor looks like. It's designed to be exactly 16 bytes, even on a 64-bit system. (On a 32-bit system it will be padded.)

/** Struct containing information on how to trace a type. */
struct FieldOffsetDescriptor {

  /** If non-zero, it means that this is the last descriptor in the list. */
  readonly var endList:uint16;

  /** Number of fields in the field offset table. 0 = method descriptor. */

  readonly var fieldCount:uint16;

  /** Offset from base for this descriptor. */
  readonly var offset:uint32;

  /** Table of field offsets. */

  readonly var fieldOffsets:Address[uint16];
}

A trace method descriptor looks the same, except that the last pointer field will contain a function pointer rather than the field offset table.

Friday, September 10, 2010

FlexibleArray and GarbageCollection

At the moment, I'm working on two different branches of Tart. The branch on my iMac is all about solving the stupid dwarf debugging problems. The next step is to download and build a copy of dwarfdump2 on my machine so that I can debug it when it segfaults as a result of attempting to read a Tart-generated binary. Ugh.

On my Thinkpad, I'm working on a different line of development, involving garbage collection. I'm trying to modify LLVM to be able to generate stack frame descriptors that handle local variables that aren't just primitive types. This turns out to be somewhat easier than I was afraid of, now that I have taken the time to actually understand the LLVM code for generating stack roots.

Tracer Tables

One issue that I am still pondering is the generation of tracer tables. A tracer table is simply an array of the offsets of all of the traceable pointers within a data structure. The idea is that the garbage collector has a base pointer to an object, and then obtains a tracer table for that object, which it can then use to locate all of the pointers. For user-defined classes, the object's TIB (TypeInfoBlock) contains a pointer to the tracer table:

class Object {

const TypeInfoBlock * __tib;

// (more members follow)

};

class TypeInfoBlock {

const long ** __tracerTable;

// (more members follow)

};

(I'm writing these examples in C++ for clarity.)

In the case of struct types, there's no TIB pointer, so the compiler needs to determine which tracer table to use at compile time. Fortunately this is easy to do, since structs are always contained within a context - either they are local variables or they are embedded within another object type.

Tracer Functions

An alternative to tracer tables is to generate a tracer function for each type. A tracer function is simply a compiler-generated function that iterates through all of the pointers in the data structure and performs some garbage-collection action on each one.

Let's take a simple mark-and-sweep collector as an example. During the mark phase, we need to trace through all of the pointers in the object and mark each object pointed to. So the compiler could generate code to do this, one tracer function for each data type.

One problem with this is that it means that the compiler has to know details about your garbage collection algorithm. Another problem is that for a sophisticated, multi-threaded collector, there will be more than one type of trace pass. Often there will be several passes, with the last being a "fix-up" pass which deals with mutations that have occurred while the collection cycle was underway. In order to avoid generating a different tracer function for each different trace pass, we would need to generalize the tracer function - have it take a parameter telling it what action to take, probably in the form of a functor argument that is used to process each pointer field.

Using a functor has a number of efficiency disadvantages: First, it means that you now have an extra pointer dereference (the functor pointer) for each and every pointer field processed. Secondly, some tracers need to actually modify the pointer values, while others can get by with merely reading the pointers but not changing them. The tracer function, unfortunately, would need to assume the worst - that the functor was always going to modify the pointer being traced.

From a caching and memory locality standpoint, using a tracer table is better (not to mention more compact) - the garbage collector can simply iterate through the table entries in a tight loop, and perform whatever action it wants in the body of that loop.

The one drawback of tracer tables is that the structure of the object must be known at compile time in order to generate the table. But that's not a big hardship, right?

Flexible Arrays

In C++ and other languages, the notion of a "variable-length" type is extremely useful. So useful, that in the case of Tart, there's a built-in data type designed to support this use case: FlexibleArray.

Here's an example of you you might use the FlexibleArray data type:

final class String {

var size:int;

@Unsafe var data:FlexibleArray[byte];

}

A FlexibleArray is simply a way to tell the compiler that the size of the member is unknown at compile time. It's similar to declaring a zero-length array in C++. It is illegal to declare a FlexibleArray-typed member that is not the last data member of the class, and the class must be declared 'final'. You also need to declare a custom allocator for the type:

final class String {

static def create(length:int) -> String {

let s:String = __flexAlloc(length);

s.size = length;

return s;

}

var size:int;

@Unsafe var data:FlexibleArray[byte];

}

In Tart, if a class has no constructor, but does have a static method named "create", that method is treated as a factory for the type. So you can say "s = String(10)" and it will call "create" just like a constructor. The difference is that a constructor is passed a pre-allocated instance of the type, with the TIB pointer filled in but all the other fields uninitialized. "create", on the other hand, is responsible for actually allocating the instance as well as initializing it.

The __flexAlloc function takes an integer length, and calculates how big of a memory block is needed to hold both the fixed fields of the type as well as the flexible array elements. It then allocates the memory, and fills in the TIB pointer.

(You may be wondering how __flexAlloc knew what type of object to allocate, in this case String. The answer is that the type inference engine deduced it from the type of the left hand side of the assignment statement. Cute, eh? We could have just as easily said "let s = __flexAlloc[String](length)").

Tracing Flexible-Sized Objects

So far, so good. However, we get back to the issue of garbage collection: How do we generate a tracer table for a variable-length object? The compiler doesn't know the length of the FlexibleArray, since that was determined at runtime.

You might think that the collector could cheat and look at the memory allocation header (which comes *before* the object), however this fails for two reasons: statically-allocated objects don't have a memory allocation header, and not all of the array elements may be initialized. A class might choose to use a FlexibleArray for "reserve capacity" - similar to what happens in an STL vector.

Really, the only entity who knows which elements of the array should be traced is the class itself. That means that we need to give some control of the tracing over to the class.

If we were using tracer functions, it would be relatively simple - we would just override the __trace() method. (Although even that doesn't entirely solve the problem as there's an interesting race condition. If our collector is operating on another thread, and we update the "size" field after writing additional elements to the FlexibleArray, the collector thread might get the older version of "size", which would mean that the newly written entry won't get traced. This is where the post-cycle fixups come in.)

Another drawback of using a tracer function is that now the class becomes responsible for tracing *all* of the fields in the class. You could use a combination of tracer tables and tracer functions, but that would mean implementing each tracer pass (as determined by the collection algorithm) twice - one version that works with tables and another version that works as a functor.

One (somewhat hacky) data-driven approach would be to use an annotation to tell the compiler which data field is used to store the size of the array:

final class String {

var size:int;

@Unsafe @SizeVar("size") var data:FlexibleArray[byte];

}

Essentially the compiler would see the annotation, look for a member variable named 'size', and use that to generate additional metadata. The compiler would need to generate four additional pieces of information:

The offset of the 'size' variable.
The integer type of the 'size' variable.
The tracer table for an individual array element.
The stride of an array element - that is, the difference in bytes of element N from element N-1.

Even this has problems. It means that the interpretation of 'size' is fixed by the compiler. You can't use begin/end pointers as is done in STL, you have to use integers.

Another example is array slice objects. Right now the Tart String class has a dual role: It can be used to represent a string of characters, or it can be use to represent a substring of another string. (For efficiency, the same class is used for both cases - otherwise String would have to be non-final, which would suck from a performance standpoint.) The 'size' field can mean either the size of the whole string, or the size of the substring. This is incompatible with a fixed interpretation of the field.

A third possibility is to have a special member function, say flexSize(), which returns the size of the flexible array and tells the collector how many entries in the array to trace.

final class String {

var size:int;

@Unsafe @FlexSize("flexSize") var data:FlexibleArray[byte];

def flexSize() -> int { return size; }

}

This is the best compromise solution I have come up with so far.

Friday, September 3, 2010

Tart status update and unsolved problems

I've been making fairly good progress on the reflection stuff. At the moment, I have both the old system and new system running in parallel, working towards a point where the old system can be completely deprecated.

There are two long-standing problems that I have not yet been able to solve, both of which are related to LLVM.

The first big problem is generating DWARF debugging info. I've put several man-months into this over the course of the past several years - I recently did a near-total rewrite of the code for generating source line information - and it still doesn't work.

Part of the problem is that the LLVM docs on source-level debugging, while extensive, are also vague and ambiguous in many places. I find that often times the only way to understand how things work is to cross-reference between the docs, the clang source code, and the LLVM source code. Even then, there are a lot of things aren't clear, especially when the code in clang contradicts what's in the doc. For example, according to the source-level debugging doc, the signature of a function is defined using an array of formal parameter descriptors using the DWARF tag DW_TAG_formal_parameter. However, according to Google code search, the identifier "DW_TAG_formal_parameter" appears nowhere in the clang source tree, and if you trace through the code that is used to generate a function type descriptor, you see that in fact the array is an array of type descriptors, not parameter descriptors.

Another difficulty with DWARF symbol generation is the difficulty of isolating a problem. If you make a mistake in your generator and then attempt to debug the compiled program, gdb prints various obscure error messages telling you that your debug info is invalid, but doesn't tell you where in your code the error lies or what debug symbol is having a problem. There's also dwarfdump, which pretty-prints the debug info for your program (typically thousands of pages worth, so good luck finding the problem by eye.) The easiest way to use use dwarfdump is to have it print out all of the debugging info using the -a option, and hope that it segfaults (which it usually does) when it gets to the point where the problem is. Based on the last few lines of printout before the segfault, you can sometime deduce which symbols are having problems, although if your problem lies in a call frame descriptor, this won't work because call frame descriptors aren't self-identifying - you can look at the text dump of a call frame descriptor, but it won't tell you which function's call frame is being described.

Tonight I am going to start experimenting with readelf, although that requires me getting it running on OS X.

The other big problem I have is with llvm.gcroot. This is the pseudo-function in LLVM which is used to mark a local variable as a garbage collection root. A linker plugin (which I have written for Tart) collects all of the calls to llvm.gcroot and uses that information to build the stack frame descriptors - a table which the garbage collector can use to trace the stack. The actual call to llvm.gcroot is removed, although it can have an effect on certain optimization passes.

The llvm.gcroot function has a curious limitation: It only works on data values that are pointers that have been allocated with a local alloc() instruction.

Typically, the call frame for a function contains pointers to objects on the heap. llvm.gcroot isn't responsible for the pointers which are contained inside of heap objects - that's the job of your collector. Instead, llvm.gcroot only deals with pointers that are on the stack. (Note that it does not deal with pointers that are SSA values, i.e. only in registers, since there's no way for a trace to access those anyway. Any pointer must have a location on the stack where it can be traced.)

The LLVM 'alloca' instruction is used to reserve a stack slot for a variable. So typically you would have a stack slot containing a pointer, which points to some heap object. The stack tracer then iterates through all of these stack slots (only the ones containing pointers to heap objects) and trace the objects pointed to.

The problem comes when you have a stack slot that contains pointers, but isn't a pointer itself. An example would be a small structure that contains several pointers. Because the structure is not a heap object, it needs to be traced by the stack tracer. Unfortunately, llvm.gcroot won't accept a stack slot that's not a pointer - it simply emits a fatal error when you try. Nor can you call llvm.gcroot on the members of the structure either - the argument to llvm.gcroot must be the result of an alloca instruction (a stack slot) and not a member of a stack slot.

Now, this is not a problem for languages like Java - Java only allows stack slots for primitive types and pointers. (There's no such thing as a 'struct' in Java). It's not a problem for C and C++, which don't have garbage collection. It *is* a problem for C# and D - those compilers get around the problem by simply not using LLVM's garbage collection framework at all. I don't want to do that because it requires writing a whole lot of code that is specific to a each different CPU architecture, which I'm not up to.

So far the only helpful suggestion from the LLVM mailing list is to allocate an extra stack slot for each local struct variable, containing a pointer to that struct. You would then declare the pointer as a gcroot instead of declaring the struct directly. However, I've been hesitant to go this route because it seems like an ugly hack, not to mention that there's now an additional 8-byte overhead for each local structure variable. Also, it makes the tracer more complicated, because now you have to deal with pointers to stack objects as well as pointers to heap objects.

I've thought about digging in and changing the way llvm.gcroot works. But the code is complicated and I don't entirely understand it, and I don't like mucking around with things I don't understand.