Thursday, September 16, 2010

More on TraceTables

I realized that my object-tracing scheme, as outlined in the last posting, was unable to handle an important case: unions. Unions can contain both pointer and non-pointer types, and since I'm planning on building an accurate rather than conservative collector, that means that I'll need to examine the type of data stored in the union in order to determine whether to trace it. Otherwise, you might end up with the case where you store an integer value in the union, but the garbage collector attempts to interpret it as a pointer.

The internal structure of a union depends on whether or not it contains any value types. If a union contains only reference types (i.e. pointers to objects), then the union is simply a pointer. There's no need for anything else because each object type is self-identifying. Tracing this type of union is simple, just treat it like a regular pointer.

If a union contains any value types, however, then the union also contains a "discriminator" field, which is a small integer field that contains a code indicating what type of data is currently stored in the union. The tracer needs to examine the discriminator field in order to determine whether or not to trace the data stored in the union. Doing this will require the compiler to generate a special tracing function for the union. While it might be possible to generate data tables that tell the garbage collector where to look for the discriminator field, interpreting that data structure would make the garbage collector extremely complex. So a function it is.

At the same time, however, I still want most of the tracing to be data-driven rather than code-driven. The reason for this is efficiency and separation of concerns. The part of the garbage collector that handles object tracing is divided into two subcomponents: There is the part that understands the memory layout of all of the types generated by the compiler, and there is the part that implements the various tracing passes needed by the collection algorithm. Because there are potentially millions of pointers that need to be traced, the communication between these two subsystems needs to be very high bandwidth - we want to avoid having to do an indirect function call for each pointer traced. A data-driven strategy allows the the description of the memory layout for the whole object to be passed to the tracing algorithm as a single call, rather than a call per pointer field.

The new tracing strategy is a hybrid of the two methods. For object types which have a fixed, unvarying layout and interpretation, a purely data-driven approach is used. For fields which are variable length (such as FlexArrays), or which have a changing interpretation based on their contents (such as unions), tracing functions are used to augment the purely data-driven scheme.

Let's define some terms:

TraceAction - an object representing a specific tracing pass, which operates on pointer fields of an object. There are two methods for invoking the action: Either by applying it to a pointer field directly, or by passing it a base pointer and a list of field offsets.

Field Offset Table - A table of offsets into an object, specifying the location of pointer fields within that object.

TraceMethod - a method on the object being traced which takes a TraceAction as an argument, and applies the action to the various pointer fields within the object. TraceMethods are used for tracing object fields that are too complex to be described by a field offset table.

TraceDescriptor - an object representing a strategy for tracing a specific object type, although not necessarily all of the pointers within that type. There are two types of descriptors, field offset descriptors, which contain a pointer to a field offset table, and trace method descriptors, which contain a pointer to a trace method. Both types of descriptor have the same size and layout, and can be arranged together in a flat array.

TraceDescriptorArray - an array of TraceDescriptors sufficient to trace all of the fields within an object. For most types, the descriptor array will consist of a single field offset descriptor. For objects which contain flex arrays or unions, additional trace method descriptors will be in the array.

There are three places where TraceDescriptorArrays are referenced:
  • For classes, the TIB (TypeInfoBlock) of the class contains a pointer to the descriptor array.
  • For structs, tuples, unions, and other data types that are statically typed, the descriptor array is a static constant which can be accessed by name.
  • For stack frames, the descriptor array is accessed via the sync point table - this is a table containing all of the program counter addresses where a garbage collection could potentially take place. The idea is that when a collection occurs, you walk up the stack looking at all of the return addresses for each call frame. You then look up the address in the sync point table, which yields a pointer to the type descriptor array for that stack frame. 
Here's what a field offset descriptor looks like. It's designed to be exactly 16 bytes, even on a 64-bit system. (On a 32-bit system it will be padded.)

/** Struct containing information on how to trace a type. */
struct FieldOffsetDescriptor {
  /** If non-zero, it means that this is the last descriptor in the list. */
  readonly var endList:uint16;

  /** Number of fields in the field offset table. 0 = method descriptor. */
  readonly var fieldCount:uint16;

  /** Offset from base for this descriptor. */
  readonly var offset:uint32;

  /** Table of field offsets. */
  readonly var fieldOffsets:Address[uint16];
}

A trace method descriptor looks the same, except that the last pointer field will contain a function pointer rather than the field offset table.

No comments:

Post a Comment