Monday, December 6, 2010

Should parameter names be part of the function type?

Here's an issue which I have been ignoring for a long time, but which is starting to become a pain point.

Tart supports keyword arguments in a manner similar to Python: When calling a function, any parameter can be referred to by name or by position. Parameters can also have default values, so you can have a function with a large number of parameters having default values, and use parameter names to pick out just the ones you want to have non-default values. (Whether this is a good API design choice is an open question.)

However, under the hood Tart does this very differently from the way Python does it. In Python, the matching of argument keywords to parameter positions is done at the time the call is dispatched. Since creating a dictionary is cheap in Python (in a relative sense, at least), the named parameters are bundled into a dictionary and the contents of this dictionary are matched against the parameter names of the called function.

In Tart, because it's a statically typed language, the matching of keyword arguments to parameter positions is done at compile time. (One of Tart's design principles is "if you can't figure out how to make it fast, it doesn't go in" - meaning that no language feature is allowed if ordinary use of that feature would create a performance bottleneck.) Tart doesn't create a dictionary of keyword arguments - instead, the compiler internally creates a map of keyword arguments to parameter positions, and uses this information to re-arrange the order of arguments of the call. Any arguments not explicitly mentioned are filled in with their default values. In other words, all of the keyword stuff is just syntactic sugar on top of a C-style calling convention. Note that this also means that the Python feature of unpacking a dictionary via **args isn't supported.
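
To make the desugaring concrete, here's a rough sketch of that compile-time step, written in C++ for clarity rather than in terms of the actual compiler's data structures (every name here is invented for illustration): given the callee's parameter list and the call-site arguments, produce a purely positional argument list, filling any gaps from default values.

  #include <cassert>
  #include <cstddef>
  #include <optional>
  #include <string>
  #include <vector>

  // One formal parameter as the compiler sees it: a name and an optional default.
  struct Param {
    std::string name;
    std::optional<std::string> defaultExpr;  // an AST node in reality
  };

  // One call-site argument: either positional or keyword.
  struct Arg {
    std::optional<std::string> keyword;
    std::string expr;
  };

  // Rearrange keyword arguments into declaration order at "compile time".
  // Returns one argument expression per formal parameter.
  std::vector<std::string> assignArgs(const std::vector<Param>& params,
                                      const std::vector<Arg>& args) {
    std::vector<std::optional<std::string>> slots(params.size());
    std::size_t nextPositional = 0;
    for (const Arg& a : args) {
      if (!a.keyword) {
        slots.at(nextPositional++) = a.expr;               // positional: fill in order
      } else {
        for (std::size_t i = 0; i < params.size(); ++i) {  // keyword: match by name
          if (params[i].name == *a.keyword) { slots[i] = a.expr; break; }
        }
      }
    }
    std::vector<std::string> out;
    for (std::size_t i = 0; i < params.size(); ++i) {
      if (!slots[i]) slots[i] = params[i].defaultExpr;  // fall back to the default
      assert(slots[i] && "missing required argument");
      out.push_back(*slots[i]);
    }
    return out;
  }

  int main() {
    // def f(a:int, b:int = 3), called as f(b = 2, a = 1), lowers to f(1, 2).
    auto result = assignArgs({{"a", std::nullopt}, {"b", "3"}},
                             {{"b", "2"}, {"a", "1"}});
    assert(result[0] == "1" && result[1] == "2");
  }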

In order for this to work, the names of the function parameters (and any other trait that would change how a function gets called) need to be part of the function's type. That means that f(a:int) and f(b:int) are actually different types.

Where this gets weird is when you start having things like pointers to functions. Imagine the following:

def f(a:int, b:int) { ... }

let f2 = f;
let f3:fn (b:int, a:int) = f;

Here we've assigned the original function 'f' to two function variables, one which has the same type as 'f', and one which defines a different function signature with the order of parameters reversed. Note that even though f3 is a different type than f, the conversion is allowed - the compiler doesn't consider the types to be incompatible just because the parameters have different names. Essentially, the names of the parameters are what I call a "type hint" - meaning that they make the type distinct, but it's a distinction without a difference as far as the type conversion checker is concerned. (The reason for this is simple: If we didn't allow function types with different parameter names to be interconvertible, then every function pointer would have to have the exact same spelling of the parameter names as the function that was assigned to it, which would make life hell for programmers.)

Thus, in the above example, if we call f3(a=1, b=2) it's the same as if we called f(a=2, b=1). Trippy, dude!

Where this starts to get painful is when we try to represent this information in the reflection metadata. As you know, one of my challenges has been to avoid a combinatorial explosion of metadata that gets generated along with a class or function. One way to do this is by sharing definitions whenever possible, so that two functions with the same type can point to the same type definition. Thus, in an ideal world, all functions with the type signature f(:int) could point to the same definition. However, if we include the parameter names along with the type, then the amount of sharing diminishes greatly, since it's unlikely that two functions will have both the same parameter types and parameter names.

Now, all this being said there's an alternative strategy I could have chosen, which is not to make the parameter names part of the type. In this strategy, when a function is called, the parameter assignment logic looks beneath the type to the actual function declaration being called. In the case of function pointers, where there is no underlying function declaration, keyword arguments simply wouldn't be allowed. I suspect that this would cause even more confusion than the case outlined above. However, it would have greatly simplified the logic of function types.

Actually, this is not strictly speaking true: every function type ultimately derives from a declaration somewhere - even a function value which is the result of some complex expression still has a type declaration which ultimately leads back to some variable or function declaration. The reason this is true is because even though functions can be created dynamically, function types can't be. So it might be possible for the compiler to examine an expression of function type, and by diving deep enough in the expression it could find the original function declaration. I'm not sure this strategy is any different than having the information be encoded in the type, nor am I certain I could make the compiler smart enough to figure this out in all cases.

Monday, November 29, 2010

Integer parsing checked in

Tart programs can now convert strings to integer types:

let n = int8.parse("10");
let m = int32.parse("fff", 16);

Features:
  • The implementation is pure Tart - no calling C library functions. It is also efficient :)
  • Radix conversions from 2 to 35 are supported.
  • Overflow is properly handled, even for 64-bit ints - so for example if you attempt to convert 9223372036854775809 to a 64-bit signed int, it will throw an OverflowError.
  • Bad input strings throw an InputFormatError.
  • Currently supports all integer and boolean types.
Character and float types are not implemented yet. I probably won't be able to do a pure-Tart implementation of float parsing, because that's hard to get right. So I'll probably end up having to call strtod(). Unfortunately strtod() has much less restrictive input conditions than I'd like (it allows junk characters at the end, for example).

There's also a "tryParse" function in the "Strings" namespace for each numeric type that returns either a numeric value or a conversion error:

def tryParse[uint8](s:String, radix:int) -> uint8 or ConversionError;

Note that the reason for naming it 'tryParse[uint8]' instead of 'tryParseUint8' is to make it easier to work with templates.
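
For comparison, here's a hedged C++ sketch of the same shape of API: an overflow-checked radix parser that returns either a value or an error, loosely mirroring the 'uint8 or ConversionError' union result. The names and the error enum are mine, not Tart's.

  #include <cassert>
  #include <cstdint>
  #include <string>
  #include <variant>

  enum class ConversionError { InputFormat, Overflow };

  // Parse an unsigned integer in the given radix, catching overflow before it happens.
  std::variant<uint64_t, ConversionError> tryParseUInt(const std::string& s, unsigned radix = 10) {
    if (s.empty() || radix < 2 || radix > 36) return ConversionError::InputFormat;
    uint64_t value = 0;
    for (char ch : s) {
      unsigned digit;
      if (ch >= '0' && ch <= '9')      digit = ch - '0';
      else if (ch >= 'a' && ch <= 'z') digit = ch - 'a' + 10;
      else if (ch >= 'A' && ch <= 'Z') digit = ch - 'A' + 10;
      else return ConversionError::InputFormat;
      if (digit >= radix) return ConversionError::InputFormat;
      // The next step computes value * radix + digit; reject it if it can't fit in 64 bits.
      if (value > (UINT64_MAX - digit) / radix) return ConversionError::Overflow;
      value = value * radix + digit;
    }
    return value;
  }

  int main() {
    assert(std::get<uint64_t>(tryParseUInt("fff", 16)) == 0xfff);
    assert(std::holds_alternative<ConversionError>(tryParseUInt("99999999999999999999")));
  }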

Thursday, November 18, 2010

More Closure Fun

Here are some more closure test cases that work:

// Test that closures can be passed as parameters
def testClosureAsParam {
  var i = 0;
  callTenTimes(fn { i += 2; });
  Debug.assertEq(20, i);

  var s = StringBuilder();
  callTenTimes(fn { s.append("A"); });
  Debug.assertEq("AAAAAAAAAA", s.toString());
}

def callTenTimes(f:fn) {
  for i = 0; i < 10; ++i {
    f();
  }
}

// Test that nested closures work.
def testNestedClosures {
  let makeCounter = fn n:int -> fn -> int {
    return fn -> int {
      return n++;
    };
  };
  
  let c = makeCounter(2);
  Debug.assertEq(2, c());
  Debug.assertEq(3, c());
}

// Test that an inner closure can access a variable two levels out.
def testNestedClosures2 {
  var i = 0;
  let makeCounter = fn -> fn -> int {
    return fn -> int {
      return i++;
    };
  };

  let c = makeCounter();
  let d = makeCounter();

  Debug.assertEq(0, c());
  Debug.assertEq(1, c());
  Debug.assertEq(2, d());
  Debug.assertEq(3, d());
}

One thing that does not work yet is closure variables that are large structs or tuples - basically any local variable that is not an object reference or a primitive type. It shouldn't be hard to fix, however.

Wednesday, November 17, 2010

Closures

I decided to take a break from the low-level work of debugging and garbage collection, and work on some language features, which is a much more pleasant activity. It only took a couple of days to get closures working. Here's one of the unit tests, for example:

def testClosureCounter {
  var a = 0;
  let f = fn -> int {
    return a++;
  };

  Debug.assertEq(0, f());
  Debug.assertEq(1, f());
  Debug.assertEq(2, f());
}

The mechanism behind closures is fairly efficient: The closure environment is a subclass of Object that has a member slot for each captured variable. For 'let' variables (which cannot be changed after the initial assignment), the slot contains just a copy of the variable. For 'var' variables, the slot is a reference to a shared closure cell, whose type is tart.core.MutableRef[T], so that both the inner and outer functions can modify the same variable.

The closure itself is passed around as a pair of pointers - the pointer to the executable code, and the pointer to the environment. Note that closures are type-compatible with bound methods - that is, anywhere you can put a bound method, you can also put a closure, assuming that it takes the same argument types.
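
For readers who want to picture the lowering, here's a rough C++ analog of what gets generated for testClosureCounter above. The class and field names are illustrative, not the compiler's actual output.

  #include <cassert>
  #include <memory>

  // Shared cell for a captured 'var': both the outer function and the closure
  // body read and write through it (roughly tart.core.MutableRef[T]).
  template <typename T>
  struct MutableRef { T value; };

  // The closure environment: one slot per captured variable.
  struct CounterEnv {
    std::shared_ptr<MutableRef<int>> a;  // the captured 'var a'
  };

  // The closure body, compiled as a free function taking the environment.
  static int counterBody(CounterEnv* env) { return env->a->value++; }

  // A closure value is just a pair: code pointer plus environment pointer.
  struct IntClosure {
    int (*code)(CounterEnv*);
    std::shared_ptr<CounterEnv> env;
    int operator()() const { return code(env.get()); }
  };

  int main() {
    // var a = 0; let f = fn -> int { return a++; };
    auto env = std::make_shared<CounterEnv>();
    env->a = std::make_shared<MutableRef<int>>(MutableRef<int>{0});
    IntClosure f{&counterBody, env};
    assert(f() == 0);
    assert(f() == 1);
    assert(f() == 2);
  }

In the real implementation the environment is a garbage-collected Object subclass rather than a shared_ptr, but the shape - an environment object, shared cells for mutable captures, and a (code, environment) pair - is the same.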

Saturday, November 13, 2010

Coalescence

I spent the last several days experimenting with function-merging in the compiler, and the results are encouraging but not spectacular.

Take for example the following function in class ArrayIterator:

def next() -> ElementType or void {
  if (self.index < self.array._size) {
    return self.array._data[self.index++];
  } else {
    return;
  }
}

The type 'ElementType' is a template parameter. The interesting thing about this function is that it really doesn't care what ElementType is - it doesn't call any of its methods or examine its variables. Really, the only thing it cares about at all is its size. For most arrays, that size will be the size of a pointer.

There's no reason, then, why ArrayIterator[String].next() and ArrayIterator[Object].next() can't be the same machine-code function. What I've managed to do is create a FunctionMergePass class that compares two functions and determines whether or not they can be merged. Currently I do not attempt to merge every function automatically; instead, the merging behavior is manually triggered by the presence of the @Coalesce attribute on the class.

The pass works by walking the code-flow graph of both functions in parallel, and checks each pair of expression nodes to see whether or not they are equivalent in terms of their effect. If the merge pass finds any discrepancies between the two, the merge operation is canceled, and the function marked as non-mergeable.

Most expression types are trivial to compare, although some are more interesting. In the above code example, there's an implicit cast from 'ElementType' to 'ElementType or void' in the first return statement - in other words, it's constructing a union type and returning it. The merge pass in this case checks to see if the two union types are equivalent, which they are - 'Object or void' produces the exact same binary structure as 'String or void'.
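
Here's a minimal sketch of the comparison step in C++, assuming the IR is a graph of typed expression nodes; this is only illustrative, not the actual FunctionMergePass.

  #include <cassert>
  #include <cstddef>
  #include <string>
  #include <vector>

  // A drastically simplified expression node.
  struct Expr {
    std::string kind;          // "load", "store", "call", "cast", ...
    std::string machineType;   // layout-level type, e.g. "ptr" or "i32"
    std::vector<Expr> operands;
  };

  // Two expressions are equivalent for merging purposes if they have the same
  // effect at the machine level: the same operation, the same layout-level
  // types, and pairwise-equivalent operands. Source-level types (String vs.
  // Object) don't matter; only their binary representation does.
  bool equivalent(const Expr& a, const Expr& b) {
    if (a.kind != b.kind || a.machineType != b.machineType) return false;
    if (a.operands.size() != b.operands.size()) return false;
    for (std::size_t i = 0; i < a.operands.size(); ++i) {
      if (!equivalent(a.operands[i], b.operands[i])) return false;
    }
    return true;
  }

  int main() {
    // The element load in ArrayIterator[String].next() and ArrayIterator[Object].next():
    // both load a pointer-sized element, so the pair is mergeable.
    Expr fromString{"load", "ptr", {{"gep", "ptr", {}}}};
    Expr fromObject{"load", "ptr", {{"gep", "ptr", {}}}};
    assert(equivalent(fromString, fromObject));
  }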

The most common cause of merge failure is due to the function calling another non-mergeable function. Take for example this method in the template ArrayList[ElementType]:

def insert(index:int, e:ElementType) {
  Preconditions.checkIndex(index >= 0);
  Preconditions.checkIndex(index <= _size);
  grow(1);
  ElementType[].copyElements(_data, index + 1, _data, index, _size - index);
  _data[index] = e;
}

Everything in this function is mergeable except for the call to 'grow'. The reason 'grow' isn't mergeable is that it sometimes allocates a new Array instance, and calls to 'new' are generally unmergeable because the exact type of the object being created needs to be known. (There's a loophole around this which I have thought of but have not implemented - if the object being allocated is the same type as 'self', then we can just use the type information from 'self' instead of referring to the type information via a static constant.)

However, there is a way we could make the call to 'grow' mergeable - at a slight performance penalty. The answer is to make the call virtual. Right now it isn't virtual, because the "ArrayList" class is marked "final", which means that the compiler is smart enough to call 'grow' directly instead of dispatching through a virtual function table. (For those of you unfamiliar with Java, marking a class 'final' means that it cannot be subclassed, which allows the compiler to make significant optimizations.)

If the call to 'grow' were made virtual, then even if we merge ArrayList[String].insert() with ArrayList[Object].insert(), the correct version of 'grow' will be called in either case because it depends on the type of 'self'.

The question is whether or not this is worth doing, which comes down to the question of what's more important: speed or size? At the moment, enabling the function merging behavior on the SwitchStmtTest unit test results in a 7K reduction in the size of the executable file. 7K out of 350K is not all that much.

Thursday, November 11, 2010

Tart: Still Too Large

Although the new reflection system has managed to reduce the size of the executables somewhat, I'm still concerned about the overall size.

For example, the unit test SwitchStmtTest weighs in at about 350K binary executable size. This is with optimization turned on (including dead code and global elimination), and with all debugging info stripped. That seems quite large to me - I'd like the size to be more on the order of 50K (that's my rough guess at how large an equivalent C++ program would be.) Note that with optimization off and debugging turned on, the size balloons to a whopping 1.5 megs.

I decided to do an analysis to see what exactly was taking up so much space. I used 'nm' for this purpose, along with a Python script that computed the size of each symbol in the object file based on the address deltas. These numbers are very rough, since there's a lot of stuff in the object file that is not accounted for. I'm only using the numbers as an order-of-magnitude guesstimate of sizes.

The contents of the object file can be broken down in several ways. Here's a breakdown by package:
  • tart.atomic: 16
  • tart.collections: 24712
  • tart.core: 37875
  • tart.gc: 288
  • tart.reflect: 34823
  • tart.testing: 3536
  • tart.text: 6179
  • SwitchStmtTest: 4435
  • Total: 111864
The numbers represent the total size, in bytes, of all symbols whose names start with the package name.

One of the first things to notice is that the total is only about a third of the size of the object file. I'm not at all sure why this is.

Also, I'm curious why tart.atomic was even included at all, given that nothing refers to it.

In the case of templates, the template expansions are charged to the package defining the template, not the package that instantiated it. In fact, template expansions account for a large portion of these numbers. Two templates dominate:
  • class tart.core.Array: 26879
  • class tart.collections.ArrayList: 13840
This leads me to believe that I need to think very hard about how to reduce template expansion sizes. I tried Nick's -mergefunc LLVM pass (which coalesces functions that produce identical machine instructions), and it had very little effect. That may be due to a bug in the -mergefunc pass, or it may be that there are in fact tiny differences in the generated code for different templates.

Another way to break down the numbers is by symbol type. For example, if I add up all of the static metadata for reflection, the total comes to 27645.

Here's a breakdown of some of the symbols (only items > 500 bytes are listed):
  • Non-reflection data (dynamic dispatch tables, instance allocators, etc.)
    • Constant tart.core.String instances: 16256 (Average size 66)
    • Type info blocks: 6784 (average size 45)
    • Type base class lists: 1952 (average size 33)
    • Interface dispatch functions: 3168 (average size 36)
    • Compiler-generated instance allocators: 800 (average size 38)
    • Trace tables for garbage collection: 736 (that seems low!)
  • Reflection-related data
    • Type adapter functions (for boxing / unboxing reflected calls): 20912
    • Class and module reflection structures: 13216
    • Compressed name tables: 7864
    • Field offset tables: 1080
    • Tables of pointers of types referred to by other types: 2400
Here's my list of take-aways from all this:
  • 64-bit code is really fat. Even a trivial function can be >1K
  • Template expansions are taking a large amount of space.
  • I can probably optimize the type adapter functions. (These are functions generated by the compiler that auto-box and unbox function parameters.)
  • The reflection structures for classes and methods can probably be trimmed a bit if I am willing to use void* pointers (means I can combine multiple tables into one.)
  • It may be possible to reduce the number of generated strings somewhat.
Coming up with a solution for template expansions is going to be the hardest. Ideally, we want to re-use the generic code as much as possible - so instead of separate definitions for tart.core.Array[Object].Iterator.next() and tart.core.Array[String].Iterator.next() - which do exactly the same thing - we can point them both to a common definition. The problem is that not all template methods are type-independent. For example, the tart.core.Array[Object].copyOf() method allocates a new array instance, and the type of that instance will be different for different template instances of copyOf().

Each template class method will be impacted to a greater or lesser degree by variances in the type of the template parameter. We can group functions into classes by the degree of this impact:
  • No impact - the function is exactly the same regardless of the template parameter. An example is the method that returns the length of an array, which is the same regardless of the type of the array element.
  • Different type sizes - for example, the array copyElements() treats all the array elements opaquely (it just memcpys them) so the only thing it cares about is the size of the elements. Thus, all pointer arrays are the same as far as it is concerned.
    • For these cases, the solution is to generate a version of the function for each distinct size - so you have a version for 1-byte array elements, 2-byte, 4-byte, and so on (see the sketch after this list).
  • Parametrically polymorphic - this means that different instances of the method do the same thing, but with different types - so for example, one version might add two ints, while another adds two floats.
    • One possible approach would be to somehow (I'm not sure how) migrate the type references outside the function, possibly in the static class data - so for example, when copyOf() creates a new instance, instead of hard-coding the type, you look it up. However, this seems pretty complicated, and I'm not sure how to do it.
  • Ad hoc polymorphic - this basically means "anything goes" - two different versions of the same template method might end up with completely different behavior. While functional language purists frown heavily on such things, in practice there are a few cases where this can be useful, especially in the metaprogramming domain.
    • An example of how the code can be different: Suppose we have a type A which defines "foo" as a constant, and type B which defines it as a variable - if a template "Example" has a method "bar" which operates on the "foo" property, then the generated code for Example[A].bar() will be very different from the generated code for Example[B].bar().
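
For the "different type sizes" group above, a single size-parameterized routine is enough to share code across instantiations; here's a small C++ analog (the function name is hypothetical):

  #include <cassert>
  #include <cstddef>
  #include <cstring>
  #include <string>

  // One copy routine per element *size*, not per element *type*. Every array of
  // pointer-sized elements (Array[String], Array[Object], ...) can share the
  // same instantiation, because the elements are moved around opaquely.
  static void copyElements(void* dst, const void* src, std::size_t count, std::size_t elementSize) {
    std::memmove(dst, src, count * elementSize);
  }

  int main() {
    const char* src[3] = {"a", "b", "c"};   // stands in for any pointer-element array
    const char* dst[3] = {};
    copyElements(dst, src, 3, sizeof(void*));
    assert(std::string(dst[2]) == "c");
  }
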
Detecting which group a method falls into requires tracing through the code flow graph of the function, and analyzing each expression, type, and value to determine how that item will vary based on the template's parameters.

In a language with a JIT, the solution to these problems is simple - you instantiate the template instance methods lazily. So for example, if no one ever calls tart.core.Array[String].copyOf(), then just don't generate that method. The problem with a language that generates native binaries (and this applies to more than just templates) is that everything has to be generated in advance, because you don't know ahead of time what will and won't be used.

A final issue that I need to consider is "when?". Do I need to address the size issues right now, or can it wait until more language features go in? I know that the common wisdom is "optimize last", but at some level a program can perform so poorly that no one will consider it worth the trouble of using, let alone optimizing. I know for example that a typical XBox developer would never bother to use Tart in its current state, where program size is a critical issue (there's no virtual memory on the XBox and games tend to use all of memory.) Also, I don't like building on shaky foundations.

Tuesday, November 9, 2010

Intervals in Gosu

Gosu has an interesting syntax for open and closed intervals:

   1..10   // Closed interval, i.e. the values 1-10 inclusive.
   1|..|10 // Open interval, the values 2-9

Having an easy syntax for open and closed intervals solves a number of problems I've been having: Sometimes you want intervals to be closed, other times half-open. For example, Python slice objects are half-open, and would be much less useful if they were fully open or fully closed. The interval notation would make it easy to do Python-style slices:

  let s = "Hello, world!";
  Console.out.printLn(s[0..|5]); // Prints "Hello"

Tart status update

I've been making substantial progress over the last couple of days, which is a welcome change after a couple of frustrating months.

The last few months have seen a lot of hard effort, with very little to show for it. I'd been focused primarily on getting debugging working - being able to inspect Tart data structures in the debugger. Unfortunately, it seems that LLVM's facilities for generating debug information aren't very robust - if you make even the smallest mistake in calling the debug factory functions, it will silently write out bad debugging data without telling you. In addition, the available tools for inspecting and checking DWARF debugging data are very primitive, producing at best a raw text dump of the debugging structures, requiring you to manually trace the cross references between debug entries. Nor is gdb particularly helpful in terms of its error messages. Add to this the fact that DWARF itself is extremely complex, and that even a small program will produce many kilobytes of debugging data, and you can begin to see why it takes so long to get this stuff right.

The key to breaking this logjam was attending the LLVM developers' conference last week. I was able to talk directly to the people working on the parts of LLVM that deal with debugging info, and to clarify a number of critical points that I had not been able to resolve over email. (I've noticed that any email question sent to the llvm-dev mailing list has approximately a 50% chance of being answered, so getting answers takes persistence.)

It took me a few days to be able to apply everything that I learned at the conference, with a lot of trial and error involved. During this time, I was at the Hacker's Conference, and I spent about half my time at the conference staring at my laptop working on Tart. Although I felt bad about not paying attention to the conference sessions, I really felt driven to try and get this stuff working.

As a result, the debug stuff is working much better. I can now set breakpoints, examine function parameters and data structures (although examining local variables is still not working).

In addition, I've also made progress switching over to the new, compressed reflection metadata. The old system has been completely removed, and the unit test framework has been switched over to use the new system.

In other news, I read last night on Slashdot the announcement of the Gosu programming language, which has a lot of similarities to Tart. In fact, it's quite discouraging to me to see how many of my ideas have been developed independently - discouraging because they have a shipping product and I don't. That being said, there are a few things that I think they didn't get right, and a few things that seem faddish, but none of these are serious.

I think the biggest difference between Gosu and Tart is that Gosu targets the JVM, whereas Tart produces native binaries. I'm not sure which of those two paths will prove to be smarter in the long run. Obviously, it's a lot easier to write a new language for the JVM, since you get a lot of stuff for free - garbage collection, a huge class library, and so on. The benefits to producing native executables is more subtle - a quicker startup time, ability to run on platforms that don't have a JVM, and so on.

There are also a number of downsides to native executables - for example, a lot of stuff has to be pre-generated (like type adapter functions for reflection) which can be generated dynamically as needed in the JVM. This potentially makes the native executable much larger than the equivalent Java .class file. Also, there's no way to do HotSpot-style profile-guided optimization, which you could do with a JVM.

In any case, each time someone comes out with a programming language that is similar to Tart or fulfills a similar need (Go, Gosu, Scala, etc.), it means that the chances of Tart achieving any sort of widespread adoption gets smaller. Nevertheless, I'm not ready to quit yet... :)

Monday, October 25, 2010

Switch statement syntax

Here's a few minor design puzzles. C's "switch" statement requires a "break" statement at the end of a switch case. I've long felt that this was a design misfeature, and while it is useful for some clever hacks, it's also a source of much trouble. I decided instead to make each case a first-class block. Here's a sample of what switch statements look like in Tart:

   switch value {
     case 1 {
       // something
     }

     case 2
     case 3 {
       // something
     }

     else {
       // something
     }
   }

Note that cases 2 & 3 go to the same block. Syntactically, the rule is that a case value can be followed by one of two things: Another case value, or the start of a statement block.

One alternative syntax I considered was to use a comma to separate case values: "case 2, 3". However, I discarded this for two reasons: first, because it looks ugly if there are a lot of case values and you need to break the line, and second, because I eventually want to be able to have case values that are tuples.

However, the lack of a delimiter after the case value is a bit troubling. I've discovered at least one bug where I typed a case value and forgot to write the subsequent block, which resulted in execution falling through to the next block. If a delimiter were required, then the compiler would have caught that.

There's also a problem with the use of the keyword 'else' instead of 'default' - sometimes I like to put the default case first instead of last, and that doesn't read as well with the word 'else'.

Finally, there is the issue of the 'functional-switch'. A lot of functional languages have a switch statement which returns a value (let n = switch value { ... }). I've been wondering if Tart should have something like this. However, it's not clear to me how you would indicate which value to return in a block, especially in an imperative language which supports multiple statements. The choices are (a) return the value of the last statement in the block (if there is one), (b) have a special keyword for returning a value of a block. Both approaches have issues.

Having a functional-switch implies that there should also be a functional-if as well, which is nothing more than the C/Java/Python conditional operator. Tart doesn't have or need such an operator, since the 'cond' macro does the same job without needing special syntactic sugar. However, doing a switch statement as a macro is a little more difficult, since the number of arguments to the macro can grow arbitrarily large.

Tart status update

I've been working pretty hard on Tart the last few weeks. I've been focused on several areas:

DWARF debugging - There's not much progress in this area, despite my best efforts. Lately I've been attacking the problem from a different direction, which is to produce a reproducible test case that doesn't require the entire Tart infrastructure to be present. Unfortunately, doing this requires fixing a bunch of minor issues which would not otherwise be high priority. For example, the Tart linker fails unless optimization is enabled, due to a bug in the LLVM lowering pass for garbage collection intrinsics. Unfortunately, I need an unoptimized test case if I want to get the LLVM experts to help me with my problem. So I have to fix the second problem in order to even start working on the first.

Garbage Collection - The stack crawler is completely working now. The next step is to write a very simple test collector and a test for it.

One issue that concerns me is that I want Tart to be able to support pluggable collectors like Java does. Java has the ability to specify a collector when you launch the program - in other words, the selection of collection algorithm can be deferred until the program begins execution. In order to do this, I'd need some sort of plugin or loadable module system, something I haven't even begun to think about.

The best I can do for the moment is to make the collector algorithm a link-time decision - that is, to have different static link libraries for different collectors.

Reflection - I've been working hard on the new reflection system, the one that uses compressed metadata to describe all of the various classes and methods in a module. Right now the compiler emits the metadata for both the old and new systems, which makes the resulting object files much larger than I would like. Unfortunately, the new system isn't yet up to the task of replacing the old system. For example, the "execute by name" feature - which is used by the unit test framework - isn't ready yet.

I've also been plagued by a number of strange anomalies. CMake has been acting weird lately - a build rule that was working fine for several months suddenly stopped working. I didn't notice at first since the bug only manifests on a clean build. Although CMake is a great tool, there are certain kinds of build rules which really ought to be simple, but which are nearly impossible to do without truly ugly and non-intuitive work-arounds. For example, it's nearly impossible to define an add_custom_command() rule which has a dependency on a build product which was produced in a different directory.

Sigh...it seems like the last 3-4 months have been spent dealing with low-level details of LLVM, DWARF, and so on - I've spent virtually no time thinking about the Tart language per se, which is what I really want to be working on.

Thursday, September 16, 2010

More on TraceTables

I realized that my object-tracing scheme, as outlined in the last posting, was unable to handle an important case: unions. Unions can contain both pointer and non-pointer types, and since I'm planning on building an accurate rather than conservative collector, that means that I'll need to examine the type of data stored in the union in order to determine whether to trace it. Otherwise, you might end up with the case where you store an integer value in the union, but the garbage collector attempts to interpret it as a pointer.

The internal structure of a union depends on whether or not it contains any value types. If a union contains only reference types (i.e. pointers to objects), then the union is simply a pointer. There's no need for anything else because each object type is self-identifying. Tracing this type of union is simple, just treat it like a regular pointer.

If a union contains any value types, however, then the union also contains a "discriminator" field, which is a small integer field that contains a code indicating what type of data is currently stored in the union. The tracer needs to examine the discriminator field in order to determine whether or not to trace the data stored in the union. Doing this will require the compiler to generate a special tracing function for the union. While it might be possible to generate data tables that tell the garbage collector where to look for the discriminator field, interpreting that data structure would make the garbage collector extremely complex. So a function it is.
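
As a rough illustration (written in C++ with invented names, not the compiler's actual output), the generated trace function for a union containing a value type might look like this:

  #include <cassert>
  #include <cstdint>

  struct Object;  // an opaque heap object

  // The collector's per-pass callback, applied to each pointer field found.
  using TraceAction = void (*)(Object** field);

  // Layout of a union that mixes a value type with a reference type:
  // a discriminator plus overlapped storage.
  struct IntOrObject {
    uint8_t discriminator;   // 0 = int64 stored, 1 = Object* stored
    union {
      int64_t intValue;
      Object* objValue;
    };
  };

  // Compiler-generated trace function: only hand the field to the collector
  // when the union actually holds a pointer, so an integer payload is never
  // mistaken for an object reference.
  void traceIntOrObject(IntOrObject* u, TraceAction action) {
    if (u->discriminator == 1) {
      action(&u->objValue);
    }
  }

  int main() {
    static int visits = 0;
    IntOrObject u;
    u.discriminator = 0;   // currently holds an integer
    u.intValue = 42;
    traceIntOrObject(&u, [](Object**) { ++visits; });
    assert(visits == 0);   // nothing handed to the collector
  }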

At the same time, however, I still want most of the tracing to be data-driven rather than code-driven. The reason for this is efficiency and separation of concerns. The part of the garbage collector that handles object tracing is divided into two subcomponents: There is the part that understands the memory layout of all of the types generated by the compiler, and there is the part that implements the various tracing passes needed by the collection algorithm. Because there are potentially millions of pointers that need to be traced, the communication between these two subsystems needs to be very high bandwidth - we want to avoid having to do an indirect function call for each pointer traced. A data-driven strategy allows the description of the memory layout for the whole object to be passed to the tracing algorithm as a single call, rather than a call per pointer field.

The new tracing strategy is a hybrid of the two methods. For object types which have a fixed, unvarying layout and interpretation, a purely data-driven approach is used. For fields which are variable length (such as FlexArrays), or which have a changing interpretation based on their contents (such as unions), tracing functions are used to augment the purely data-driven scheme.

Let's define some terms:

TraceAction - an object representing a specific tracing pass, which operates on pointer fields of an object. There are two methods for invoking the action: Either by applying it to a pointer field directly, or by passing it a base pointer and a list of field offsets.

Field Offset Table - A table of offsets into an object, specifying the location of pointer fields within that object.

TraceMethod - a method on the object being traced which takes a TraceAction as an argument, and applies the action to the various pointer fields within the object. TraceMethods are used for tracing object fields that are too complex to be described by a field offset table.

TraceDescriptor - an object representing a strategy for tracing a specific object type, although not necessarily all of the pointers within that type. There are two types of descriptors, field offset descriptors, which contain a pointer to a field offset table, and trace method descriptors, which contain a pointer to a trace method. Both types of descriptor have the same size and layout, and can be arranged together in a flat array.

TraceDescriptorArray - an array of TraceDescriptors sufficient to trace all of the fields within an object. For most types, the descriptor array will consist of a single field offset descriptor. For objects which contain flex arrays or unions, additional trace method descriptors will be in the array.

There are three places where TraceDescriptorArrays are referenced:
  • For classes, the TIB (TypeInfoBlock) of the class contains a pointer to the descriptor array.
  • For structs, tuples, unions, and other data types that are statically typed, the descriptor array is a static constant which can be accessed by name.
  • For stack frames, the descriptor array is accessed via the sync point table - this is a table containing all of the program counter addresses where a garbage collection could potentially take place. The idea is that when a collection occurs, you walk up the stack looking at all of the return addresses for each call frame. You then look up the address in the sync point table, which yields a pointer to the trace descriptor array for that stack frame.
Here's what a field offset descriptor looks like. It's designed to be exactly 16 bytes, even on a 64-bit system. (On a 32-bit system it will be padded.)

/** Struct containing information on how to trace a type. */
struct FieldOffsetDescriptor {
  /** If non-zero, it means that this is the last descriptor in the list. */
  readonly var endList:uint16;

  /** Number of fields in the field offset table. 0 = method descriptor. */
  readonly var fieldCount:uint16;

  /** Offset from base for this descriptor. */
  readonly var offset:uint32;

  /** Table of field offsets. */
  readonly var fieldOffsets:Address[uint16];
}

A trace method descriptor looks the same, except that the last pointer field will contain a function pointer rather than the field offset table.
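
To make the hybrid scheme concrete, here's a hedged C++ sketch of how a collector might walk a TraceDescriptorArray, using fieldCount to choose between the table-driven and method-driven cases. The structure mirrors the definitions above, but the code itself is only illustrative.

  #include <cstddef>
  #include <cstdint>

  struct Object;
  using TraceAction = void (*)(Object** field);                  // one tracing pass
  using TraceMethod = void (*)(void* base, TraceAction action);  // for complex fields

  struct TraceDescriptor {
    uint16_t endList;      // non-zero on the last descriptor in the array
    uint16_t fieldCount;   // 0 means this is a trace method descriptor
    uint32_t offset;       // offset from the object base for this descriptor
    union {
      const uint16_t* fieldOffsets;  // table of pointer-field offsets
      TraceMethod traceMethod;       // or a compiler-generated trace method
    };
  };

  // Apply one tracing pass to every pointer field of an object, given its base
  // address and its trace descriptor array.
  void traceObject(void* base, const TraceDescriptor* descriptors, TraceAction action) {
    for (const TraceDescriptor* d = descriptors;; ++d) {
      char* regionBase = static_cast<char*>(base) + d->offset;
      if (d->fieldCount != 0) {
        // Data-driven case: a tight loop over the field offset table.
        for (uint16_t i = 0; i < d->fieldCount; ++i) {
          action(reinterpret_cast<Object**>(regionBase + d->fieldOffsets[i]));
        }
      } else {
        // Code-driven case: unions, flex arrays, anything the layout can't express.
        d->traceMethod(regionBase, action);
      }
      if (d->endList) break;
    }
  }

  int main() {
    struct Pair { Object* first; Object* second; };  // a simple traced type
    static const uint16_t pairOffsets[] = {offsetof(Pair, first), offsetof(Pair, second)};
    TraceDescriptor pairDescriptors[1];
    pairDescriptors[0].endList = 1;
    pairDescriptors[0].fieldCount = 2;
    pairDescriptors[0].offset = 0;
    pairDescriptors[0].fieldOffsets = pairOffsets;

    Pair p{nullptr, nullptr};
    traceObject(&p, pairDescriptors, [](Object**) { /* e.g. mark the object */ });
  }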

Friday, September 10, 2010

FlexibleArray and GarbageCollection

At the moment, I'm working on two different branches of Tart. The branch on my iMac is all about solving the stupid dwarf debugging problems. The next step is to download and build a copy of dwarfdump2 on my machine so that I can debug it when it segfaults as a result of attempting to read a Tart-generated binary. Ugh.

On my Thinkpad, I'm working on a different line of development, involving garbage collection. I'm trying to modify LLVM to be able to generate stack frame descriptors that handle local variables that aren't just primitive types. This turns out to be somewhat easier than I was afraid of, now that I have taken the time to actually understand the LLVM code for generating stack roots.

Tracer Tables

One issue that I am still pondering is the generation of tracer tables. A tracer table is simply an array of the offsets of all of the traceable pointers within a data structure. The idea is that the garbage collector has a base pointer to an object, and then obtains a tracer table for that object, which it can then use to locate all of the pointers. For user-defined classes, the object's TIB (TypeInfoBlock) contains a pointer to the tracer table:

  class Object {
     const TypeInfoBlock * __tib;
     // (more members follow)
  };

  class TypeInfoBlock {
     const long ** __tracerTable;
     // (more members follow)
  };

(I'm writing these examples in C++ for clarity.)

In the case of struct types, there's no TIB pointer, so the compiler needs to determine which tracer table to use at compile time. Fortunately this is easy to do, since structs are always contained within a context - either they are local variables or they are embedded within another object type.

Tracer Functions

An alternative to tracer tables is to generate a tracer function for each type. A tracer function is simply a compiler-generated function that iterates through all of the pointers in the data structure and performs some garbage-collection action on each one.

Let's take a simple mark-and-sweep collector as an example. During the mark phase, we need to trace through all of the pointers in the object and mark each object pointed to. So the compiler could generate code to do this, one tracer function for each data type.

One problem with this is that it means that the compiler has to know details about your garbage collection algorithm. Another problem is that for a sophisticated, multi-threaded collector, there will be more than one type of trace pass. Often there will be several passes, with the last being a "fix-up" pass which deals with mutations that have occurred while the collection cycle was underway. In order to avoid generating a different tracer function for each different trace pass, we would need to generalize the tracer function - have it take a parameter telling it what action to take, probably in the form of a functor argument that is used to process each pointer field.
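
Here's a small C++ sketch of what a generalized tracer function looks like with that functor parameter (the type and method names are hypothetical); the indirect call per pointer field is the overhead discussed in the next paragraph.

  #include <cassert>

  struct Object;

  // The per-pass functor: the collector supplies one of these for each kind of
  // trace pass (mark, relocate, post-cycle fix-up, ...).
  struct TraceFunctor {
    virtual void operator()(Object** field) = 0;
    virtual ~TraceFunctor() = default;
  };

  // A hypothetical compiler-generated tracer function for some class Node.
  struct Node {
    Object* left;
    Object* right;
    int value;   // not a pointer, so never reported

    void __trace(TraceFunctor& action) {
      action(&left);    // one indirect call per pointer field
      action(&right);
    }
  };

  int main() {
    struct CountingPass : TraceFunctor {
      int count = 0;
      void operator()(Object**) override { ++count; }
    } pass;
    Node n{nullptr, nullptr, 7};
    n.__trace(pass);
    assert(pass.count == 2);
  }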

Using a functor has a number of efficiency disadvantages: First, it means that you now have an extra pointer dereference (the functor pointer) for each and every pointer field processed. Secondly, some tracers need to actually modify the pointer values, while others can get by with merely reading the pointers but not changing them. The tracer function, unfortunately, would need to assume the worst - that the functor was always going to modify the pointer being traced.

From a caching and memory locality standpoint, using a tracer table is better (not to mention more compact) - the garbage collector can simply iterate through the table entries in a tight loop, and perform whatever action it wants in the body of that loop.

The one drawback of tracer tables is that the structure of the object must be known at compile time in order to generate the table. But that's not a big hardship, right?

Flexible Arrays

In C++ and other languages, the notion of a "variable-length" type is extremely useful. So useful that, in the case of Tart, there's a built-in data type designed to support this use case: FlexibleArray.

Here's an example of how you might use the FlexibleArray data type:

  final class String {
    var size:int;
    @Unsafe var data:FlexibleArray[byte];
  }

A FlexibleArray is simply a way to tell the compiler that the size of the member is unknown at compile time. It's similar to declaring a zero-length array in C++. It is illegal to declare a FlexibleArray-typed member that is not the last data member of the class, and the class must be declared 'final'. You also need to declare a custom allocator for the type:

  final class String {
    static def create(length:int) -> String {
      let s:String = __flexAlloc(length);
      s.size = length;
      return s;
    }

    var size:int;
    @Unsafe var data:FlexibleArray[byte];
  }

In Tart, if a class has no constructor, but does have a static method named "create", that method is treated as a factory for the type. So you can say "s = String(10)" and it will call "create" just like a constructor. The difference is that a constructor is passed a pre-allocated instance of the type, with the TIB pointer filled in but all the other fields uninitialized. "create", on the other hand, is responsible for actually allocating the instance as well as initializing it.

The __flexAlloc function takes an integer length, and calculates how big of a memory block is needed to hold both the fixed fields of the type as well as the flexible array elements. It then allocates the memory, and fills in the TIB pointer.

(You may be wondering how __flexAlloc knew what type of object to allocate, in this case String. The answer is that the type inference engine deduced it from the type of the left hand side of the assignment statement. Cute, eh? We could have just as easily said "let s = __flexAlloc[String](length)").
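
Here's a rough C++ analog of what __flexAlloc has to compute; the helper names are mine, and the C++ trailing-array trick stands in for FlexibleArray.

  #include <cassert>
  #include <cstddef>
  #include <cstdlib>
  #include <cstring>
  #include <new>

  struct TypeInfoBlock;  // per-type metadata, filled in by the allocator

  // C++ analog of a Tart class with a trailing FlexibleArray[byte].
  struct String {
    const TypeInfoBlock* __tib;
    int size;
    char data[1];        // stands in for FlexibleArray[byte]
  };

  // Analog of __flexAlloc: size the block as "fixed fields + N elements",
  // allocate it, and fill in the TIB pointer (and nothing else).
  String* flexAllocString(const TypeInfoBlock* tib, int length) {
    std::size_t bytes = offsetof(String, data) + static_cast<std::size_t>(length);
    String* s = static_cast<String*>(std::malloc(bytes));
    if (s == nullptr) throw std::bad_alloc();
    s->__tib = tib;
    return s;
  }

  // Analog of the user-written 'create' factory.
  String* createString(int length) {
    String* s = flexAllocString(nullptr, length);  // the real code passes String's TIB
    s->size = length;
    return s;
  }

  int main() {
    String* s = createString(5);
    std::memcpy(s->data, "Hello", 5);
    assert(s->size == 5);
    std::free(s);
  }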

Tracing Flexible-Sized Objects

So far, so good. However, we get back to the issue of garbage collection: How do we generate a tracer table for a variable-length object? The compiler doesn't know the length of the FlexibleArray, since that was determined at runtime.

You might think that the collector could cheat and look at the memory allocation header (which comes *before* the object), however this fails for two reasons: statically-allocated objects don't have a memory allocation header, and not all of the array elements may be initialized. A class might choose to use a FlexibleArray for "reserve capacity" - similar to what happens in an STL vector.

Really, the only entity who knows which elements of the array should be traced is the class itself. That means that we need to give some control of the tracing over to the class.

If we were using tracer functions, it would be relatively simple - we would just override the __trace() method. (Although even that doesn't entirely solve the problem as there's an interesting race condition. If our collector is operating on another thread, and we update the "size" field after writing additional elements to the FlexibleArray, the collector thread might get the older version of "size", which would mean that the newly written entry won't get traced. This is where the post-cycle fixups come in.)

Another drawback of using a tracer function is that now the class becomes responsible for tracing *all* of the fields in the class. You could use a combination of tracer tables and tracer functions, but that would mean implementing each tracer pass (as determined by the collection algorithm) twice - one version that works with tables and another version that works as a functor.

One (somewhat hacky) data-driven approach would be to use an annotation to tell the compiler which data field is used to store the size of the array:

  final class String {
    var size:int;
    @Unsafe @SizeVar("size") var data:FlexibleArray[byte];
  }

Essentially the compiler would see the annotation, look for a member variable named 'size', and use that to generate additional metadata. The compiler would need to generate four additional pieces of information:
  • The offset of the 'size' variable.
  • The integer type of the 'size' variable.
  • The tracer table for an individual array element.
  • The stride of an array element - that is, the difference in bytes of element N from element N-1.
Even this has problems. It means that the interpretation of 'size' is fixed by the compiler. You can't use begin/end pointers as is done in STL, you have to use integers.

Another example is array slice objects. Right now the Tart String class has a dual role: It can be used to represent a string of characters, or it can be used to represent a substring of another string. (For efficiency, the same class is used for both cases - otherwise String would have to be non-final, which would suck from a performance standpoint.) The 'size' field can mean either the size of the whole string, or the size of the substring. This is incompatible with a fixed interpretation of the field.

A third possibility is to have a special member function, say flexSize(), which returns the size of the flexible array and tells the collector how many entries in the array to trace.

  final class String {
    var size:int;
    @Unsafe @FlexSize("flexSize") var data:FlexibleArray[byte];

    def flexSize() -> int { return size; }
  }

This is the best compromise solution I have come up with so far.

Friday, September 3, 2010

Tart status update and unsolved problems

I've been making fairly good progress on the reflection stuff. At the moment, I have both the old system and new system running in parallel, working towards a point where the old system can be completely deprecated.

There are two long-standing problems that I have not yet been able to solve, both of which are related to LLVM.

The first big problem is generating DWARF debugging info. I've put several man-months into this over the course of the past several years - I recently did a near-total rewrite of the code for generating source line information - and it still doesn't work.

Part of the problem is that the LLVM docs on source-level debugging, while extensive, are also vague and ambiguous in many places. I find that often the only way to understand how things work is to cross-reference between the docs, the clang source code, and the LLVM source code. Even then, there are a lot of things that aren't clear, especially when the code in clang contradicts what's in the doc. For example, according to the source-level debugging doc, the signature of a function is defined using an array of formal parameter descriptors using the DWARF tag DW_TAG_formal_parameter. However, according to Google code search, the identifier "DW_TAG_formal_parameter" appears nowhere in the clang source tree, and if you trace through the code that is used to generate a function type descriptor, you see that in fact the array is an array of type descriptors, not parameter descriptors.

Another difficulty with DWARF symbol generation is the difficulty of isolating a problem. If you make a mistake in your generator and then attempt to debug the compiled program, gdb prints various obscure error messages telling you that your debug info is invalid, but doesn't tell you where in your code the error lies or what debug symbol is having a problem. There's also dwarfdump, which pretty-prints the debug info for your program (typically thousands of pages worth, so good luck finding the problem by eye.) The easiest way to use dwarfdump is to have it print out all of the debugging info using the -a option, and hope that it segfaults (which it usually does) when it gets to the point where the problem is. Based on the last few lines of printout before the segfault, you can sometimes deduce which symbols are having problems, although if your problem lies in a call frame descriptor, this won't work because call frame descriptors aren't self-identifying - you can look at the text dump of a call frame descriptor, but it won't tell you which function's call frame is being described.

Tonight I am going to start experimenting with readelf, although that requires me getting it running on OS X.

The other big problem I have is with llvm.gcroot. This is the pseudo-function in LLVM which is used to mark a local variable as a garbage collection root. A linker plugin (which I have written for Tart) collects all of the calls to llvm.gcroot and uses that information to build the stack frame descriptors - a table which the garbage collector can use to trace the stack. The actual call to llvm.gcroot is removed, although it can have an effect on certain optimization passes.

The llvm.gcroot function has a curious limitation: It only works on data values that are pointers that have been allocated with a local alloca instruction.

Typically, the call frame for a function contains pointers to objects on the heap. llvm.gcroot isn't responsible for the pointers which are contained inside of heap objects - that's the job of your collector. Instead, llvm.gcroot only deals with pointers that are on the stack. (Note that it does not deal with pointers that are SSA values, i.e. only in registers, since there's no way for a trace to access those anyway. Any pointer must have a location on the stack where it can be traced.)

The LLVM 'alloca' instruction is used to reserve a stack slot for a variable. So typically you would have a stack slot containing a pointer, which points to some heap object. The stack tracer then iterates through all of these stack slots (only the ones containing pointers to heap objects) and traces the objects pointed to.

The problem comes when you have a stack slot that contains pointers, but isn't a pointer itself. An example would be a small structure that contains several pointers. Because the structure is not a heap object, it needs to be traced by the stack tracer. Unfortunately, llvm.gcroot won't accept a stack slot that's not a pointer - it simply emits a fatal error when you try. Nor can you call llvm.gcroot on the members of the structure either - the argument to llvm.gcroot must be the result of an alloca instruction (a stack slot) and not a member of a stack slot.

Now, this is not a problem for languages like Java - Java only allows stack slots for primitive types and pointers. (There's no such thing as a 'struct' in Java). It's not a problem for C and C++, which don't have garbage collection. It *is* a problem for C# and D - those compilers get around the problem by simply not using LLVM's garbage collection framework at all. I don't want to do that because it requires writing a whole lot of code that is specific to a each different CPU architecture, which I'm not up to.

So far the only helpful suggestion from the LLVM mailing list is to allocate an extra stack slot for each local struct variable, containing a pointer to that struct. You would then declare the pointer as a gcroot instead of declaring the struct directly. However, I've been hesitant to go this route because it seems like an ugly hack, not to mention that there's now an additional 8-byte overhead for each local structure variable. Also, it makes the tracer more complicated, because now you have to deal with pointers to stack objects as well as pointers to heap objects.

I've thought about digging in and changing the way llvm.gcroot works. But the code is complicated and I don't entirely understand it, and I don't like mucking around with things I don't understand.

Monday, August 16, 2010

Propitious Properties

Tart's property system is modeled after C#, and to a lesser extent Python. A property is defined with the following syntax:

   def length:int {
     get { return self._length; }
     set { self._length = value; }
   }

The idea behind properties is that you use them just as if they were member variables, but reading or writing the variable actually invokes the getter or setter method:

  return obj.length; // Calls the 'get' function
  obj.length = 1; // Calls the 'set' function

Properties are an improvement over Java's 'bean' reflection system, which I will explain. In Java, you can introspect an object to determine what properties it has. Properties are derived by looking for methods named 'getXXX' and 'setXXX':
  • A method named 'getFoo' is presumed to define a property named 'foo' - the 'get' prefix is stripped, and the initial character of the property name is converted to lower case.
  • The return type of the method (in the case of 'get') or the argument type (in the case of 'set') is presumed to define the type of the property. (The getter and setter type must match.)
Here's why a formal syntax for defining properties is superior to Java's syntax:
  1. Doing textual manipulation of the method to derive the property name is ugly.
  2. If you want to @Annotate the property, in Java you have to annotate either the getter or the setter, whereas in C# you can annotate the property directly. So, for example, if you want to annotate a property as being non-serializable, which method do you put the annotation on? In C# it's unambiguous.
  3. In C# the reflection system can give you a list of properties directly, whereas in Java you have to use a BeanIntrospection class which compiles the list of properties derived from examining the methods.
However, I've discovered that properties also have some drawbacks. These drawbacks have less to do with properties directly, and more to do with the style of code that results when properties are used:
  1. Name collisions. Often you will have a private member variable that is exposed through a property getter. Unfortunately, this means that you need to come up with distinct names for both the variable and the property. Up to now, I've been using '_name' for the internal variable, and 'name' for the property, however I don't really like using underscores in this way (my preference is to let people name things as close as possible to readable English, and not require funny prefix or suffix characters - I strongly feel that if the programmer has to write special prefix or suffix characters onto more than 20% of the identifiers in a program, then the language design is deficient.)
    • In Java, this is not a problem, since 'getFoo' is always going to be named differently than 'foo'.
  2. Chaining. A very common code pattern in Java is the use of chained setters, each of which returns 'this' as a return argument: object.setName(name).setDescription(desc).setKey(key)...and so on. This is a very convenient pattern which I use all the time in Java, but unfortunately, it doesn't work with property assignments, since those don't return a value. You need to repeat the base expression ('object') once for each value set.
  3. I don't like taking the name 'get' and 'set' as keywords. These two names are very commonly method names in Java (See Provider.get() in Guice as an example). "get()" is often used when you have some class that defines a single value that it can produce, such as a factory or cache. I'd hate to break up a nice naming convention :)
I also wonder if it's a good idea to re-use the 'def' keyword to define a property. Syntactically it's unambiguous because of the colon, but the code might be more readable if I changed it to use the word 'prop' instead.

Tart status update

I've been busy this last week re-doing Tart's reflection data structures. Unlike the previous scheme, which used fully-realized, statically-compiled Tart objects to represent the various types, method descriptors, and other metadata needed for reflection, the new scheme uses a highly-compressed byte-encoding of the data. The compressed data is converted into objects lazily on first access.

By the way, all of this lazy evaluation makes heavy use of Tart's 'lazyEval' macro, which is part of the core library:

    return lazyEval(<variable>, <initialization-expression>);

Where 'initialization-expression' has type T, and 'variable' is declared as having type 'optional T'. Basically, lazyEval checks to see if 'variable' is initialized, and if not, it initializes it using 'initialization-expression'. The initialization expression is not evaluated if the variable is already initialized.
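
In C++ terms, the same pattern can be approximated with std::optional and a deferred initializer; this is just a sketch of the idea - Tart's macro defers evaluation without needing the explicit lambda.

  #include <cassert>
  #include <optional>
  #include <string>
  #include <utility>

  // Evaluate 'init' only if 'slot' is still empty, then return the cached value.
  template <typename T, typename Init>
  T& lazyEval(std::optional<T>& slot, Init&& init) {
    if (!slot) {
      slot = std::forward<Init>(init)();
    }
    return *slot;
  }

  struct Module {
    std::optional<std::string> cachedName;  // the 'optional T' variable
    int buildCount = 0;

    const std::string& name() {
      // The expensive initialization expression runs at most once.
      return lazyEval(cachedName, [this] { ++buildCount; return std::string("tart.core"); });
    }
  };

  int main() {
    Module m;
    assert(m.name() == "tart.core");
    assert(m.name() == "tart.core");
    assert(m.buildCount == 1);
  }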

At some point I'll need to also do a 'lockedLazyEval' which is a synchronized version. But before that can happen I'll need mutexes (mutices?) to be implemented. That's going to be a bit tricky, for a number of reasons. For example, Tart objects can move around in memory, but I'm not sure whether or not it's legal to relocate a pthread_mutex_t. I suspect it isn't. Also, it gets into the whole realm of writing platform-dependent code in Tart (as opposed to platform-dependent code in the C runtime library, which is no problem), which I haven't really worked out yet.

Anyway, lazy evaluation is one of those common code patterns you see everywhere, but it's hard to refactor into a library routine unless your language gives you some way to selectively control evaluation. Macros are one way to do it - closures are another, although they are a bit more wordy.

Another aspect of reflection I am working on at the moment is the "invocation adapters". These are wrapper functions used when calling a method via reflection - they convert the argument list from type Object[] to the native types of the called method's argument list. Each distinct function type requires a distinct invocation adapter - in other words, if two functions take the same argument types, then they can share the adapter function. (Note that the invocation adapter does not handle the 'self' argument - that is handled by a prior stage in the calling process - so two instance methods can share the same adapter, even if they are members of different classes.)
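To make that concrete, here's a loose Python sketch of what an adapter does; the names and the boxing model are invented for illustration and aren't Tart's actual runtime interface:

    # One adapter per distinct function type: it takes the reflection-level
    # argument array (elements boxed as generic objects), checks and unboxes
    # them to the native parameter types, and forwards the call.
    def adapter_int_int(fn, args):
        a, b = args
        if not (isinstance(a, int) and isinstance(b, int)):
            raise TypeError("arguments do not match signature (int, int)")
        return fn(a, b)

    # Two different methods with the same parameter types share the one adapter.
    def add(x, y): return x + y
    def multiply(x, y): return x * y

    print(adapter_int_int(add, [2, 3]))        # 5
    print(adapter_int_int(multiply, [2, 3]))   # 6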

When the compiler generates the reflection metadata for a class, it creates a set of all of the unique function types defined by that class. Each of these is assigned an index. Each method definition in the class contains the index corresponding to its function type, encoded as a VarInt, which typically takes one byte. The class metadata also contains a table of pointers to invocation adapter functions, which is also referenced by index. This information is used to lazily construct a method descriptor object, which contains a pointer to both the method itself and the invocation adapter for that method.
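Here's a small sketch of that bookkeeping, with invented method names and type spellings, just to show the shape of the tables; the compiler of course does this with its own internal type representation:

    # Collect the unique function types of a class, assign each an index, and
    # record for every method the index of its type. The adapter table is
    # indexed the same way, so a method descriptor can later be built from the
    # pair (method pointer, function type index).
    methods = {
        "setName": "fn (String) -> void",
        "setKey":  "fn (String) -> void",   # same function type as setName
        "length":  "fn () -> int",
    }

    unique_types = sorted(set(methods.values()))
    type_index = {t: i for i, t in enumerate(unique_types)}
    method_type_index = {name: type_index[t] for name, t in methods.items()}

    print(unique_types)         # one entry per distinct function type
    print(method_type_index)    # setName and setKey share an index; length has its own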

In other news, I implemented the Python-style String.join method: So now you can say either ", ".join(list) or String.join(", ", list). The 'list' argument must either be an Iterable[String], or a variable number of String arguments.

Friday, July 30, 2010

More thoughts on Arrays

The more I work on collections in Tart, the more I realize how odd Arrays are.

For example, in my work in Java I almost never use an array type. Sequential containers are almost always ArrayList or ImmutableList. Python's 'list' class (and its 'array' class as well) is much more like Java's ArrayList, in that it can grow as needed.

At the same time, fixed-length arrays are needed because the other container types - such as ImmutableList and ArrayList - are implemented in terms of arrays.

Arrays are problematic in other ways as well. For one thing, there is the dreaded array-initialization problem: In a language where the use of null pointers is heavily discouraged, what do you use for the initial fill value of an array? Not all types have an obvious "default value" instance.

One idea that I have been toying with is making the basic array class more like fixed_vector: That is, a container with a fixed capacity, but variable length up to that capacity. You could grow the array up to its maximum size, but in doing so you would have to provide either an explicit fill value, or a list of elements to copy into the array. From an implementation standpoint, this would require adding one extra field to the array, which is the current length. (Moreover, that field would disappear from ArrayList, since it could now use the length field of the underlying array.)
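A minimal sketch of what that could look like, written in Python purely to illustrate the behaviour (the class and method names are made up):

    # Fixed capacity, variable length up to that capacity; growing requires an
    # explicit fill value, so no per-type "default value" is ever needed.
    class FixedCapacityArray:
        def __init__(self, capacity):
            self._capacity = capacity
            self._items = []                 # current length == len(self._items)

        def append(self, value):
            if len(self._items) >= self._capacity:
                raise OverflowError("capacity exceeded")
            self._items.append(value)

        def resize(self, new_length, fill):
            if new_length > self._capacity:
                raise OverflowError("capacity exceeded")
            if new_length >= len(self._items):
                self._items.extend([fill] * (new_length - len(self._items)))
            else:
                del self._items[new_length:]

        def __len__(self):
            return len(self._items)

    a = FixedCapacityArray(4)
    a.resize(3, fill="")    # grow to length 3 with an explicit fill value
    a.append("x")           # length 4; one more append would raise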

One problem with doing this is that "variable size with a maximum size" is a more complicated concept than just a fixed size. People might be surprised when their arrays start throwing capacity exceeded exceptions.

Wednesday, July 28, 2010

More on templates and reflection

As I mentioned in my last posting, I'm working on a more efficient way to encode reflection data for templates. Rather than pre-generating the reflection data for each expansion of the template, I want to store the template definition and let the expansion be generated lazily at runtime.

Note that under this scheme, the actual compiled methods, and the method dispatch tables, are still being generated by the compiler. It's only the reflection data that isn't generated.

One thing I need to work out still is how to integrate the compiled methods and dispatch tables into the reflection data when it is generated. With non-template classes, this is easy: The reflection data for each method (a static structure generated by the compiler) contains a pointer to that method. In the case of a template method, we need to somehow attach the method pointer to the method descriptor when it is created. This gets tricky because the compiler may have generated multiple copies of the original template method, one for each expansion of the template with different type parameters.

One way to approach this is to have some kind of map of functions: The key consists of all the elements that uniquely identify a function (the name, the arguments, the template arguments, etc) and the value is the function pointer. The trick will be representing the keys compactly.

For the name and template arguments, we can use indices. The name will be a variable-length index into the names table for the module, typically 1 byte. The template arguments are represented as a tuple of types, which is itself a type - given types T1, T2, you can create a new tuple type Tuple(T1, T2), and this new type can be assigned a unique index. This index can also be represented as a variable-length integer, and thus can also usually be stored in a single byte (it's rare that a module will have more than 128 unique types). The arguments to the method can be treated similarly, with the caveat that method arguments have additional properties beyond just the type, such as variadic arguments.
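As a sketch of how small such keys could be, here's a toy encoder using a plain LEB128-style varint (7 bits per byte, high bit means "more bytes follow"); the actual Tart encoding may differ, but the common case of small indices still fits in one byte each:

    # Toy method-key encoder: name index + tuple-of-template-args index + one
    # index per parameter type, each written as a variable-length integer.
    def encode_varint(n):
        out = bytearray()
        while True:
            b = n & 0x7F
            n >>= 7
            out.append(b | (0x80 if n else 0))
            if not n:
                return bytes(out)

    def method_key(name_index, template_tuple_index, param_type_indices):
        key = bytearray()
        key += encode_varint(name_index)
        key += encode_varint(template_tuple_index)
        for idx in param_type_indices:
            key += encode_varint(idx)
        return bytes(key)

    # A name index, a tuple-type index, and two parameter type indices, all
    # small, encode to just 4 bytes.
    print(len(method_key(17, 5, [3, 9])))   # 4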

So, assuming we can figure out a way to encode those additional properties compactly, we're looking at a key size of only a few bytes. Still, that's a minimum of a key and a pointer for each method, and if there are a lot of methods, that adds up. Now, some of those methods are already in a table - the dispatch table for the class. It would be nice if we could use that table instead of having to generate a separate table for reflection purposes. Unfortunately, not all methods are in that table - for example, methods declared 'final' or 'static' don't need to be in the dispatch table because the compiler can determine at compile time which method to dispatch to. And I'd like to avoid making the dispatch table longer, since every subclass has to include the dispatch table of its parent class. Also, methods which are not in the dispatch table can be discarded by the linker if they are never called elsewhere, whereas methods in the dispatch table must be included in the final binary.

One approach would be to create an additional table of only the methods that weren't in the dispatch table. Unfortunately, once we do this, it means that those methods can no longer be discarded by the linker. I'm not sure how to get around this, or even if it's important - one of the perils of reflection is that it breaks the compiler's ability to determine what symbols are reachable from other symbols. If you can access a class or method via its name, then how can the compiler know whether or not it's called? It has to be conservative and include all potentially reachable symbols, which might be larger than what is actually needed.

(Another possible approach is to use the DWARF debugging information to look up a function by name. However, I want to preserve the option to strip out all debugging symbols from the binary, which would cause reflection to break.)

In any case, assuming we have some small key that can uniquely identify a method, when we expand a template, we need to loop through all the methods, substitute type parameters, generate tuples from those types and look them up in the module's type registry to generate a unique index. Then combine the indices to generate a method key, look up that method in the map, and then use the resulting pointer to fill in the method slot in the method descriptor.
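A rough illustration of that lookup, with invented registry contents; the real version would work on type descriptor objects rather than strings:

    # Expand a template method at runtime: substitute the concrete bindings for
    # the type variables, turn the resulting tuple into an index via the
    # module's type registry, build the key, and look up the compiled function.
    type_registry = {"(String, int)": 7, "(int, int)": 9}    # tuple type -> index
    method_map = {(4, 7): 0x40a2f0}    # (name index, tuple index) -> method address

    def expand_method(name_index, param_types, bindings):
        concrete = tuple(bindings.get(t, t) for t in param_types)
        tuple_index = type_registry["(" + ", ".join(concrete) + ")"]
        return method_map[(name_index, tuple_index)]

    # A template method f[T](a: T, b: int), expanded with T = String:
    print(hex(expand_method(4, ("T", "int"), {"T": "String"})))   # 0x40a2f0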

Monday, July 26, 2010

Starting to work on Tart again

My life seems to have calmed down quite a bit and lately I've started poking away at Tart again. It's been about 4 months since I worked on it, and although I've forgotten a few things, it's also given me an opportunity to step back and take a fresh look at some of the problems.

One problem that Tart has is that the binary files it produces are huge. There are a number of reasons for this:
  • DWARF debugging information
  • Template bloat
  • Reflection data
For DWARF, there's not much I can do except generate less code.

For templates, a big win will be gained once I start using Nick Lewycky's function merging pass. This merges LLVM functions which would produce the same sequence of machine instructions. It does this by effectively doing type erasure on pointer variables where possible, and then merging the resulting functions if they hash to the same value. Visual Studio has done this for a long time, and it is an effective way to deal with template bloat.

However, there's another kind of bloat caused by template expansion, which is the bloat in the amount of reflection data - that is, for each expansion of a template, you need to generate a copy of all of the reflection metadata for that class.

Reflection data is large because it needs to describe everything about a type, including the names of all methods and fields. A typical method takes around 128 bytes just to describe its calling signature. My previous strategy was to combine identical definitions - so for example, if two different classes had a method named "foo", then they would both point to the same string. Similarly, if they had the same return type, they would both point to the same type descriptor struct.

The problem with this approach is that on a 64-bit machine, pointers are 8 bytes. So unless the method or field name is longer than 8 bytes, you're not really saving much by putting a pointer to the name instead of embedding the name directly.

I have two strategies for reducing the size of the template data:

1) Store the reflection data for template instances in an unexpanded form: In other words, instead of generating all of the metadata for each expansion of the template, merely store a reference to the original template definition and the list of type parameters. So for example, for type List[String] we simply store a reference to "List", along with the template parameter list "[String]". The reflection metadata for the specialized type will be generated lazily at runtime the first time someone attempts to access it. I would imagine that the large majority of such type descriptors will never be dereferenced, so the amount of memory consumed will be small compared to the amount required to store all templates in expanded form.

Of course, one drawback of this approach is that we now have to store the original template definition in a runtime-accessible form, which we were not doing before. References to template type variables can be represented by a special "type parameter" type (just as it is in the compiler). Creating an expansion of a template at runtime is a process of cloning all of the data structures of the template, replacing type variables with their associated definitions in the current context.

For purposes of reflection, we don't need the method bodies, just the type signatures of the methods. However, there's a reason why we might want to store the bodies of the methods anyway: To support importation of compiled modules. Right now, whenever the Tart compiler sees an import statement, it loads the source file for that import and (partially) compiles it, which may require resolving that module's imports, and so on. Instead I would like it to work more like Java - the compiler can load in a compiled module instead of having to recompile the original source. (Of course, this will also require the modules to have datestamp and size information so that it can detect when the object module is out of date. In essence the compiler needs to incorporate some of the responsibilities of "make".)

For regular (non-template) classes, we could simply use the existing reflection information that the compiler already generates. In other words, the reflection metadata within a module would have two uses: First, for runtime reflection, and second, for importation into the compiler.

With templates the situation is more complex, because currently unexpanded templates produce no output - a module that defines only templates will produce an empty binary module. In order for the compiler to instantiate a template that was imported from a binary module, the module needs to contain the *complete* definition of the template.

My current thinking is that the template data will be stored in two parts, which are the type descriptors and the method body ASTs. The reason for separating them is so that we can throw away the ASTs at linkage time (since they aren't needed in the final binary.) I will probably store the ASTs using LLVM metadata nodes. The type descriptors will be stored as static data structures, just like they are for non-template classes.

2) The other improvement to the reflection metadata is to transition from a pointer-based to an index-based reference system. Each module will contain a constants pool containing a list of strings, common types, and other information that is referenced by integer index. Variable-length integers will be used to refer to these tables, using an encoding scheme similar to UTF-8. This means that the first 128 entries in each table can be referred to with a single-byte index; the first 4096 entries require a 2-byte index, and so on. All of these tables will be sorted by number of references, so very few indices will be more than 1 byte.
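For instance, here's a toy sketch of ordering a constant pool by reference count so that the hottest entries land in the single-byte range; the byte counts just mirror the 128/4096 thresholds above rather than committing to an exact bit layout:

    # Order pool entries so the most frequently referenced ones get the smallest
    # (and therefore shortest) indices.
    def index_size(i):
        # Mirrors the thresholds described above: 128 entries reachable with one
        # byte, 4096 with two, larger indices with three.
        return 1 if i < 128 else 2 if i < 4096 else 3

    def build_pool(reference_counts):
        ordered = sorted(reference_counts, key=reference_counts.get, reverse=True)
        return {entry: idx for idx, entry in enumerate(ordered)}

    pool = build_pool({"tart.core.String": 40, "int32": 95, "rarely.Used": 1})
    for entry, idx in pool.items():
        print(entry, "-> index", idx, "->", index_size(idx), "byte(s)")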

Long path names will be stored using a tree-structure. So for example, the string "tart.core.Memory.Address" will be stored as (((tart . core) . Memory) . Address), where (X . Y) is a pair of varints. Here's what the tables might look like:

    Strings       Pairs
    ===========   ============
    s1: tart      p1: (s1, s2)
    s2: core      p2: (p1, s3)
    s3: Memory    p3: (p2, s4)
    s4: Address

Thus, to refer to the string 'tart.core.Memory.Address', we need only a single byte containing the value 'p3'. (The low bit of the byte determines whether it's a reference to a string or to a pair.) Moreover, if there is another string with the same prefix (e.g. "tart.core.Memory.ptrDiff"), then only the last part needs to be added to the table, and the prefix parts only need to be stored once per module.

An additional side-benefit is that each unique string within a module now has a unique numeric ID which can be compared directly without having to do a string comparison (although a string comparison is still needed for cross-module references.)
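Here's a small Python sketch of how such a pool could be built, using tuples in place of varint-encoded references, just to show the prefix sharing:

    # Each dotted name becomes a chain of pairs over a string table, so names
    # that share a prefix share those table entries.
    strings, pairs = [], []

    def intern_string(s):
        if s not in strings:
            strings.append(s)
        return ("s", strings.index(s))

    def intern_pair(left, right):
        p = (left, right)
        if p not in pairs:
            pairs.append(p)
        return ("p", pairs.index(p))

    def intern_path(dotted):
        parts = dotted.split(".")
        ref = intern_string(parts[0])
        for part in parts[1:]:
            ref = intern_pair(ref, intern_string(part))
        return ref

    a = intern_path("tart.core.Memory.Address")
    b = intern_path("tart.core.Memory.ptrDiff")
    print(a, b)          # two distinct final references...
    print(strings)       # ...but 'tart', 'core', 'Memory' are stored only once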

The same scheme can be used for types. Each type descriptor is a string of bytes, which are a combination of type codes and indices into the module types table.

Of course, with a scheme like this, there will be a lot of duplication of definitions between modules: If module A has a definition of the tuple type (int, int) and module B has the same definition, then unlike the previous scheme they won't be shared. But that's OK, because each definition is only 3 bytes, whereas sharing a common definition would require an 8-byte pointer for each reference.

These highly-compressed type definitions will need to be expanded into data structures in order to be useful. This can be done lazily as types are inspected.

Sunday, April 18, 2010

Tart status

Things have been pretty quiet on this list lately, mostly because I've been taking somewhat of a break from working on Tart, hoping to get my creative batteries recharged. After two and a half years of working on it, I figure a vacation is in order. Also, I've found that stepping away from a project is sometimes an effective way to deal with intractable problems, as you often get a new insight by getting some distance from the problem. In fact, I can already see a couple of minor things I'd like to change.

In terms of garbage collection, I did make some progress on tracing of stack roots. I was able to write a garbage collection "strategy" plugin for LLVM that lets me generate a stack frame descriptor for each function. For example, take the following test function:

def f3 {
  var a = "S3a";
  var b = "S3b";
  var c = b;
  f4();
}

The stack frame descriptor for this function looks like this (in assembly):

        .align  8
.Lgc_stack1:
        .quad   -8
        .quad   -16
        .quad   -24
        .quad   1

This contains three entries, followed by an end-of-list entry that has the low bit set. It says that there are 3 stack roots at offsets -8, -16 and -24 from the base pointer.
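As a toy illustration of reading such a descriptor (assuming little-endian 64-bit entries, as on x86-64; the real walker operates on live frames in the runtime rather than on raw bytes):

    import struct

    # A descriptor is a sequence of signed 64-bit frame offsets terminated by an
    # entry with the low bit set; each offset locates one stack root relative to
    # the frame base pointer.
    def decode_frame_descriptor(raw):
        offsets = []
        for (entry,) in struct.iter_unpack("<q", raw):
            if entry & 1:            # low bit set: end-of-list marker
                break
            offsets.append(entry)
        return offsets

    raw = struct.pack("<4q", -8, -16, -24, 1)   # the .Lgc_stack1 example above
    print(decode_frame_descriptor(raw))         # [-8, -16, -24]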

I wrote a test program that simply prints out the stack roots instead of doing actual collections, and here's what I see for this function:

Found frame 0x7fffd6826f50, return 0x40110d
Found var in frame 0x7fffd6826f50, offset -008 addr 0x7fffd6826f48 value 0x408010
Found var in frame 0x7fffd6826f50, offset -016 addr 0x7fffd6826f40 value 0x408040
Found var in frame 0x7fffd6826f50, offset -024 addr 0x7fffd6826f38 value 0x408040

Here's what the assembly for the function looks like:

        .type   GCTraceTest.f3,@function
GCTraceTest.f3:                         # @GCTraceTest.f3
.Leh_func_begin6:
.Lfunc_begin4:
.Llabel48:
# BB#0:                                 # %entry
        pushq   %rbp
.Llabel44:
        movq    %rsp, %rbp
.Llabel45:
        subq    $32, %rsp
.Llabel46:
        movq    $0, -8(%rbp)
        movq    $0, -16(%rbp)
        movq    $0, -24(%rbp)
.Llabel49:
        movq    $.string3, -8(%rbp)     # GCTraceTest.tart:24:8
.Llabel50:
        movq    $.string4, -16(%rbp)    # GCTraceTest.tart:25:8
.Llabel51:
        movq    $.string4, -24(%rbp)    # GCTraceTest.tart:26:8
.Llabel52:
        movq    (%rdi), %rax            # GCTraceTest.tart:27:4
        callq   *88(%rax)               # GCTraceTest.tart:27:4
.Llabel47:                              # GCTraceTest.tart:27:4
.Llabel53:
        addq    $32, %rsp               # GCTraceTest.tart:28:2
        popq    %rbp                    # GCTraceTest.tart:28:2
        ret                             # GCTraceTest.tart:28:2

In the above assembly, the label "Llabel47" defines a post-call safe point, meaning that from this function's point of view, a garbage collection could occur during the call that immediately precedes that label. There is a static map that contains an entry for "Llabel47" and maps it to the stack frame descriptor shown earlier. The stack walker uses the frame pointer to find the return address of the calling function, and looks that up in the map. It then gets the caller's frame pointer, and uses the offsets in the stack frame descriptor to find the locations of the variables containing stack roots.