Monday, November 29, 2010

Integer parsing checked in

Tart programs can now convert strings to integer types:

let n = int8.parse("10");
let m = int32.parse("fff", 16);

Features:
  • The implementation is pure Tart - no calls to C library functions. It is also efficient :)
  • Radix conversions from 2 to 35 are supported.
  • Overflow is properly handled, even for 64-bit ints - so for example if you attempt to convert 9223372036854775809 to a 64-bit signed int, it will throw an OverflowError.
  • Bad input strings throw an InputFormatError.
  • Currently supports all integer and boolean types.
Character and float types are not implemented yet. I probably won't be able to do a pure-Tart implementation of float parsing, because that's hard to get right, so I'll probably end up having to call strtod(). Unfortunately, strtod() has much less restrictive input conditions than I'd like (it allows junk characters at the end, for example).
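
The Tart source for the parser isn't reproduced here, but the core of the algorithm - checking for overflow before each multiply-and-add step - can be sketched in C++ roughly as follows (the function name and the exception types are placeholders, not Tart's actual API):

#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative sketch of an overflow-checked radix conversion, roughly the
// algorithm described above. The name and exception types are placeholders,
// not Tart's actual library API.
int64_t parseInt64(const std::string& s, int radix = 10) {
  if (radix < 2 || radix > 35 || s.empty()) {
    throw std::invalid_argument("bad radix or empty string");
  }
  std::size_t pos = 0;
  bool negative = false;
  if (s[pos] == '-' || s[pos] == '+') {
    negative = (s[pos] == '-');
    ++pos;
  }
  if (pos == s.size()) {
    throw std::invalid_argument("no digits");          // InputFormatError in Tart
  }
  // Accumulate into an unsigned value; the limit depends on the sign so that
  // -9223372036854775808 parses but 9223372036854775808 does not.
  const uint64_t limit =
      negative ? uint64_t(INT64_MAX) + 1 : uint64_t(INT64_MAX);
  uint64_t value = 0;
  for (; pos < s.size(); ++pos) {
    char c = s[pos];
    int digit;
    if (c >= '0' && c <= '9')      digit = c - '0';
    else if (c >= 'a' && c <= 'z') digit = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z') digit = c - 'A' + 10;
    else throw std::invalid_argument("bad character"); // InputFormatError in Tart
    if (digit >= radix) {
      throw std::invalid_argument("digit out of range for radix");
    }
    // Check for overflow before committing the multiply-and-add.
    if (value > (limit - digit) / uint64_t(radix)) {
      throw std::overflow_error("integer overflow");   // OverflowError in Tart
    }
    value = value * radix + digit;
  }
  if (negative) {
    // Avoid signed overflow when the result is exactly INT64_MIN.
    return value == limit ? INT64_MIN : -static_cast<int64_t>(value);
  }
  return static_cast<int64_t>(value);
}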

There's also a "tryParse" function in the "Strings" namespace for each numeric type that returns either a numeric value or a conversion error:

def tryParse[uint8](s:String, radix:int) -> uint8 or ConversionError;

Note that the reason for naming it 'tryParse[uint8]' instead of 'tryParseUint8' is to make it easier to work with templates.

Thursday, November 18, 2010

More Closure Fun

Here are some more closure test cases that work:

// Test that closures can be passed as parameters
def testClosureAsParam {
  var i = 0;
  callTenTimes(fn { i += 2; });
  Debug.assertEq(20, i);

  var s = StringBuilder();
  callTenTimes(fn { s.append("A"); });
  Debug.assertEq("AAAAAAAAAA", s.toString());
}

def callTenTimes(f:fn) {
  for i = 0; i < 10; ++i {
    f();
  }
}

// Test that nested closures work.
def testNestedClosures {
  let makeCounter = fn n:int -> fn -> int {
    return fn -> int {
      return n++;
    };
  };
  
  let c = makeCounter(2);
  Debug.assertEq(2, c());
  Debug.assertEq(3, c());
}

// Test that an inner closure can access a variable two levels out.
def testNestedClosures2 {
  var i = 0;
  let makeCounter = fn -> fn -> int {
    return fn -> int {
      return i++;
    };
  };

  let c = makeCounter();
  let d = makeCounter();

  Debug.assertEq(0, c());
  Debug.assertEq(1, c());
  Debug.assertEq(2, d());
  Debug.assertEq(3, d());
}

One thing that does not work yet is closure variables that are large structs or tuples - basically any local variable that is not an object reference or a primitive type. It shouldn't be hard to fix, however.

Wednesday, November 17, 2010

Closures

I decided to take a break from the low-level work of debugging and garbage collection, and work on some language features, which is a much more pleasant activity. It only took a couple of days to get closures working. Here's one of the unit tests, for example:

def testClosureCounter {
  var a = 0;
  let f = fn -> int {
    return a++;
  };

  Debug.assertEq(0, f());
  Debug.assertEq(1, f());
  Debug.assertEq(2, f());
}

The mechanism behind closures is fairly efficient: The closure environment is a subclass of Object that has a member slot for each captured variable. For 'let' variables (which cannot be changed after the initial assignment), the slot contains just a copy of the variable. For 'var' variables, the slot is a reference to a shared closure cell, whose type is tart.core.MutableRef[T], so that both the inner and outer functions can modify the same variable.

The closure itself is passed around as a pair of pointers - the pointer to the executable code, and the pointer to the environment. Note that closures are type-compatible with bound methods - that is, anywhere you can put a bound method, you can also put a closure, assuming that it takes the same argument types.
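
To make that concrete, here is roughly what the lowering of testClosureCounter above looks like, written out by hand in C++. The names and layout are my own approximation of the description, not the compiler's actual output:

#include <cassert>

// Hand-written C++ approximation of what the compiler generates for
// testClosureCounter. All names here are illustrative stand-ins.
struct Object { /* object header, type info, etc. */ };

// Shared cell for a captured 'var' - both the inner and outer function
// see writes through it.
template <typename T>
struct MutableRef : Object {
  T value;
};

// The closure environment: one member slot per captured variable.
struct CounterEnv : Object {
  MutableRef<int>* a;   // 'var a', captured as a reference to a shared cell
};

// The closure body, compiled as an ordinary function that takes the
// environment as a hidden parameter (much like 'self' in a bound method).
static int counterBody(CounterEnv* env) {
  return env->a->value++;
}

// The closure value itself is just a pair of pointers: code + environment.
struct Closure {
  int (*code)(CounterEnv*);
  CounterEnv* env;
  int operator()() const { return code(env); }
};

int main() {
  MutableRef<int>* a = new MutableRef<int>();  // 'var a = 0' becomes a heap cell
  a->value = 0;
  CounterEnv* env = new CounterEnv();
  env->a = a;
  Closure f{counterBody, env};

  assert(f() == 0);
  assert(f() == 1);
  assert(f() == 2);
  assert(a->value == 3);   // the outer frame sees the same cell
  return 0;
}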

Saturday, November 13, 2010

Coalescence

I spent the last several days experimenting with function-merging in the compiler, and the results are encouraging but not spectacular.

Take for example the following function in class ArrayIterator:

def next() -> ElementType or void {
  if (self.index < self.array._size) {
    return self.array._data[self.index++];
  } else {
    return;
  }
}

The type 'ElementType' is a template parameter. The interesting thing about this function is that it really doesn't care what ElementType is - it doesn't call any of its methods or examine its variables. Really, the only thing it cares about at all is its size. For most arrays, that size will be the size of a pointer.

There's no reason, then, why ArrayIterator[String].next() and ArrayIterator[Object].next() can't be the same machine-code function. What I've managed to do is create a FunctionMergePass class that compares two functions and determines whether or not they can be merged. Currently I don't attempt to merge every function automatically; instead, the merging behavior is triggered manually by the presence of the @Coalesce attribute on the class.

The pass works by walking the code-flow graphs of both functions in parallel and checking each pair of expression nodes to see whether or not they are equivalent in terms of their effect. If the merge pass finds any discrepancy between the two, the merge operation is canceled and the function is marked as non-mergeable.

Most expression types are trivial to compare, although some are more interesting. In the above code example, there's an implicit cast from 'ElementType' to 'ElementType or void' in the first return statement - in other words, it's constructing a union type and returning it. The merge pass in this case checks to see if the two union types are equivalent, which they are - 'Object or void' produces the exact same binary structure as 'String or void'.
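
The pass itself isn't shown here, but the shape of the comparison is roughly the following C++ sketch, with toy stand-ins for the real expression and type classes:

#include <cstddef>
#include <vector>

// Toy stand-ins for the compiler's expression and type nodes; the real
// FunctionMergePass walks Tart's code-flow graph rather than these classes.
struct Type {
  bool isReference;    // all object references share pointer layout
  std::size_t size;    // size in bytes, for value types
  bool isEquivalentLayout(const Type* other) const {
    // Two types produce identical code if both are object references, or if
    // they are value types with the same size and layout.
    return (isReference && other->isReference) ||
           (!isReference && !other->isReference && size == other->size);
  }
};

struct Expr {
  enum Kind { Load, Store, Call, Cast, Return /* ... */ } kind;
  const Type* type;
  std::vector<Expr*> operands;
};

// Walk two expression trees in parallel; any structural or layout
// discrepancy marks the pair as non-mergeable.
static bool areEquivalent(const Expr* a, const Expr* b) {
  if (a->kind != b->kind) return false;                      // different operations
  if (!a->type->isEquivalentLayout(b->type)) return false;   // layouts differ
  if (a->operands.size() != b->operands.size()) return false;
  for (std::size_t i = 0; i < a->operands.size(); ++i) {
    if (!areEquivalent(a->operands[i], b->operands[i])) {
      return false;                                          // mismatch in a child
    }
  }
  // A Call node would also need its callee to be mergeable (or identical),
  // which turns out to be the most common point of failure.
  return true;
}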

The most common cause of merge failure is the function calling another non-mergeable function. Take for example this method in the template ArrayList[ElementType]:

def insert(index:int, e:ElementType) {
  Preconditions.checkIndex(index >= 0);
  Preconditions.checkIndex(index <= _size);
  grow(1);
  ElementType[].copyElements(_data, index + 1, _data, index, _size - index);
  _data[index] = e;
}

Everything in this function is mergeable except for the call to 'grow'. The reason 'grow' isn't mergeable is that it sometimes allocates a new Array instance, and calls to 'new' are generally unmergeable because the exact type of the object being created needs to be known. (There's a loophole around this which I have thought of but have not implemented - if the object being allocated is the same type as 'self', then we can just use the type information from 'self' instead of referring to the type information via a static constant.)

However, there is a way we could make the call to 'grow' mergeable - at a slight performance penalty. The answer is to make the call virtual. Right now it isn't virtual, because the "ArrayList" class is marked "final", which means that the compiler is smart enough to call 'grow' directly instead of dispatching through a virtual function table. (For those of you unfamiliar with Java, marking a class 'final' means that it cannot be subclassed, which allows the compiler to make significant optimizations.)

If the call to 'grow' were made virtual, then even if we merge ArrayList[String].insert() with ArrayList[Object].insert(), the correct version of 'grow' will be called in either case because it depends on the type of 'self'.
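
In C++ terms the trade-off looks something like this (an analogy only, not Tart's object model): a direct call bakes the address of the instantiated grow() into insert(), while a virtual call is an indirect call through the vtable the object already carries, so the call site is identical for every instantiation:

#include <cstddef>

// A C++ analogy for the direct-versus-virtual trade-off described above.
template <typename ElementType>
class ArrayListDirect {
 public:
  void insert(std::size_t index, ElementType e) {
    grow(1);   // Direct call: the address of ArrayListDirect<T>::grow is baked
               // into the generated code, so the String and Object versions of
               // insert() contain different call targets and can't be merged.
    // ... shift elements and store e ...
  }
 private:
  void grow(std::size_t n) { /* reallocates storage; type-dependent */ }
};

template <typename ElementType>
class ArrayListVirtual {
 public:
  void insert(std::size_t index, ElementType e) {
    grow(1);   // Virtual call: an indirect call through the vtable, identical
               // machine code for every instantiation, so the insert() bodies
               // can be coalesced.
    // ... shift elements and store e ...
  }
 private:
  virtual void grow(std::size_t n) { /* reallocates storage; type-dependent */ }
};

The cost is one extra indirection per call, which is the slight performance penalty mentioned above.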

The question is whether or not this is worth doing, which comes down to the question of what's more important: speed or size? At the moment, enabling the function-merging behavior on the SwitchStmtTest unit test results in a 7K reduction in the size of the executable file. 7K out of 350K is not all that much.

Thursday, November 11, 2010

Tart: Still Too Large

Although the new reflection system has managed to reduce the size of the executables somewhat, I'm still concerned about the overall size.

For example, the unit test SwitchStmtTest weighs in at about 350K binary executable size. This is with optimization turned on (including dead code and global elimination), and with all debugging info stripped. That seems quite large to me - I'd like the size to be more on the order of 50K (that's my rough guess at how large an equivalent C++ program would be.) Note that with optimization off and debugging turned on, the size balloons to a whopping 1.5 megs.

I decided to do an analysis to see what exactly was taking up so much space. I used 'nm' for this purpose, along with a Python script that computed the size of each symbol in the object file based on the address deltas. These numbers are very rough, since there's a lot of stuff in the object file that is not accounted for. I'm only using the numbers as an order-of-magnitude guesstimate of sizes.
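
The script isn't included here, but the approach is simple enough to sketch. The version below is a reconstruction of the technique (not the original Python script): it reads the output of 'nm -n', which sorts symbols by address, charges each symbol the gap to the next one, and totals the results by package prefix.

#include <cstddef>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Reads `nm -n <binary>` from stdin, estimates each symbol's size as the gap
// to the next symbol, and totals the estimates by package prefix.
int main() {
  struct Sym { unsigned long long addr; std::string name; };
  std::vector<Sym> syms;
  std::string line;
  while (std::getline(std::cin, line)) {
    std::istringstream in(line);
    std::string addr, type, name;
    if (!(in >> addr >> type >> name)) continue;   // skips undefined symbols
    if (addr.find_first_not_of("0123456789abcdefABCDEF") != std::string::npos)
      continue;                                    // skip non-address lines
    syms.push_back({std::stoull(addr, nullptr, 16), name});
  }
  std::map<std::string, unsigned long long> totals;
  for (std::size_t i = 0; i + 1 < syms.size(); ++i) {
    unsigned long long size = syms[i + 1].addr - syms[i].addr;
    // Charge the size to the leading package prefix, e.g.
    // "tart.core.String.substr" is charged to "tart.core".
    const std::string& name = syms[i].name;
    std::size_t cut = name.find('.');
    if (cut != std::string::npos) {
      std::size_t second = name.find('.', cut + 1);
      if (second != std::string::npos) cut = second;
    }
    totals[(cut == std::string::npos) ? name : name.substr(0, cut)] += size;
  }
  for (const auto& entry : totals) {
    std::cout << entry.first << ": " << entry.second << "\n";
  }
  return 0;
}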

The contents of the object file can be broken down in several ways. Here's a breakdown by package:
  • tart.atomic: 16
  • tart.collections: 24712
  • tart.core: 37875
  • tart.gc: 288
  • tart.reflect: 34823
  • tart.testing: 3536
  • tart.text: 6179
  • SwitchStmtTest: 4435
  • Total: 111864
The numbers represent the total size, in bytes, of all symbols whose names start with the package name.

One of the first things to notice is that the total is only about a third of the size of the object file. I'm not at all sure why this is.

Also, I'm curious why tart.atomic was even included at all, given that nothing refers to it.

In the case of templates, the template expansions are charged to the package defining the template, not the package that instantiated it. In fact, template expansions account for a large portion of these numbers. Two templates dominate:
  • class tart.core.Array: 26879
  • class tart.collections.ArrayList: 13840
This leads me to believe that I need to think very hard about how to reduce template expansion sizes. I tried Nick's -mergefunc LLVM pass (which coalesces functions that produce identical machine instructions), and it had very little effect. That may be due to a bug in the -mergefunc pass, or it may be that there are in fact tiny differences in the generated code for different templates.

Another way to break down the numbers is by symbol type. For example, if I add up all of the static metadata for reflection, the total comes to 27645.

Here's a breakdown of some of the symbols (only items > 500 bytes are listed):
  • Non-reflection data (dynamic dispatch tables, instance allocators, etc.)
    • Constant tart.core.String instances: 16256 (Average size 66)
    • Type info blocks: 6784 (average size 45)
    • Type base class lists: 1952 (average size 33)
    • Interface dispatch functions: 3168 (average size 36)
    • Compiler-generated instance allocators: 800 (average size 38)
    • Trace tables for garbage collection: 736 (that seems low!)
  • Reflection-related data
    • Type adapter functions (for boxing / unboxing reflected calls): 20912
    • Class and module reflection structures: 13216
    • Compressed name tables: 7864
    • Field offset tables: 1080
    • Tables of pointers to types referred to by other types: 2400
Here's my list of take-aways from all this:
  • 64-bit code is really fat. Even a trivial function can be >1K
  • Template expansions are taking a large amount of space.
  • I can probably optimize the type adapter functions. (These are functions generated by the compiler that auto-box and unbox function parameters.)
  • The reflection structures for classes and methods can probably be trimmed a bit if I am willing to use void* pointers (which means I can combine multiple tables into one).
  • It may be possible to reduce the number of generated strings somewhat.
Coming up with a solution for template expansions is going to be the hardest. Ideally, we want to re-use the generic code as much as possible - so instead of separate definitions for tart.core.Array[Object].Iterator.next() and tart.core.Array[String].Iterator.next() - which do exactly the same thing - we can point them both to a common definition. The problem is that not all template methods are type-independent. For example, the tart.core.Array[Object].copyOf() method allocates a new array instance, and the type of that instance will be different for different template instances of copyOf().

Each template class method will be impacted to a greater or lesser degree by variances in the type of the template parameter. We can group functions into classes by the degree of this impact:
  • No impact - the function is exactly the same regardless of the template parameter. An example is the method that returns the length of an array, which is the same regardless of the type of the array element.
  • Different type sizes - for example, the array copyElements() method treats all the array elements opaquely (it just memcpys them), so the only thing it cares about is the size of the elements. Thus, all pointer arrays look the same as far as it is concerned.
    • For these cases, the solution is to generate a version of the function for each distinct element size - so you have a version for 1-byte array elements, 2-byte, 4-byte, and so on (see the sketch just after this list).
  • Parametrically polymorphic - this means that different instances of the method do the same thing, but with different types - so for example, one version might add two ints, while another adds two floats.
    • One possible approach would be to somehow (I'm not sure how) migrate the type references outside the function, possibly into the static class data - so for example, when copyOf() creates a new instance, instead of hard-coding the type, it would look the type up. However, this seems pretty complicated.
  • Ad hoc polymorphic - this basically means "anything goes" - two different versions of the same template method might end up with completely different behavior. While functional language purists frown heavily on such things, in practice there are a few cases where this can be useful, especially in the metaprogramming domain.
    • An example of how the code can differ: suppose we have a type A which defines "foo" as a constant, and a type B which defines it as a variable - if a template "Example" has a method "bar" which operates on the "foo" property, then the generated code for Example[A].bar() will be very different from the generated code for Example[B].bar().
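
To make the "different type sizes" case concrete, here is what that kind of size-keyed sharing looks like in C++ terms (an analogy, not Tart's actual generated code): every pointer-element instantiation forwards to the same size-parameterized body.

#include <cstddef>
#include <cstring>

// One shared, size-parameterized implementation: it never looks inside the
// elements, so only the element size matters.
static void copyElementsRaw(void* dst, std::size_t dstIndex,
                            const void* src, std::size_t srcIndex,
                            std::size_t count, std::size_t elemSize) {
  std::memmove(static_cast<char*>(dst) + dstIndex * elemSize,
               static_cast<const char*>(src) + srcIndex * elemSize,
               count * elemSize);
}

// Each instantiation is a thin wrapper. On a 64-bit target, the String and
// Object versions both forward to the 8-byte case, so their bodies are
// identical and could be coalesced into one machine-code function.
template <typename ElementType>
void copyElements(ElementType* dst, std::size_t dstIndex,
                  const ElementType* src, std::size_t srcIndex,
                  std::size_t count) {
  copyElementsRaw(dst, dstIndex, src, srcIndex, count, sizeof(ElementType));
}
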
Detecting which group a method falls into requires tracing through the code flow graph of the function, and analyzing each expression, type, and value to determine how that item will vary based on the template's parameters.

In a language with a JIT, the solution to these problems is simple - you instantiate the template instance methods lazily. So for example, if no one ever calls tart.core.Array[String].copyOf(), then just don't generate that method. The problem with a language that generates native binaries (and this applies to more than just templates) is that everything has to be generated in advance, because you don't know ahead of time what will and won't be used.

A final issue that I need to consider is "when?". Do I need to address the size issues right now, or can it wait until more language features go in? I know that the common wisdom is "optimize last", but at some level a program can perform so poorly that no one will consider it worth the trouble of using, let alone optimizing. I know, for example, that a typical XBox developer would never bother to use Tart in its current state, where program size is a critical issue (there's no virtual memory on the XBox, and games tend to use all of memory). Also, I don't like building on shaky foundations.

Tuesday, November 9, 2010

Intervals in Gosu

Gosu has an interesting syntax for open and closed intervals:

   1..10   // Closed interval, i.e. the values 1-10 inclusive.
   1|..|10 // Open interval, the values 2-9

Having an easy syntax for open and closed intervals solves a number of problems I've been having: Sometimes you want intervals to be closed, other times half-open. For example, Python slice objects are half-open, and would be much less useful if they were fully open or fully closed. The interval notation would make it easy to do Python-style slices:

  let s = "Hello, world!";
  Console.out.printLn(s[0..|5]); // Prints "Hello"

Tart status update

I've been making substantial progress over the last couple of days, which is a welcome change after a couple of frustrating months.

The last few months have seen a lot of hard effort, with very little to show for it. I'd been focused primarily on getting debugging working - being able to inspect Tart data structures in the debugger. Unfortunately, it seems that LLVM's facilities for generating debug information aren't very robust - if you make even the smallest mistake in calling the debug factory functions, it will silently write out bad debugging data without any warning. In addition, the available tools for inspecting and checking DWARF debugging data are very primitive, producing at best a raw text dump of the debugging structures and requiring you to manually trace the cross-references between debug entries. Nor is gdb particularly helpful in terms of its error messages. Add to this the fact that DWARF itself is extremely complex, and that even a small program will produce many kilobytes of debugging data, and you can begin to see why it takes so long to get this stuff right.

The key to breaking this logjam was attending the LLVM developers' conference last week. I was able to talk directly to the people working on the parts of LLVM that deal with debugging info, and to clarify a number of critical points that I hadn't managed to resolve over email. (I've noticed that any question sent to the llvm-dev mailing list has approximately a 50% chance of being answered, so getting answers takes persistence.)

It took me a few days to be able to apply everything that I learned at the conference, with a lot of trial and error involved. During this time, I was at the Hacker's Conference, and I spent about half my time at the conference staring at my laptop working on Tart. Although I felt bad about not paying attention to the conference sessions, I really felt driven to try and get this stuff working.

As a result, the debug stuff is working much better. I can now set breakpoints and examine function parameters and data structures (although examining local variables is still not working).

In addition, I've also made progress switching over to the new, compressed reflection metadata. The old system has been completely removed, and the unit test framework has been switched over to use the new system.

In other news, I read last night on Slashdot the announcement of the Gosu programming language, which has a lot of similarities to Tart. In fact, it's quite discouraging to me to see how many of my ideas have been developed independently - discouraging because they have a shipping product and I don't. That being said, there are a few things that I think they didn't get right, and a few things that seem faddish, but none of these are serious.

I think the biggest difference between Gosu and Tart is that Gosu targets the JVM, whereas Tart produces native binaries. I'm not sure which of those two paths will prove to be smarter in the long run. Obviously, it's a lot easier to write a new language for the JVM, since you get a lot of stuff for free - garbage collection, a huge class library, and so on. The benefits of producing native executables are more subtle - quicker startup time, the ability to run on platforms that don't have a JVM, and so on.

There are also a number of downsides to native executables - for example, a lot of stuff has to be pre-generated (like type adapter functions for reflection) which can be generated dynamically as needed in the JVM. This potentially makes the native executable much larger than the equivalent Java .class file. Also, there's no way to do HotSpot-style profile-guided optimization, which you could do with a JVM.

In any case, each time someone comes out with a programming language that is similar to Tart or fulfills a similar need (Go, Gosu, Scala, etc.), the chances of Tart achieving any sort of widespread adoption get smaller. Nevertheless, I'm not ready to quit yet... :)