Sunday, January 10, 2010

Tart status update

Well, the good news is that HashTable is coming along nicely and is about half done. Like HashSet, it is a quadratically-probed hash table written entirely in Tart.
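For anyone who hasn't met quadratic probing before, here is a minimal sketch of the idea. It is written in Python rather than Tart, and it is not the actual HashSet/HashTable code: the class name, the triangular-number probe sequence, the power-of-two capacity, and the half-full growth rule are all assumptions chosen for illustration.

```python
class QuadraticProbeSet:
    """Open-addressed set using triangular-number (quadratic) probing."""

    _EMPTY = object()  # sentinel marking a never-used slot

    def __init__(self, capacity=8):
        # Assumption: capacity is a power of two, which makes the
        # triangular-number probe sequence visit every slot exactly once.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self._slots = [self._EMPTY] * capacity
        self._count = 0

    def _probe(self, item):
        """Yield slot indices at offsets 0, 1, 3, 6, ... from the home slot."""
        mask = len(self._slots) - 1
        index = hash(item) & mask
        for step in range(1, len(self._slots) + 1):
            yield index
            index = (index + step) & mask

    def add(self, item):
        """Insert item; return False if it was already present."""
        # Grow before the table becomes more than half full.
        if (self._count + 1) * 2 > len(self._slots):
            self._grow()
        for index in self._probe(item):
            slot = self._slots[index]
            if slot is self._EMPTY:
                self._slots[index] = item
                self._count += 1
                return True
            if slot == item:
                return False
        raise RuntimeError("unreachable: the table is never full")

    def __contains__(self, item):
        for index in self._probe(item):
            slot = self._slots[index]
            if slot is self._EMPTY:
                return False  # an empty slot ends the probe chain
            if slot == item:
                return True
        return False

    def __len__(self):
        return self._count

    def _grow(self):
        """Double the capacity and reinsert every stored item."""
        old_items = [s for s in self._slots if s is not self._EMPTY]
        self._slots = [self._EMPTY] * (len(self._slots) * 2)
        self._count = 0
        for item in old_items:
            self.add(item)
```

The sketch leaves out removal, which in an open-addressed table needs tombstones or re-insertion so that probe chains are not broken.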

I've also revamped all of the integer types (although I have not yet updated the docs) based on earlier suggestions. The new integer types are:

   int8, int16, int32, int64 - signed integer types.
   uint8, uint16, uint32, uint64 - unsigned integer types.
   int - synonymous with either int32 or int64, depending on the machine word size (can be overridden on the compiler command line).
   uint - same for unsigned ints.

All containers except for String now use 'int' to represent lengths and offsets. Since 'int' is signed, that means that on a 64-bit architecture an array can represent up to 2^63 - 1 items. String is still limited to 2^32 items, a not unreasonable limitation IMHO. Using 'int' this way eliminates the need for multiple overloaded methods in many cases.
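The arithmetic behind those limits is easy to check. The helper below is a hypothetical Python illustration (not part of Tart) that computes the range of a two's-complement integer of a given width; because 'int' is a signed type, the largest length it can hold on a 64-bit machine is one bit short of the full word.

```python
def int_range(bits, signed):
    """Return (min, max) for a two's-complement integer of the given width."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# Ranges of the fixed-width types listed above, keyed by type name.
RANGES = {
    f"{'int' if signed else 'uint'}{bits}": int_range(bits, signed)
    for signed in (True, False)
    for bits in (8, 16, 32, 64)
}

# Largest length a signed 64-bit 'int' can represent.
MAX_LENGTH_64 = RANGES["int64"][1]
```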

Another bit of progress is that default template parameters are working now. These are used in HashSet/HashMap to supply the hashing function:

final class HashSet[%ItemType, %HashFn = Hashing.HashFn[ItemType]] : Set[ItemType] {

Hashing.HashFn has a number of overloads, one of which is:

/** Hash functor that calls computeHash(). */
struct HashFn[%T <: Hashable] {
  def hash(value:T) -> uint64 {
    return value.computeHash();
  }
}

'Hashable' is a protocol that simply requires that the object have a 'computeHash' method:

protocol Hashable {
  def computeHash -> uint64;
}

Thus, any class that has a 'computeHash' function with the given signature will bind to 'HashFn' (because of the superclass constraint operator '<:'), which in turn will work with HashSet/HashMap. Objects used as hash keys therefore do not need to inherit from any special class or interface; they just need to declare the computeHash method. For objects such as ints, which cannot have a 'computeHash' method, you can define a custom overload of HashFn that calculates the hash differently.

In other words, protocols give you the equivalent of type-safe duck-typing :)
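For comparison, here is roughly the same pattern sketched in Python using typing.Protocol. This is an analogy, not Tart: the names compute_hash, default_hash_fn, and Point are made up for the example, the toy HashSet is bucket-based rather than probed, and the overload for ints is modeled as a plain isinstance branch. Note also that Python's runtime check only looks for the presence of the method, whereas Tart checks the signature at compile time.

```python
from typing import Callable, Generic, Protocol, TypeVar, runtime_checkable


@runtime_checkable
class Hashable(Protocol):
    """Structural protocol: anything with a compute_hash() method conforms."""

    def compute_hash(self) -> int: ...


T = TypeVar("T")


def default_hash_fn(value) -> int:
    """Default hasher, playing the role of Hashing.HashFn and its overloads."""
    if isinstance(value, Hashable):
        return value.compute_hash()
    if isinstance(value, int):
        # Stand-in for a custom overload for types that cannot have the method.
        return value & 0xFFFFFFFFFFFFFFFF
    raise TypeError(f"no hash function for {type(value).__name__}")


class HashSet(Generic[T]):
    """Toy set whose hash function is a defaulted parameter."""

    def __init__(self, hash_fn: Callable[[T], int] = default_hash_fn):
        self._hash_fn = hash_fn
        self._buckets = {}  # hash value -> list of items with that hash

    def add(self, item: T) -> bool:
        """Insert item; return False if an equal item is already stored."""
        bucket = self._buckets.setdefault(self._hash_fn(item), [])
        if item in bucket:
            return False
        bucket.append(item)
        return True

    def __contains__(self, item) -> bool:
        return item in self._buckets.get(self._hash_fn(item), [])


class Point:
    """Does not inherit from Hashable; it just declares compute_hash()."""

    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def compute_hash(self) -> int:
        return (self.x * 31 + self.y) & 0xFFFFFFFFFFFFFFFF
```

With these definitions, HashSet() accepts Point values through the default hasher, and HashSet(hash_fn=len) shows how a caller can swap in a different hashing function, here for strings keyed by length.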

The not so good news is that there's a nasty bug that is preventing me from reaching the "Hello World" milestone. This milestone is defined as the ability to print out text strings using the real i/o library with the real character codecs - not some hacked up debugging text output function.

Anyway, I've been struggling with this bug off and on for the better part of 3 weeks. Today Alex Liberman came over for a social visit and sat down and looked at the bug with me, and while we made a little bit of progress, the solution is still elusive. What appears to be happening is that the "idispatch" sequence of instructions - that is, the sequence of instructions generated when calling a method through an interface pointer - is being miscompiled. The weird part is that the LLVM IR looks correct to me, but the assembly does not, yet I can't believe that LLVM would have a bug that egregious without anyone having noticed it. When I trace through the code in gdb, stepping one instruction at a time, I get very confused - the register that I think *ought* to contain the object pointer instead turns out to contain the object's TIB (Type Info Block), and the register that ought to contain the TIB is instead holding the interface ID used for the dispatch. Alex suggested adding a member field to the object containing 0xDEADBEEF to make it easier to locate the object, but none of the registers appear to be pointing to it. Blah.

Even weirder, idispatch works fine in other places.
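To make the three values in play easier to keep straight (object pointer, TIB, interface ID), here is a conceptual model of an interface dispatch in Python. Everything in it is hypothetical: the class names, the dictionary-based lookup, and the method-slot index are assumptions for illustration and say nothing about Tart's real object layout or the instructions LLVM emits.

```python
class TypeInfoBlock:
    """Hypothetical TIB: maps an interface ID to that interface's method table."""

    def __init__(self, name, interface_tables):
        self.name = name
        self.interface_tables = interface_tables  # {interface_id: [method, ...]}


class Obj:
    """Hypothetical object: a reference to its TIB plus named fields."""

    def __init__(self, tib, **fields):
        self.tib = tib
        self.fields = fields


def idispatch(obj, interface_id, method_index, *args):
    """Call a method through an interface: object -> TIB -> table -> method."""
    tib = obj.tib                               # load the TIB from the object
    table = tib.interface_tables[interface_id]  # find the interface's methods
    method = table[method_index]                # pick the method slot
    return method(obj, *args)                   # the object itself is 'self'


PRINTABLE_ID = 1  # made-up interface ID


def greeter_describe(self):
    """Made-up method implementing slot 0 of the made-up interface."""
    return "hello from " + self.fields["name"]


greeter_tib = TypeInfoBlock("Greeter", {PRINTABLE_ID: [greeter_describe]})
# The marker field mirrors the 0xDEADBEEF trick for spotting the object.
greeter = Obj(greeter_tib, name="tart", marker=0xDEADBEEF)
```

In this model the final call must receive the object itself. The symptom described above, with the TIB sitting where the object pointer should be and the interface ID where the TIB should be, would correspond to each value having slipped one position along that chain.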
