Wednesday, July 11, 2012

Tart still on hiatus, but not dead

I've been greatly enjoying working on my Android game project lately - here are some screen shots of my most recent progress: https://plus.google.com/102501415818011616468/posts/51UaQviTyfM

Naturally this means that I haven't been doing any active coding on Tart for the past 8 months or so. However, that's not necessarily a bad thing. You see, even though I'm not actively working on it, it's still percolating in my subconscious mind, and occasionally an idea pops out. I've often found that some problems are best solved by stepping back and letting go for a while - sometimes the focus on solving a particularly vexing problem causes a kind of tunnel vision, while stepping back and getting some distance from the problem can lead to a new perspective on how to solve it - or even how to avoid it being a problem at all.

I thought I would take some time to jot down some of the issues that I have been thinking about.

I think that there are some aspects of Tart for which I never quite had a clear design idea. That is, there were certain parts of the language design where I figured that I would fill in the blanks at a later time, and looking back on the project I realize that those areas never did get filled in properly. Worse, I realize that some of those design aspects should have been designed up front rather than being put off until later, because the design choices affect the compiler's output in a fairly fundamental way.

1) Dynamic Linking

An example of this is dynamic linking of modules. Being able to divide an application into separately loadable chunks is fairly fundamental these days. Of course, at the lowest level we know how this is all supposed to work - the semantics of shared libraries and dynamic linking are well understood. But high-level languages need something more than just stitching together of linker symbols. They need to be able to do method calls across module boundaries, which means that the class definitions in the various modules must be interoperable. If two modules both import references to a particular class, those two class definitions must be combined somehow - otherwise you end up with two distinct classes that happen to have the same name, and instances of those classes will no longer be comparable, even though they may have identical values. 

In the past, the Tart compiler has generated a large quantity of static metadata for each class (dispatch tables, reflection data, GC stack maps, etc.), and relied on LLVM's linker to combine identical definitions. This is fine for static linking, but doesn't address the problem of dynamic linking at all. To address the problem of dynamic linking, a different approach is needed.

My current thinking is to replace a lot of the statically defined data structures with one or more global registries. For example, you would have a global registry of types. Whenever a module is loaded which contains a reference to a type, it would look it up in the registry (by name if it's a named type, by type expression otherwise), and if the type exists then the module would use the returned value as the type pointer. If the type does not exist, then the module will contain a static string that is a serialized description of the type that can be used to construct the actual type object. In other words, the first module that uses a given type gets to set the definition of that type, and all other modules will use that definition.

Note that the serialized description of the type could potentially be smaller than the instantiated type object, since there would be no need to store NULL pointers, and the serialized information could be compressed.

Of course, this can lead to problems if two modules have incompatible definitions of a type. For example, two modules might not agree on the length or structural layout of the type. To solve this problem, we need to be able to check if two type definitions are compatible. A strict check is fairly easy - see if the definitions are identical. But strict checks lead to brittleness and versioning hell - ideally you want some flex in the system so that a dependent module can be modified without recompiling the world.

A very flexible approach involves constructing the class dispatch tables at runtime instead of at compile time. This allows you to add or delete methods of a class and still remain compatible with old code. This works because the method offset into the dispatch table is no longer fixed at compile time, but calculated at runtime, so if the dispatch table slots change (because methods are added or removed) it won't break older code.

One can go even further and use a dispatch map rather than an array, similar to what Objective-C does: For each method signature, you generate (at runtime) a selector ID. Calling a method involves looking up that selector ID in a hash map, which returns the actual method pointer. While this does cause some additional overhead for certain kinds of method calls, other types of calls (such as calls through an interface) might potentially be sped up.

2) Debugging

Support for DWARF has been the bane of my existence, and has caused me more frustration than any other aspect of working on Tart. And to add insult to injury, almost all of the DWARF information is duplicative - that is, it's simply repeating information that's already present in the reflection data (such as a description of fields and methods of a class), but encoded differently.

I recently learned, however, that lldb (the LLVM debugger project) can potentially support custom debugging formats. That is, it would not be impossible to write an extension to lldb that directly parses the reflection data of a module instead of using DWARF.

This does mean, however, that source-level debugging would only work with lldb, and not gdb. That may be too much of a price to pay - for one thing, I like the Eclipse integration with gdb, and not being able to debug in Eclipse would be sad.

3) The name "Tart"

I've never been completely satisfied with the name "Tart", but unfortunately all the good names are taken. However, I'm concerned about possible confusion or conflation with "Dart", which is yet another language that Google is trying to promote.

Anyway, this is not a complete list of all the things I have been thinking about, but its late, and I shall leave the rest for another time.

--
-- Talin

No comments:

Post a Comment