Machine Words: More on templates and reflection

As I mentioned in my last posting, I'm working on a more efficient way to encode reflection data for templates. Rather than pre-generating the reflection data for each expansion of the template, instead I want to store the template definition, and let the expansion be generated lazily at runtime.

Note that under this scheme, the actual compiled methods, and the method dispatch tables, are still being generated by the compiler. It's only the reflection data that isn't generated.

One thing I need to work out still is how to integrate the compiled methods and dispatch tables into the reflection data when it is generated. With non-template classes, this is easy: The reflection data for each method (a static structure generated by the compiler) contains a pointer to that method. In the case of a template method, we need to somehow attach the method pointer to the method descriptor when it is created. This gets tricky because the compiler may have generated multiple copies of the original template method, one for each expansion of the template with different type parameters.

One way to approach this is to have some kind of map of functions: The key consists of all the elements that uniquely identify a function (the name, the arguments, the template arguments, etc) and the value is the function pointer. The trick will be representing the keys compactly.

For the name and template arguments, we can use indices. The name will be an variable-length index into the names table for the module, typically 1 byte. The template are represented as a tuple of types, which itself a type - given types T1, T2, you can create a new tuple type Tuple(T1, T2), and this new type can be assigned a unique index. This index can also be represented as a variable-length integer, and thus can also be (usually) stored in a single byte (it's rare that a module will have more than 128 unique types). The arguments to the method can be treated similarly, with the caveat that method arguments have additional properties than just the type, such as variadic arguments.

So, assuming we can figure a way to encode those additional properties compactly, we're looking at a key size of only a few bytes. Still, that's a minimum of a key and a pointer for each method, and if there's a lot of methods that adds up. Now, some of those methods are already in a table - the dispatch table for the class. It would be nice if we could use that table instead of having to generate a separate table for reflection purposes. Unfortunately, not all methods are in that table - for example, methods declared 'final' or 'static' don't need to be in the dispatch table because the compiler can determine at compile-time which method to dispatch to. And I'd like to avoid making the dispatch table longer, since every subclass has to include the dispatch table of its parent class. Also, methods which are not in the dispatch table can be discarded by the linker if they are never called elsewhere, whereas methods in the dispatch table must be included in the final binary.

One approach would be to create an additional table of only the methods that weren't in the dispatch table. Unfortunately, once we do this, it means that those methods can no longer be discarded by the linker. I'm not sure how to get around this, or even if it's important - one of the perils of reflection is that it breaks the compiler's ability to determine what symbols are reachable from other symbols - if you can access a class or method via it's name, then how can the compiler know whether or not it's called? It has to be conservative and include all potentially reachable symbols, which might be larger than what is actually needed.

(Another possible approach is to use the dwarf debugging information to be able to look up a function by name. However, I want to preserve the option to strip out all debugging symbols from the binary, which would cause reflection to break.)

In any case, assuming we have some small key that can uniquely identify a method, when we expand a template, we need to loop through all the methods, substitute type parameters, generate tuples from those types and look them up in the module's type registry to generate a unique index. Then combine the indices to generate a method key, look up that method in the map, and then use the resulting pointer to fill in the method slot in the method descriptor.

Machine Words

Wednesday, July 28, 2010

More on templates and reflection

No comments:

Post a Comment