Tuesday, January 11, 2011

Tart status update

I've started working on Tart again, after a break of about a month. The reason I stopped was that I was feeling a bit burned out, and I felt like I had hit a wall with respect to some unsolved problems. I figured that if I deliberately kept away from working on it for a while, I'd not only recharge my batteries, but perhaps I could get some new inspiration as well.

And in fact I did have some minor brainstorms during this period, which made me want to work on it again. Here's a summary of the direction my thoughts have been going lately:

Reflection

If you remember from earlier, I had been kind of obsessing about the large amount of reflection metadata that was being generated. Well, I've made a decision, which is that reflection will now be an opt-in rather an opt-out for classes. That means the compiler will not generate reflection information for class methods and fields unless there is an annotation telling it to do so.

The @Reflect annotation will cause a class, and all of its subclasses, to be fully reflected. In addition, any class member which has an annotation will be reflected, if that annotation is retained in the final executable (the reflection structure is what holds the list of annotations, so it needs to present in order for the annotations to be found.)

Initially, I assumed that all classes would want to have detailed reflection information. My rationale for assuming this is that reflection is about discovery - it's about taking a data type that your code never seen before, and extracting information from it. I reasoned that for any given class, there's no way to know what possible future users of that class might want to discover, so the policy would have to be conservative and inclusive.

However, after thinking about this some, I realized that in practice, the use of reflection generally begins from some known starting point, such as a particular interface, or classes annotated with a particular annotation. Take for example JUnit - it uses reflection to identify all of the test methods within the class (any method that begins with the word 'test'.) However, it only does this for classes that are derived from TestCase. So that's a known starting point. Similarly, another classic use case for reflection is to be able to serialize or edit a class by getting a list of it's properties. However, the author of the class to be serialized or edited generally knows in advance that this is going to happen, and designs the properties and fields of the class to facilitate such access - so requiring an extra attribute on the class to enable this is not a big hardship.

Another common pattern is to scan the class members looking for ones that bear a particular annotation. An example is Guice's dependency injection logic - it scans a class for any constructors that are annotated with @Inject or @Provides. It's not interested in class members that don't have annotations, so there's no need to reflect those.

One possible pitfall of this scheme is that a reflected class may have members whose types are not reflected. For example, a class might have a member whose type is List[String]. Neither List nor String are reflected by default. That means that if you were trying to do a recursive traversal of some tree of objects using reflection, you wouldn't be able to traverse fields with these types. That doesn't mean that you would have no information at all - non-reflected classes still have information about their name, module path, base classes, type parameters, and so on. They just lack information about their members.

For the case of List[String], your traversal algorithm could just have a special case for fields of type List, and iterate through the list normally instead of using reflection. You still wouldn't be able to handle fields whose types were both non-reflected and unfamiliar to your code. The question is, how important is this use case? At the moment, I can't think of a situation where this would be important, but I could be wrong.

The advantage of the opt-in strategy is three-fold: First, it means that we don't generate a pile of unneeded metadata each time a template gets instantiated. Secondly, because we no longer have to worry about size, I can rip out a whole bunch of complicated compression / decompression logic from the Tart runtime (although I still keep method descriptors and field descriptors compressed, but not types). And finally, since I'm no longer creating types lazily, it means that reflection metadata can go back to being static constants - which means that every type Tart once again has a unique, canonical, immutable pointer for its runtime information.

Functional Style

After watching the Scala presentation at Google last month, I realized that there might be some advantage to making some statements more functional in style than they currently are. This doesn't mean that I've drunk the functional programming kool-aid, and I certainly don't think that functional programming is the panacea that some of its adherents claim. However, I was pretty impressed with the brevity of the code examples and the clarity of the abstractions involved.

Thinking about those code samples makes me realize that there's a very common coding pattern in imperative languages that kind of annoys me, although my level of annoyance was so minor that I wasn't aware of it before. Here's some examples:

Example 1:

Record record;
if (some_condition) {
  record = dataSource.getRecord(FIRST_ID);
} else if (other_condition) {
  record = dataSource.getRecord(SECOND_ID);
} else {
  record = dataSource.getRecord(OTHER_ID);
}

doSomethingWithRecord(record);

Example 2:

Record record;
try {
  record = dataSource.readRecord(RECORD_ID);
} catch (IOException e) (
  logger.log(e);
  return;
}

doSomethingWithRecord(record);

In both cases, we're assigning to a temporary variable because in Java/Python/C++, expressions are subservient to statements - that is, statements may contain expressions, but expressions can never contain statements. If we want to calculate an expression where there is control-flow involved in the calculation, then we have to break up the expression into multiple statements.

(Note that Example 1 can be generalized to switch statements as well as chained if-else statements.)

A language that supported a functional style, in which statements could return values, would allow these statements to be written more concisely:

doSomethingWithRecord(
    if (some_condition) {
      dataSource.getRecord(FIRST_ID);
    } else if (other_condition) {
      dataSource.getRecord(SECOND_ID);
    } else {
      dataSource.getRecord(OTHER_ID);
    });

doSomethingWithRecord(
    try {
      dataSource.readRecord(RECORD_ID);
    } catch (IOException e) (
      logger.log(e);
      return;
    });

Of course, Tart already lets you do the first with the 'cond' macro:

doSomethingWithRecord(
    cond(some_condition, dataSource.getRecord(FIRST_ID),
         other_condition, dataSource.getRecord(SECOND_ID),
         dataSource.getRecord(OTHER_ID)});

There are several problems with 'cond' however:
  • It looks different from a regular if-statement. In other words, the language has two completely different if-statements, one functional, one procedural. Think about this for a moment from the perspective of an IDE with syntax highlighting - the 'cond' macro will be syntax-colored just like a regular function call, but it would be nice if the editor could syntax-color it like a flow-of-control construct.
  • The 'cond' macro only allows a single expression to be returned as the result of each if-test. It would be better if the 'value' argument could contain multiple statements. (You can do this by making a closure, but that's overkill IMHO.)
  • 'cond' relys on Tart's macro system, a feature which so far is not pulling it's weight in terms of benefits delivered. I think language features which are only useful in obscure edge cases should be removed, and so far macros have exactly 3 use cases: 1) cond, 2) lazyEval, and 3) assert.
For the second example, with the try-catch statement, we have another interesting issue, which is the behavior of the return statement. When used as an expression, 'return' acts as a non-local exit out of the function - it effectively cancels the function call as a side-effect of evaluating the calling arguments. Pretty freaky but completely logical given the original example that it was derived from.

Unfortunately, there's no way that Tart's macro system can implement try-catch as an expression. If we want to support cases like this, then it's got to have some sugar from the compiler.

If we decided to go down this road, one of the first things we would need to deal with is how to handle expressions containing multiple statements:

doSomethingWithRecord(
    if (some_condition) {
      doSomePreperatoryWork();
      dataSource.getRecord(FIRST_ID);
    } else if (other_condition) {
      doSomeDifferentPreperatoryWork();
      dataSource.getRecord(SECOND_ID);
    } else {
      dataSource.getRecord(OTHER_ID);
    });

In the above example, the first if-block has two statements. What is the value that results when that branch of the if-statement is taken? We can't use the 'return' statement at the end of the block, because as we've seen that returns from the function as a whole. We could come up with some new keyword to indicate the result produced (I haven't been able to think of one), or perhaps some new operator, or we could do as some languages do and say that the last statement in a block is always the result. I don't particularly like the implicit last-statement rule, EIBTI.

(Pure functional languages don't generally have this problem because they don't have 'statements' per se, only expressions.)

It also quickly becomes obvious that the block statements { } are in fact expressions. Does this mean that one can omit the 'return' statement at the end of a function, and just put an expression? That's annoyingly perl-ish from my point of view.

There's also the question of whether all those braces are necessary. Tart's rule for block statements up to this point has been that the braces are not optional, but the parens around the condition can be omitted. However, I'm pretty open to a certain degree of design churn for an experimental language in development, which is another way of saying that I'm open to changing my mind.

In any case, I haven't come to any conclusions on this matter as yet, so no action will be taken.

Functions as Objects

I'd like to be able to write the following in Tart:

interface Function[%ReturnType, %ParamTypes...] {
  def (params:ParamTypes) -> ReturnType;
}

I currently can't do this because ParamTypes is a variadic template argument. Ideally, the Function class should be able to take a list of 0 or more function parameter types as template arguments, such as Function[String, int, int, int].

Secondly, I'd like closures and bound methods to implement this interface - that is, if you have a closure, you can assign it to a variable of this type, assuming the parameter types match. 
Also, the type declaration 'fn (params) -> ReturnType" should simply be a shortcut for this type. Doing that involves changing the internal representation of closures to be more like objects, so that when a closure is called it goes through the normal vtable dispatch mechanism.

The advantage of doing this is that you can derive new classes from Function, and they can be used in all of the same places that closures and bound methods are used (similar to __call__ in Python.)

There's another cool feature that I'd like to steal from Scala: Let's say you have a function that takes as its argument an interface with a single callable method, such as Factory[String].build(). In Scala, you can pass a closure as that argument instead of creating a class which instantiates that interface. So for example:

interface Factory[T] {
  def build() -> T;
}

def buildLotsOfStuff(factory:Factory[String]) {
  // ...
}

// Convert a closure to a Factory[String].
buildLotsOfStuff(fn { return "Hello"; });

// In Java, we'd have to do this much more verbosely:
buildLotsOfStuff(new Factory<String> {
  public String build() {
    return "Hello";
  }
});

I'm not quite sure how Scala does this - that is, there are two ways I can think of to do it, but each has disadvantages. The first way is to automatically generate an adapter object when the closure is converted to a reference to type Factory.
So for example the compiler would generate something like this:

class AnonymousFunctionAdapter : Factory[String] {
  var closure:fn -> String;

  def construct(closure:fn -> String) {
    self.closure = closure;
  }

  def call -> String {
    return closure();
  }
}

This works in all cases, but is less efficient because of the overhead of calling the adapter and because you have to allocate the adapter instance.

The other approach would be for the compiler to generate the closure itself as an instance of Factory[String]. All that is required in this case is to add the additional dispatch table entries to the closure's class - that is, the actual method pointer would be copied into two dispatch slots, one for Function.(), and one for Factory.build(). This means that you avoid allocating the extra object and the code is able to jump right into the method body directly without having to pass the arguments through the adapter.

However, there's a drawback, which is that this technique only works if the compiler knows what type the closure is going to be cast to at the point where the closure is generated. If we pass the closure around as a function pointer, and then later decide to cast it to some other callable type, we won't be able to modify the closure to conform to that type's calling conventions.

No comments:

Post a Comment