Sunday, September 11, 2011

Article on adding type constructors to Java

So I recently saw this article on Lambda the Ultimate about adding type constructors to the Java language. I have to admit, it took me several re-readings of the article before I understood the gist of what they were saying. And there are some parts of the paper, such as the logical proof, that I have no hope of understanding, as I just don't have the background.

Nevertheless, I understand it well enough I think. What they are talking about is indeed highly abstract and powerful. I'll try and summarize as best as I can: A type constructor can be thought of as a function that operates on types. An example would be "X => List[X]" - a function which, given a type X, returns the type of a list of X.

Where this gets powerful is when you start passing type constructors as template arguments - that is, instead of passing in a type as a template argument, you pass in a function that operates on types. Within the body of the template are calls to this function, which appear in the same places that template arguments would appear. So for example I could declare a template that takes a parameter R which is a type constructor, and in the body of the template I might declare a method whose return type is "R(String)". What "R(String)" evaluates to depends on what the value of "R" is when you instantiate the template. Some examples of R could be a constant [R(X) => int], the identity function [R(X) => X] or a type derived from the argument [R(X) => List[X]].

The example that they give in the paper is a visitor pattern - that is, you have a tree representing expression nodes, and you have two different visitors - one which evaluates the expression, and one which prints the expression as a string. In the former case, each visitor method returns a type appropriate to the evaluation - int, float, bool, etc. In the second case, the visitor methods all return type String. Because the visitor methods are all defined in terms of type constructors, you can use the same generic tree-walking code to handle both cases.

Well, that's all very neat, but...I wonder. It doesn't actually seem all that useful to me, or at least it doesn't seem all that useful to the average Joe Programmer (JP). You see, these type constructors don't work on normal classes of the kind JP would write, but only on templates that have these constructor calls built into them. Writing a class which takes advantage of type constructors presupposes that you have a certain kind of generality in mind. That is, given a particular data structure or algorithm, there are many dimensions along which we can generalize - you can think of a particular bit of code as being defined by some set of design criteria, and in generalizing the code we're taking some of those design criteria that are constants and replacing them with variables. We can't replace the entire thing with variables, because then we have no specification - our design becomes "do anything with anything".

The most common kind of generalization we see in practice is to interpolate between two specific use cases - that is, we enlarge the design just enough to encompass the two cases. If additional use cases come along, they may or may not fall within the envelope of what the design can support. (That's why I generally like to see at least three use cases before generalizing a particular design.)

This is exactly what they have done in the paper - their two use cases are "visitor that supports evaluation" and "visitor that supports converting into a string representation".

The thing is, though, is that this problem isn't a "real" problem in the sense that it's not a very hard problem to solve - there are many ways of coding those two use cases, and while some of the possible solutions are a little more verbose, they don't require super-advanced type theory either. The authors are deliberately avoiding any kind of dynamic typing because they trying to use the type system to prove that their code is correct, but I'm not convinced that their code is easier to *reason about* than an equivalent dynamic-typed solution would be. And to my mind, making it easier to think about what the code does is really the key to better programmer productivity.

I'm also not convinced that the kind of generality that they offer is the kind that most programmers would need. That is, a programmer who has a need for a "generic visitor" would be very likely, in my estimation, to need a visitor that is generalized in a way that the authors haven't thought of, and which would not be supported by their system.

To give an example, there's another kind of generality that is used daily by many Java and Python programmers, and which is much more useful. I'm going to call this "algorithmically-produced implementations". That is, given some ordinary abstract interface X that defines a set of methods, generate all of the method bodies using some rule. An example of this is mock objects for unit testing - you "mock out" the interface by generating methods that capture the parameter values and record them in a log.

In other words, rather than having functions that operate on *types*, we have essentially functions that operate on *methods* - given a type signature, return a method body. One could imagine a syntax for this, where '_' is taken to mean "anytime someone attempts to call a method that is undefined, call this method instead".

   def _[%RetType, %ArgTypes...](args:ArgTypes) -> RetType {
     record(args);
     return defaultValue[RetType]();
   }

Of course, the language has to support "perfect forwarding" in order for this to work, but let's assume that it does indeed do so. (Note that the ... in the template argument list indicates that ArgTypes is a variadic template parameter - this allows the function to accept a variable number of arguments.)

Note that from an implementation standpoint, there are two ways that this can work. Approach #1 is to have the compiler replicate the body of "_" for each missing method. Approach #2 is to only have one instance of "_", but for each missing method the compiler generates a trampoline function that packs up the arguments into an array and calls "_". Which method will be used in Tart is something that I still have not decided.

In any case, the beauty of mock objects is that they work with almost any interface - you typically don't have to have mocking in mind when you design your interface, although there are occasionally times when you need to do so. And it's something that the average programmer can immediately grasp.

No comments:

Post a Comment