Monday, August 16, 2010

Propitious Properties

Tart's property system is modeled after C#, and to a lesser extent Python. A property is defined with the following syntax:

   def length:int {
     get { return self._length; }
     set { self._length = value; }
   }

The idea behind properties is that you use them just as if they were member variables, but reading or writing the variable actually invokes the getter or setter method:

  return obj.length; // Calls the 'get' function
  obj.length = 1; // Calls the 'set' function
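Since Python is one of the influences mentioned above, here is a rough Python analogue of the same idea. This is only a sketch: the class name 'Buffer' and the 'calls' list are made up for illustration and have nothing to do with Tart itself.

```python
class Buffer:
    """Hypothetical class, used only to illustrate the property pattern."""

    def __init__(self):
        self._length = 0
        self.calls = []  # records which accessor ran, for demonstration only

    @property
    def length(self):
        # Runs whenever 'obj.length' is read.
        self.calls.append("get")
        return self._length

    @length.setter
    def length(self, value):
        # Runs whenever 'obj.length = ...' is assigned.
        self.calls.append("set")
        self._length = value


buf = Buffer()
buf.length = 1        # looks like a field write, but invokes the setter
current = buf.length  # looks like a field read, but invokes the getter
```

The caller never sees the accessor methods, which is the whole point: the syntax is that of a member variable, the behavior is that of a method call.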

Properties are an improvement over Java's 'bean' reflection system, which I will explain. In Java, you can introspect an object to determine what properties it has. Properties are derived by looking for methods named 'getXXX' and 'setXXX':
  • A method named 'getFoo' is presumed to define a property named 'foo' - the 'get' prefix is stripped, and the initial character of the property name is converted to lower case.
  • The return type of the method (in the case of 'get') or the argument type (in the case of 'set') is presumed to define the type of the property. (The getter and setter type must match.)
Here's why a formal syntax for defining properties is superior to Java's syntax:
  1. Doing textual manipulation of the method to derive the property name is ugly.
  2. If you want to @Annotate the property, in Java you have to annotate either the getter or the setter, whereas in C# you can annotate the property directly. For example, if you want to mark a property as non-serializable, which method do you put the annotation on? In C# it's unambiguous.
  3. In C# the reflection system can give you a list of properties directly, whereas in Java you have to go through the bean introspection machinery (java.beans.Introspector), which compiles the list of properties by examining the methods.
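To make the first and third points a bit more concrete, here is a simplified Python imitation of the two approaches. It is a sketch, not a port of the Java introspector: the function and class names are invented for this example, and the name derivation only implements the rule described above (strip the prefix, lower-case the first character).

```python
def bean_property_names(cls):
    """Derive property names from getXxx/setXxx methods, bean-style."""
    names = set()
    for attr in dir(cls):
        is_accessor = len(attr) > 3 and attr[:3] in ("get", "set")
        if is_accessor and callable(getattr(cls, attr)):
            stem = attr[3:]                        # strip the 'get'/'set' prefix
            names.add(stem[0].lower() + stem[1:])  # lower-case the first character
    return sorted(names)


def declared_property_names(cls):
    """List the properties that were declared as such; no name mangling needed."""
    return sorted(name for name, value in vars(cls).items()
                  if isinstance(value, property))


class BeanStyle:
    def getFoo(self):
        return 1

    def setFoo(self, value):
        pass


class PropertyStyle:
    @property
    def foo(self):
        return 1
```

Both 'bean_property_names(BeanStyle)' and 'declared_property_names(PropertyStyle)' find a property called 'foo', but only the first one has to guess at it from method names.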
However, I've discovered that properties also have some drawbacks. These drawbacks have less to do with properties directly, and more to do with the style of code that results when properties are used:
  1. Name collisions. Often you will have a private member variable that is exposed through a property getter. Unfortunately, this means that you need to come up with distinct names for both the variable and the property. Up to now, I've been using '_name' for the internal variable and 'name' for the property, but I don't really like using underscores in this way. My preference is to let people name things as close as possible to readable English, and not require funny prefix or suffix characters. I strongly feel that if the programmer has to write special prefix or suffix characters onto more than 20% of the identifiers in a program, then the language design is deficient.
    • In Java, this is not a problem, since 'getFoo' is always going to be named differently than 'foo'.
  2. Chaining. A very common code pattern in Java is the use of chained setters, each of which returns 'this': object.setName(name).setDescription(desc).setKey(key)... and so on. This is a very convenient pattern which I use all the time in Java, but unfortunately it doesn't work with property assignments, since those don't return a value. You need to repeat the base expression ('object') once for each value set.
  3. I don't like taking the names 'get' and 'set' as keywords. These two names are very commonly used as method names in Java (see Provider.get() in Guice as an example). "get()" is often used when you have some class that defines a single value that it can produce, such as a factory or cache. I'd hate to break up a nice naming convention :)
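The chaining drawback (point 2) is easiest to see side by side. The Python sketch below uses two hypothetical classes, 'FluentItem' and 'PlainItem', invented purely for illustration: one with chained setters that return the object, one with plain attribute assignment standing in for property assignment.

```python
class FluentItem:
    """Setter methods return self, so calls can be chained."""

    def __init__(self):
        self.name = None
        self.description = None

    def set_name(self, name):
        self.name = name
        return self

    def set_description(self, description):
        self.description = description
        return self


class PlainItem:
    """Values are set by assignment, which is a statement and yields no value."""

    def __init__(self):
        self.name = None
        self.description = None


# Chained style: the base expression appears once.
fluent = FluentItem().set_name("widget").set_description("a small part")

# Assignment style: the base expression is repeated for every value.
plain = PlainItem()
plain.name = "widget"
plain.description = "a small part"
```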
I also wonder if it's a good idea to re-use the 'def' keyword to define a property. Syntactically it's unambiguous because of the colon, but the code might be more readable if I changed it to use the word 'prop' instead.

Tart status update

I've been busy this last week re-doing Tart's reflection data structures. Unlike the previous scheme, which used fully-realized, statically-compiled Tart objects to represent the various types, method descriptors, and other metadata needed for reflection, the new scheme uses a highly-compressed byte-encoding of the data. The compressed data is converted into objects lazily on first access.

By the way, all of this lazy evaluation makes heavy use of Tart's 'lazyEval' macro, which is part of the core library:

    return lazyEval(<variable>, <initialization-expression>);

Where 'initialization-expression' has type T, and 'variable' is declared as having type 'optional T'. Basically, lazyEval checks to see if 'variable' is initialized, and if not, it initializes it using 'initialization-expression'. The initialization expression is not evaluated if the variable is already initialized.

At some point I'll also need to do a 'lockedLazyEval', which is a synchronized version. But before that can happen I'll need mutexes (mutices?) to be implemented. That's going to be a bit tricky, for a number of reasons. For example, Tart objects can move around in memory, but I'm not sure whether or not it's legal to relocate a pthread_mutex_t. I suspect it isn't. Also, it gets into the whole realm of writing platform-dependent code in Tart (as opposed to platform-dependent code in the C runtime library, which is no problem), which I haven't really worked out yet.

Anyway, lazy evaluation is one of those common code patterns you see everywhere, but it's hard to refactor into a library routine unless your language gives you some way to selectively control evaluation. Macros are one way to do it - closures are another, although they are a bit more wordy.
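For comparison, here is roughly what the closure-based version looks like in a language without macros. This is a Python sketch, not Tart's lazyEval: the 'Lazy' class, the '_UNSET' sentinel, and 'build_descriptor' are all names invented for this example. The initializer has to be wrapped in a function so that it is not evaluated at the call site, which is where the extra wordiness comes from.

```python
_UNSET = object()  # sentinel standing in for an uninitialized 'optional T'


class Lazy:
    """Holds a value that is computed by a closure on first access."""

    def __init__(self, init):
        self._init = init      # zero-argument callable, not yet evaluated
        self._value = _UNSET

    def get(self):
        if self._value is _UNSET:
            # First access: run the initializer exactly once and cache it.
            self._value = self._init()
        return self._value


calls = []  # records how many times the initializer actually ran


def build_descriptor():
    calls.append("built")
    return {"name": "length"}


descriptor = Lazy(build_descriptor)
first = descriptor.get()   # runs build_descriptor
second = descriptor.get()  # returns the cached object, no second call
```

Note that this version is not thread-safe, which is the same gap a 'lockedLazyEval' would be meant to fill.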

Another aspect of reflection I am working on at the moment is the "invocation adapters". These are wrapper functions used when calling a method via reflection: each one converts the argument list from type Object[] to the native types of the called method's argument list. Each distinct function type requires a distinct invocation adapter; in other words, if two functions take the same argument types, then they can share the adapter function. (Note that the invocation adapter does not handle the 'self' argument, which is handled by a prior stage in the calling process, so two instance methods can share the same adapter even if they are members of different classes.)
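The sharing rule can be illustrated with a small Python sketch. This is only an analogy: Python has no static argument types, so an isinstance check stands in for the cast from Object to the native type, and the '_adapters' cache and 'adapter_for' function are hypothetical names, not part of Tart.

```python
_adapters = {}  # hypothetical cache: tuple of parameter types -> adapter function


def adapter_for(param_types):
    """Return the shared adapter for a given tuple of parameter types."""
    if param_types not in _adapters:
        def adapter(func, args):
            # 'args' plays the role of the Object[] argument list.
            if len(args) != len(param_types):
                raise TypeError("wrong number of arguments")
            for expected, arg in zip(param_types, args):
                if not isinstance(arg, expected):
                    raise TypeError(f"expected {expected.__name__}")
            return func(*args)
        _adapters[param_types] = adapter
    return _adapters[param_types]


def add(a, b):
    return a + b


def scale(x, factor):
    return x * factor


# Two unrelated functions with the same parameter types get the same adapter.
add_adapter = adapter_for((int, int))
scale_adapter = adapter_for((int, int))
```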

When the compiler generates the reflection metadata for a class, it creates a set of all of the unique function types defined by that class. Each of these is assigned an index. Each method definition in the class contains the index corresponding to its function type; the index is encoded as a VarInt, which typically takes one byte. The class metadata also contains a table of pointers to invocation adapter functions, which is also referenced by index. This information is used to lazily construct a method descriptor object, which contains a pointer to both the method itself and the invocation adapter for that method.
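To show why a small index "typically takes one byte", here is a Python sketch of one common variable-length integer scheme: seven payload bits per byte, with the high bit set on every byte except the last. This is an assumption for illustration only; the exact byte layout of Tart's VarInt is not described here and may differ.

```python
def encode_varint(n):
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one integer starting at 'pos'; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
```

Under this scheme any index below 128 fits in a single byte, so a class would need more than 128 distinct function types before its method entries grew.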

In other news, I implemented the Python-style String.join method, so now you can say either ", ".join(list) or String.join(", ", list). The 'list' argument must be either an Iterable[String] or a variable number of String arguments.
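For reference, the Python original that this is modeled on behaves as follows. Note one difference: Python's join only accepts a single iterable, not a variable number of string arguments.

```python
words = ["a", "b", "c"]

joined_method = ", ".join(words)      # called on the separator
joined_static = str.join(", ", words)  # same method, called through the type
```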