Saturday, October 31, 2009

So my ideas for const won't work

I'm starting to realize that C-style 'const' and reflection do not play well together.
 
A quick review: C's semantics for const are distinctive in that they are transitive - that is, you can declare a variable as const in such a way that it applies to the members of the object being referred to. Of course, in Java and other languages you can declare a variable as 'final', but that attribute applies to the variable only, it doesn't affect the object being referred to.
 
Another way to think of this is that in Java, the const-ness of an object is self-contained - all of the information about what is mutable and what is not is represented in the object's type, and cannot be changed by an external entity that happens to reference the object.
 
The problem with reflection is that function arguments must be converted into generic Objects in order to be handled uniformly, which means that any information outside of the object is lost. There's no way to represent the const-ness of the object reference when passing it as a generic Object. This means that either you can never pass a 'const' object to a reflected function ever, or passing an const object as an argument to a reflected function silently makes it non-const, which violates the contract. The third alternative is to make the reflection interface more complex and less efficient by wrapping each const object in a special const wrapper object.
One question that should be asked is, what makes C-style const useful?
 
My answer to this starts with the observation that the use-cases for being able to control object mutability fall into two large groups: Group 1 is about object permissions, controlling what parts of the code are allowed to mutate an object. Group 2 is about managing shared state, especially between threads, where being able to declare an object as immutable is a handy shortcut to the general shared state problem.
 
In Java, one generally deals with these problems by explicitly exposing a separate interface that offers a reduced set of methods -- removing all of the methods that could cause a mutation of the object, but leaving the methods which don't cause state chanes. This involves a lot of extra typing. It also doesn't address the Group 2 use cases, since the removal of mutating methods does not guarantee that the data won't be mutated in some other way.
 
The value of transitive const is that it automatically creates such an interface with a single keyword: That is, given a set of methods, it can 'filter out' the ones we don't want, not really removing them but making it an error to call them.
 
Of course, there are many possible such filterings of an interface, but this particular filtering - removal of non-const methods - is so common that it makes sense to support it in the language as some sort of shortcut.
 
One possible solution to the problem would be some way in the languge to "auto-create" an interface - given an interface, return a transformed version of that interface with the mutating methods removed. That doesn't solve the Group 2 use case, but at least it avoids the extra typing. Also, my ideas about this are completely nebulous and fuzzy.

--
-- Talin

Friday, October 30, 2009

Factoring reflection

Now that boxing and unboxing are out of the way, that clears the way for me to start working on function reflection. The reason I needed the boxing support is due to the way that reflected methods are called. For example, suppose we have a method foo of some class Bar:

   class Bar {
     def foo(a:int, b:int) -> int { }
   }

Reflecting 'foo' yields an object of type tart.reflect.Method, which among other things has a "call" method:

   class Method {
     def call(obj:Object, args:Object[]) -> Object;
   }

What 'call' needs to do is typecast the arguments from type Object into the correct argument types. For arguments whose type is a subclass of Object, we need to downcast the argument to that type. For arguments that are primitive types, the input argument must be a boxed value, and 'call' must therefore unbox it. In either case, if the input argument is not the right type, then a TypecastException is thrown. Note that all of this code is generated automatically by the compiler - for each function, there is an instance of 'Method' which is a static immutable global containing the information for that function.

One problem here is that the amount of code and data generated for a single function is large - thus, a simple 'getter' that returns a value might only require a handful of instructions, but the 'call' method needed to typecheck and convert its arguments may be an order of magnitude larger. Generating a 'call' function for every user-defined function would lead to major code bloat.

One obvious optimization is to recognize that the argument type checking logic is the same for functions that have the same type. Thus, if I have two methods a(int, int) and b(int, int), they can share the same type-casting code. We can factor out this code into a function, which I will call "invoke", which takes the arguments of "call" plus a reference to the function itself:

  var invoke:static fn (func:fn (), obj:Object, args:Object[]) -> Object;

The syntax is a bit unfamiliar - the sequence 'static fn' is used to declare a reference to a function that does not have a 'self' argument. What the above declaration represents is a pointer to a function which accepts three arguments, the first of which is a function.

The compiler-generated 'Method' object will now contain two pointers: one to the function itself, and one to the 'invoke' function for that function type. The 'invoke' function is shared by all functions that have the same type. This sharing even works across module boundaries, because each 'invoke' function has a unique name which is derived from the argument types, such as ".invoke(int, int)->int". (LLVM allows function names to contain any printable character, a feature which I rely on heavily.)

Next we need to consider how to handle the 'self' argument. The first approach is to treat it like any other argument to 'invoke'. However, this greatly increases the number of permutations of 'invoke' that get generated. Since the 'self' argument is different for every class, it means that no class can share their 'invoke' methods with other classes.

A better way is to hoist the type-checking of self out of the invoke method and put it in a separate function. So now we have 3 pointers in the Method class: the function pointer, the pointer to invoke, and the pointer to the self type check. Actually the last one doesn't need to be a function pointer, it only needs to be a pointer to data needed to perform an 'isa' test.

As an aside, this same organization can be used for the other data stored in the Method object, such as the list of function parameters. For example, if we store the parameter names in a separate array from the parameter types, then the parameter type descriptor objects can be shared with all other methods that have the same parameter types. Similarly, the array of parameter names can be shared with any function that has the same param names, even if the parameter types are different.

One final issue to address: What if the type of 'self' is not an object? You see, in Tart, there are two major branches on the tree of types: Reference types, which are always passed by reference, and Value types, which are always passed by value. Primitive types are value types; so are structs. Reference types are allocated on the heap and are garbage-collectable; Value types can only live inside of other objects, or on the stack.

Normally any value type can be converted into an Object by boxing, so if you need to pass an int to a function that expects an Object, you can just box it. However, boxing creates a *copy* of the object. That's fine for most parameters, but for 'self' we don't want to create a copy, since that would make it impossible to write a method that mutates the fields of the struct. We need a pointer to the original object.

Tart deals with this by handling the 'self' parameter of value types specially: It is always a reference, even if the type is not a reference type. This has to be handled carefully, however. With Reference types, the 'self' pointer always points to the beginning of a heap allocation, but the 'self' pointer of a value type might point into the middle of an object. The garbage collector doesn't know how to handle such pointers. To avoid memory leaks or data corruption, we can't allow the 'self' argument of a method of a value type to leave the function. We can do this by making 'self' a special type that can be member-dereferenced only - that is, you can access any member, but you can't assign the value to anything. This is for value types only, of course.

But now we want to be able to call methods on value types via reflection, which means we *have* to have some way to pass a struct reference around. For structs which live in the middle of an object, the safe thing to do is to create a wrapper that contains both the struct pointer and a 'base' pointer to the object that contains the struct. That way, the garbage collector can trace the 'base' pointer normally, ignoring the struct pointer.

For structs which live on the stack, there's only one remedy, which is to box them ahead of time, and then pass the boxed reference around. Essentially we will have to have the compiler notice if anyone is using the struct in a way that requires a persistent reference to the object, and convert the stack-based object into a heap-based one at compile time. Note that this is not that different from what happens when we create a closure cell - we notice that someone is using the variable from within a closure, in which case we wrap the variable and put it on the heap.

--
-- Talin

Thursday, October 29, 2009

Progress: Unboxing and explicit specialization

I've made a bunch of progress this last week. Unboxing now works, and what's even better is that it is implemented in pure Tart - although I did need to fix several bugs in the compiler, I did not need to add any special support in the compiler for either boxing or unboxing.

Auto-boxing is done via a coercion function in the Object class:

/** Implicitly convert non-object values to Boxed types. */
static def coerce[%T] (value:T) -> Object { return ValueRef[T](value); }
static def coerce(value:Object) -> Object { return value; }

If the input argument derives from Object then it is returned unchanged, otherwise if it's a value type, such an int, it's wrapped in a ValueRef. This happens automatically whenever you attempt to pass a primitive type to a function that takes an Object as an argument.

The ValueRef class itself is fairly simple:

/** A reference type used to contain a value type. */
class ValueRef[%T] : Ref {
  let value:T;
  
  def construct(value:T) {
    self.value = value;
  }

  /** Return a string representation of the contained value. */
  def toString() -> String {
    return self.value.toString();
  }
}

To unbox a boxed value, use the static "valueOf" method in class "Ref". It requires an explicit type parameter:

var v = Ref.valueOf[int](obj);

The 'valueOf' method is defined for each different primitive type and uses the 'classify' statement to downcast to the correct wrapper type. Here's the entire 'Ref' class:

/** Abstract class used to represent a reference to some value. */
abstract class Ref {
  private static {
    /// Convert signed integers to their unsigned equivalents. Supresses conversion warnings.

    def asUnsigned(v:int8) -> uint8 { return uint8(v); }
    def asUnsigned(v:int16) -> uint16 { return uint16(v); }
    def asUnsigned(v:int32) -> uint32 { return uint32(v); }
    def asUnsigned(v:int64) -> uint64 { return uint64(v); }

    /// Convenience functions to check whether the given input is within the numeric range
    /// of the type specified by the type parameter.

    def rangeCheck[%T](value:int32) {
      if value < T.minVal or value > T.maxVal {
        throw TypecastException();
      }
    }

    def rangeCheck[%T](value:int64) {
      if value < T.minVal or value > T.maxVal {
        throw TypecastException();
      }
    }

    def rangeCheck[%T](value:uint32) {
      if value > asUnsigned(T.maxVal) {
        throw TypecastException();
      }
    }

    def rangeCheck[%T](value:uint64) {
      if value > asUnsigned(T.maxVal) {
        throw TypecastException();
      }
    }

    def rangeCheck[%T](value:char) {
      if uint32(value) > asUnsigned(T.maxVal) {
        throw TypecastException();
      }
    }
  
    def check(value:bool) {
      if not value {
        throw TypecastException();
      }
    }

    /// Check to insure that the input value is not negative.

    def signCheck(value:int32) {
      if value < 0 {
        throw TypecastException();
      }
    }

    def signCheck(value:int64) {
      if value < 0 {
        throw TypecastException();
      }
    }
  }

  /// Convert an object containing a number to the requested number type.

  static def valueOf[bool](ref:Object) -> bool {
    classify ref {
      as v:ValueRef[bool]   { return v.value; }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[char](ref:Object) -> char {
    classify ref {
      as v:ValueRef[int8]   { signCheck(v.value); return char(v.value); }
      as v:ValueRef[int16]  { signCheck(v.value); return char(v.value); }
      as v:ValueRef[int32]  { 
        check(v.value >= 0 and uint32(v.value) <= uint32(char.maxVal));
        return char(v.value);
      }
      as v:ValueRef[int64]  {
        check(v.value >= 0 and uint64(v.value) <= uint64(char.maxVal));
        return char(v.value);
      }
      as v:ValueRef[uint8]  { return char(v.value); }
      as v:ValueRef[uint16] { return char(v.value); }
      as v:ValueRef[uint32] { check(v.value <= uint32(char.maxVal)); return char(v.value); }
      as v:ValueRef[uint64] { check(v.value <= uint64(char.maxVal)); return char(v.value); }
      as v:ValueRef[char]   { return v.value; }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[int8](ref:Object) -> int8 {
    classify ref {
      as v:ValueRef[int8]   { return v.value; }
      as v:ValueRef[int16]  { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[int32]  { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[int64]  { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[uint8]  { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[uint16] { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[uint32] { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[uint64] { rangeCheck[int8](v.value); return int8(v.value); }
      as v:ValueRef[char]   { rangeCheck[int8](v.value); return int8(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[int16](ref:Object) -> int16 {
    classify ref {
      as v:ValueRef[int8]   { return int16(v.value); }
      as v:ValueRef[int16]  { return v.value; }
      as v:ValueRef[int32]  { rangeCheck[int16](v.value); return int16(v.value); }
      as v:ValueRef[int64]  { rangeCheck[int16](v.value); return int16(v.value); }
      as v:ValueRef[uint8]  { return int16(v.value); }
      as v:ValueRef[uint16] { rangeCheck[int16](v.value); return int16(v.value); }
      as v:ValueRef[uint32] { rangeCheck[int16](v.value); return int16(v.value); }
      as v:ValueRef[uint64] { rangeCheck[int16](v.value); return int16(v.value); }
      as v:ValueRef[char]   { rangeCheck[int16](v.value); return int16(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[int32](ref:Object) -> int32 {
    classify ref {
      as v:ValueRef[int8]   { return v.value; }
      as v:ValueRef[int16]  { return int32(v.value); }
      as v:ValueRef[int32]  { return v.value; }
      as v:ValueRef[int64]  { rangeCheck[int32](v.value); return int32(v.value); }
      as v:ValueRef[uint8]  { return int32(v.value); }
      as v:ValueRef[uint16] { return int32(v.value); }
      as v:ValueRef[uint32] { rangeCheck[int32](v.value); return int32(v.value); }
      as v:ValueRef[uint64] { rangeCheck[int32](v.value); return int32(v.value); }
      as v:ValueRef[char]   { rangeCheck[int32](v.value); return int32(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[int64](ref:Object) -> int64 {
    classify ref {
      as v:ValueRef[int8]   { return int64(v.value); }
      as v:ValueRef[int16]  { return int64(v.value); }
      as v:ValueRef[int32]  { return int64(v.value); }
      as v:ValueRef[int64]  { return v.value; }
      as v:ValueRef[uint8]  { return int64(v.value); }
      as v:ValueRef[uint16] { return int64(v.value); }
      as v:ValueRef[uint32] { return int64(v.value); }
      as v:ValueRef[uint64] { rangeCheck[int64](v.value); return int64(v.value); }
      as v:ValueRef[char]   { rangeCheck[int64](v.value); return int64(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[uint8](ref:Object) -> uint8 {
    classify ref {
      as v:ValueRef[int8]   { signCheck(v.value); return uint8(v.value); }
      as v:ValueRef[int16]  {
        check(v.value >= 0 and v.value <= int16(uint8.maxVal));
        return uint8(v.value);
      }
      as v:ValueRef[int32]  {
        check(v.value >= 0 and v.value <= int32(uint8.maxVal));
        return uint8(v.value);
      }
      as v:ValueRef[int64]  {
        check(v.value >= 0 and v.value <= int64(uint8.maxVal));
        return uint8(v.value);
      }
      as v:ValueRef[uint8]  { return v.value; }
      as v:ValueRef[uint16] { rangeCheck[uint8](v.value); return uint8(v.value); }
      as v:ValueRef[uint32] { rangeCheck[uint8](v.value); return uint8(v.value); }
      as v:ValueRef[uint64] { rangeCheck[uint8](v.value); return uint8(v.value); }
      as v:ValueRef[char]   { rangeCheck[uint8](v.value); return uint8(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[uint16](ref:Object) -> uint16 {
    classify ref {
      as v:ValueRef[int8]   { signCheck(v.value); return uint16(v.value); }
      as v:ValueRef[int16]  { signCheck(v.value); return uint16(v.value); }
      as v:ValueRef[int32]  {
        check(v.value >= 0 and v.value <= int32(uint16.maxVal));
        return uint16(v.value);
      }
      as v:ValueRef[int64]  {
        check(v.value >= 0 and v.value <= int64(uint16.maxVal));
        return uint16(v.value);
      }
      as v:ValueRef[uint8]  { return v.value; }
      as v:ValueRef[uint16] { return v.value; }
      as v:ValueRef[uint32] { rangeCheck[uint16](v.value); return uint16(v.value); }
      as v:ValueRef[uint64] { rangeCheck[uint16](v.value); return uint16(v.value); }
      as v:ValueRef[char]   { rangeCheck[uint16](v.value); return uint16(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[uint32](ref:Object) -> uint32 {
    classify ref {
      as v:ValueRef[int8]   { signCheck(v.value); return uint32(v.value); }
      as v:ValueRef[int16]  { signCheck(v.value); return uint32(v.value); }
      as v:ValueRef[int32]  { signCheck(v.value); return uint32(v.value); }
      as v:ValueRef[int64]  {
        check(v.value >= 0 and v.value <= int64(uint32.maxVal));
        return uint32(v.value);
      }
      as v:ValueRef[uint8]  { return uint32(v.value); }
      as v:ValueRef[uint16] { return uint32(v.value); }
      as v:ValueRef[uint32] { return v.value; }
      as v:ValueRef[uint64] { rangeCheck[uint32](v.value); return uint32(v.value); }
      as v:ValueRef[char]   { return uint32(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[uint64](ref:Object) -> uint64 {
    classify ref {
      as v:ValueRef[int8]   { signCheck(v.value); return uint64(v.value); }
      as v:ValueRef[int16]  { signCheck(v.value); return uint64(v.value); }
      as v:ValueRef[int32]  { signCheck(v.value); return uint64(v.value); }
      as v:ValueRef[int64]  { signCheck(v.value); return uint64(v.value); }
      as v:ValueRef[uint8]  { return uint64(v.value); }
      as v:ValueRef[uint16] { return uint64(v.value); }
      as v:ValueRef[uint32] { return uint64(v.value); }
      as v:ValueRef[uint64] { return v.value; }
      as v:ValueRef[char]   { return uint64(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[float](ref:Object) -> float {
    classify ref {
      as v:ValueRef[int8]   { return float(v.value); }
      as v:ValueRef[int16]  { return float(v.value); }
      as v:ValueRef[int32]  { return float(v.value); }
      as v:ValueRef[int64]  { return float(v.value); }
      as v:ValueRef[uint8]  { return float(v.value); }
      as v:ValueRef[uint16] { return float(v.value); }
      as v:ValueRef[uint32] { return float(v.value); }
      as v:ValueRef[uint64] { return float(v.value); }
      as v:ValueRef[float]  { return v.value; }
      as v:ValueRef[double] { return float(v.value); }
      else {
        throw TypecastException();
      }
    }
  }

  static def valueOf[double](ref:Object) -> double {
    classify ref {
      as v:ValueRef[int8]   { return double(v.value); }
      as v:ValueRef[int16]  { return double(v.value); }
      as v:ValueRef[int32]  { return double(v.value); }
      as v:ValueRef[int64]  { return double(v.value); }
      as v:ValueRef[uint8]  { return double(v.value); }
      as v:ValueRef[uint16] { return double(v.value); }
      as v:ValueRef[uint32] { return double(v.value); }
      as v:ValueRef[uint64] { return double(v.value); }
      as v:ValueRef[float]  { return double(v.value); }
      as v:ValueRef[double] { return v.value; }
      else {
        throw TypecastException();
      }
    }
  }
}

Although this class looks complex, that's mainly because there are a lot of possible combinations of conversions. In terms of usage, it's actually quite simple. It also demonstrates a number of features of Tart, including partial specialization, conversion constructors, type objects (T.minVal) and of course 'classify'.

BTW, I went ahead with Guido's suggestions about the integer type names as you can see.