Machine Words: The Copyable protocol

One of Tart's fundamental design decisions is that there should be no collection classes that have special, "privileged" status. In most programming languages, there are a few types that are built in to the compiler, and which have special syntactic sugar - so for example, in Java, arrays are a built-in type, and only arrays can use the bracket notation (array[n]) to access their elements. Whereas in Tart, the Array type is written in Tart, and moreover, any collection class can define an element access operator.

Actually, this design principle represents an unrealizable ideal more than it does an actuality. In truth, arrays do have special status in two ways: array literals, and functions taking a variable number of arguments. Both of these cases cause the compiler to generate code that constructs a new array instance. But once constructed, an array is no more "special" than any other collection.

There's a downside to this design principle, however. It's fairly common in programming to want to convert from one collection type to another - for example "flattening" a deque into an array, or converting a map into a list of tuples. In cases like these, it's highly inefficient to transfer the items one at a time from the source collection to the target collection. It would be better if there was some way to transfer elements en masse. Unfortunately, doing so requires knowing a lot about the internal representation of the collections involved.

For example, many collection classes use arrays internally - ArrayList and Deque being two prime examples. If we could arrange a bulk transfer of elements from one array to another, much of the overhead of copying could be eliminated. However, those arrays are private - allowing code outside of the class access to those arrays would destroy the data-hiding abstraction of the container.

This is where it's useful to have the compiler have special knowledge of some collections - because the compiler can recognize cases where a bulk transfer would be appropriate and generate code for it. But what if we wanted to extend this ability to user-created collection types? What we'd have to do is abstract the bulk transfer operation itself, so that the user-created classes could participate in bulk-transfer operations.

The Copyable interface represents one approach to solving this problem. Users never use Copyable objects directly - instead, when you pass one collection to another - such as ArrayList.appendAll(collection) - the destination collection checks to see if the source collection supports the Copyable interface. If it does, then it initiates one or more bulk transfers from the source collection. If it does not, then it will transfer the elements the conventional way, one at a time.

The Copyable interface makes certain assumptions about the destination container: It assumes that for a collection of size N, elements are numbered 0 .. N-1, and that sub-ranges of this numbering can be mapped onto ranges of contiguous addresses. So for example, elements 0-255 might be stored in one array, 256-511 stored in a second array, and so forth. Each bulk transfer operation specifies one of these contiguous sub-ranges as a destination buffer. It's up to the source collection to decide how to map it's own internal representation onto that buffer, and to copy its elements into that buffer. The source collection may have it's own internal divisions, which may be different from the destination's - in which case, the source collection may have to do more than one array copy to fill the destination buffer.

The actual copying of data is typically done via Array.copyElements or Memory.copy, both which use memcpy to do the actual transfer. (Tart does not allow redefining the meaning of assignment, so all objects are treated as POD. The reason is that the garbage collector can move objects around, so any assumptions made by an overloaded assignment operator would be violated anyway. But with garbage collection, there's far less need to overload assignment anyway.)

The copy operation is also synchronous - both source and destination are allowed to move things around in-between copy operations. There is no promise that the destination buffer will continue to exist after the copy finishes.

BTW, the idea of Copyable was inspired and greatly influenced by the Python buffer protocol, although Copyable is a much smaller and less featureful concept.

Machine Words

Monday, August 29, 2011

The Copyable protocol

No comments:

Post a Comment