Monday, August 29, 2011

I/O Library

I've taken Guido's advice about using file descriptors rather than FILE * handles, and as a result the i/o classes are coming along nicely. MemoryStream and StreamTextWriter are done, with unit tests. FileStream and StreamTextReader are written but still need tests. The character codecs got an overhaul as well.

One of the nice things about all of this is that it has given me a chance to write lots of Tart code, which is a welcome break from writing the compiler in C++. You can definitely see the power of Tart when you have unit tests that look like this:

stream.write([1, 2, 3, 4, 5]);
assertContentsInOrder(stream.data, 1, 2, 3, 4, 5);

In this case, 'stream' is a MemoryStream, and the 'write' method takes an array of bytes, which here is an array literal. "stream.data" returns a reference to the underlying collection (an ArrayList). "assertContentsInOrder()" takes a collection and a variable number of elements, and asserts that the contents of the collection equal that list (it does this by calling iterators.equal()).
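For readers who don't know Tart, here is a rough Python analogue of that test. It is only a sketch: the class and the helper are hypothetical stand-ins that mirror the names above, not the Tart library itself, and a plain list takes the place of the ArrayList.

```python
from itertools import zip_longest


class MemoryStream:
    """Hypothetical stand-in for the MemoryStream described above."""

    def __init__(self):
        self.data = []  # underlying collection (the post uses an ArrayList)

    def write(self, chunk):
        # Append every byte of the chunk to the backing list.
        self.data.extend(chunk)
        return len(chunk)


def assert_contents_in_order(collection, *expected):
    """Assert that the collection holds exactly the expected elements, in order."""
    missing = object()  # sentinel, so a length mismatch also fails
    pairs = zip_longest(collection, expected, fillvalue=missing)
    assert all(a == b for a, b in pairs), f"{list(collection)} != {list(expected)}"


stream = MemoryStream()
stream.write([1, 2, 3, 4, 5])
assert_contents_in_order(stream.data, 1, 2, 3, 4, 5)
```

The difference, of course, is that the Python version only finds type mistakes when the test runs.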

This style of coding feels very Pythonic to me, yet it's all statically typed.

One minor disadvantage of using Unix file descriptors is that there's no equivalent of feof(). The only way to know that you're at the end of the file is to attempt a read and get 0 back as the result. This doesn't play so well with higher-level APIs, which are typically designed so that you can test for end of file before doing an actual read. Buffered layers, such as the one behind fread() and its ilk, can provide this capability by reading ahead (although, strictly speaking, C's feof() itself only reports end of file after a read has already hit it).
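The descriptor-level behavior is easy to see from Python, whose os.read() is a thin wrapper over the read() system call. The following is a minimal sketch, assuming a regular file; the temporary file and the 4-byte read size are arbitrary choices for illustration.

```python
import os
import tempfile

# Write five bytes to a temporary file, then reopen it as a raw descriptor.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes([1, 2, 3, 4, 5]))
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
chunks = []
try:
    while True:
        chunk = os.read(fd, 4)  # ask for up to 4 bytes
        if not chunk:           # an empty result is the only end-of-file signal
            break
        chunks.append(chunk)
finally:
    os.close(fd)
    os.remove(path)
```

Nothing in the loop can ask "am I at the end?" ahead of time; it only finds out when a read comes back empty.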

This means that my lower-level i/o classes have to expose this same behavior: I can't define an "eof" predicate for class IOStream. Such a method would be expensive without buffering or read-ahead of some kind. And since we want the IOStream interface to be platform-agnostic, we can't define an eof method even on platforms that do support it for low-level streams. We have to go with the lowest common denominator (of both platforms and stream types), which means that all IOStreams have Unix-style end-of-file behavior.

Obviously the higher-level classes such as TextReader / TextWriter are buffered, so we can define an eof() method on all of these. It's only the lower-level unbuffered classes that don't support it.
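To make the layering concrete, here is a small Python sketch of how a buffered reader can offer eof() on top of a stream that only signals end of file through an empty read. The class names, the read_byte() method, and the tiny buffer size are all hypothetical; this is not how the Tart classes are actually written.

```python
import io


class RawStream:
    """Hypothetical unbuffered stream: an empty read is the only EOF signal."""

    def __init__(self, payload):
        self._raw = io.BytesIO(payload)

    def read(self, count):
        return self._raw.read(count)


class ReadAheadReader:
    """Hypothetical buffered layer that answers eof() by reading ahead."""

    def __init__(self, stream, buffer_size=4):
        self._stream = stream
        self._buffer_size = buffer_size
        self._buffer = b""
        self._hit_end = False

    def _fill(self):
        # Read ahead only when the buffer is empty and the end has not been seen.
        if not self._buffer and not self._hit_end:
            self._buffer = self._stream.read(self._buffer_size)
            self._hit_end = not self._buffer

    def eof(self):
        self._fill()
        return not self._buffer

    def read_byte(self):
        self._fill()
        if not self._buffer:
            raise EOFError("read past end of stream")
        value = self._buffer[0]
        self._buffer = self._buffer[1:]
        return value


reader = ReadAheadReader(RawStream(bytes([1, 2, 3, 4, 5])))
values = []
while not reader.eof():
    values.append(reader.read_byte())
```

The cost of eof() is hidden inside the refill that the reader would have had to do anyway, which is why the predicate is cheap at this level and not at the IOStream level.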
