Sunday, May 15, 2011

Tart status update

Well, I'm still struggling with my two bugs, which are pretty much the only things preventing me from doing an 0.1 release of Tart at this point. The two bugs are:

1) On 32-bit OS X with debugging turned on, the tests fail to compile - I get a segfault in llc. With debugging disabled, everything compiles fine.

2) On 32-bit OS X, Array.copyElements() crashes - specifically the call to llvm.memmove segfaults. I don't know why.

Neither of these bugs manifests on my 64-bit Linux box. Unfortunately, those are the only two machines I have, so I don't know whether the problem is OS X-related or 32-bit-related. Both bugs are very difficult to diagnose; I think I've been struggling with the second one for almost 6 months now. Honestly, if I could pay someone a thousand bucks to solve these for me, it would be worth it. I've already sunk far more than that much of my own time into these two.

In the meantime, I checked in the Path class and its tests, so we now have file path manipulation functions.

Also, I've been thinking a bit about FFI (foreign function interface). It has always been my plan that Tart should be able to call C functions directly, without the need to write a separate wrapper library. The main idea is that you have a set of annotations (package tart.ffi.*) that tell the compiler how to transform the function parameters and return value. So for example:

  @CMallocString
  @Extern("getcwd")
  def currentDir -> String;

The idea here is that the compiler generates a wrapper function which calls the underlying getcwd() provided by the operating system. The @CMallocString annotation tells the compiler that the result of getcwd() is a C-style string that has been malloc'd, and which can be freed. The compiler should, in this case, generate the code to convert the C-style string into a Tart String object.
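To make that concrete, here is roughly what such a generated wrapper has to do, written by hand in Python with ctypes since there is no runnable Tart toolchain to show it. This is a sketch, not the Tart compiler's output: it assumes a POSIX system whose getcwd() allocates the buffer itself when passed NULL and 0 (a glibc and BSD/macOS extension), and the name current_dir is hypothetical.

```python
import ctypes
import os

# Load the C library the process is already linked against (POSIX only).
libc = ctypes.CDLL(None, use_errno=True)

# char *getcwd(char *buf, size_t size). Declared as returning a raw pointer
# (c_void_p) so ctypes doesn't auto-convert it and we keep the address to free.
libc.getcwd.restype = ctypes.c_void_p
libc.getcwd.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def current_dir() -> str:
    """Hand-written equivalent of @CMallocString @Extern("getcwd")."""
    # NULL/0 asks getcwd to malloc() a buffer of the right size.
    ptr = libc.getcwd(None, 0)
    if not ptr:
        raise OSError(ctypes.get_errno(), "getcwd failed")
    try:
        # Copy the NUL-terminated bytes into a string owned by the runtime...
        return os.fsdecode(ctypes.string_at(ptr))
    finally:
        # ...then release the C buffer, which is what "can be freed" permits.
        libc.free(ptr)
```

The copy-then-free order matters: once free() runs, the original char * must not be touched again.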

Some other possible annotations might be:

  CString - function returns a char * pointer.
  CWideString - function returns a wchar_t * pointer.
  CMallocString - function returns a char * pointer which can be free()'d.
  CMallocWideString - function returns a wchar_t * pointer which can be free()'d.
  CArray - parameter annotation indicating that the function requires a C-style array pointer.
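One way to picture these annotations is as a table from annotation name to conversion routine. The sketch below does that in Python with ctypes; the dictionary, helper names, and the UTF-8 decoding of char * results are all assumptions for illustration, not part of tart.ffi.

```python
import ctypes

libc = ctypes.CDLL(None)  # POSIX: the C library already linked into the process
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]


def _c_string(ptr: int) -> str:
    # CString: borrow a char *; the callee still owns the memory.
    return ctypes.string_at(ptr).decode("utf-8")


def _c_malloc_string(ptr: int) -> str:
    # CMallocString: copy the char *, then free() it.
    try:
        return ctypes.string_at(ptr).decode("utf-8")
    finally:
        libc.free(ptr)


def _c_wide_string(ptr: int) -> str:
    # CWideString: borrow a wchar_t *.
    return ctypes.wstring_at(ptr)


def _c_malloc_wide_string(ptr: int) -> str:
    # CMallocWideString: copy the wchar_t *, then free() it.
    try:
        return ctypes.wstring_at(ptr)
    finally:
        libc.free(ptr)


# Hypothetical mapping from return-value annotation to conversion routine.
RETURN_CONVERTERS = {
    "CString": _c_string,
    "CMallocString": _c_malloc_string,
    "CWideString": _c_wide_string,
    "CMallocWideString": _c_malloc_wide_string,
}


def to_c_array(values, ctype=ctypes.c_int):
    """CArray: build a contiguous C array from a sequence.

    Returns the array (which must stay alive for the duration of the call)
    together with its length, since C functions usually want both.
    """
    arr = (ctype * len(values))(*values)
    return arr, len(values)
```

The only difference between the "C" and "CMalloc" variants is who owns the memory afterwards, which is exactly the information the annotation gives the compiler.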

In addition, we need some way to tell the compiler that certain operations, such as i/o, may not return immediately. This is because of the way garbage collection works: if we run out of memory while some thread is waiting for i/o, we would have to wait for the i/o to finish before doing the collection, because we don't know that it's safe to do a collection until we rendezvous with the thread. The way around this is to have the thread that is doing the i/o signal in advance that it's OK to do a collection while it's waiting. I use the term "suspended" to describe threads in this state.
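The rendezvous rule above can be written as a small predicate. This is a simplified model rather than Tart's collector, and the three state names are hypothetical: a collection may start only when no thread is still running ordinary code.

```python
from enum import Enum, auto


class ThreadState(Enum):
    RUNNING = auto()       # executing normally; may be touching heap objects
    AT_SAFEPOINT = auto()  # has rendezvoused with the collector
    SUSPENDED = auto()     # promised not to touch unpinned objects (e.g. in i/o)


def can_collect(states) -> bool:
    """True when every thread is either at a safepoint or suspended.

    Without SUSPENDED, a thread blocked in a long read would count as
    RUNNING until the read returned, and the collection would have to wait.
    """
    return all(s is not ThreadState.RUNNING for s in states)
```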

So what we'd need is something like this:

   with Suspension() {
     // do i/o
   }

The "Suspension" class is an object that supports the "scoped object" protocol, meaning that it has two methods, enter() and exit(), which are called at the beginning and end of the 'with' block, regardless of how the 'with' block is exited (for example, by throwing an exception). In the case of Suspension, the 'enter' method puts the current thread into the suspended state, and the 'exit' method restores it to the normal state. While the thread is suspended, any object reference may change its address without warning, since the thread is no longer synchronized with the collector. In general you want to do as little as possible in this state, and you should only use 'pinned' objects (which never get moved as long as they are in the pinned state).
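Python's with statement implements essentially the same scoped-object protocol (__enter__/__exit__ instead of enter()/exit()), so it makes a convenient stand-in. In this sketch the "thread state" is just a hypothetical thread-local string; a real runtime would notify the collector instead.

```python
import threading

_state = threading.local()


def thread_state() -> str:
    return getattr(_state, "value", "normal")


class Suspension:
    """Scoped object: enter() suspends the thread, exit() restores it."""

    def enter(self):
        self._previous = thread_state()
        _state.value = "suspended"

    def exit(self):
        _state.value = self._previous

    # Python's hooks for the with statement simply forward to enter()/exit().
    def __enter__(self):
        self.enter()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.exit()
        return False  # never swallow the exception
```

Because __exit__ runs on every exit path, a read that throws still leaves the thread back in the normal state.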

So in actuality, a typical i/o operation will look like this:
     
   var buffer = ubyte[](128);
   with pbuf = pin(buffer) {
     with Suspension() {
       fread(pbuf, pbuf.size);
     }
   }
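The same nesting can be sketched in Python, with bytearray(128) standing in for ubyte[](128) and readinto() standing in for fread(). The pin() and suspension() helpers and the module-level bookkeeping are hypothetical; they only record state so the ordering can be checked.

```python
import contextlib

pinned = set()     # ids of objects the (imaginary) collector must not move
suspended = False  # whether this thread is currently suspended


@contextlib.contextmanager
def pin(obj):
    """Mark obj as immovable and expose a view of its memory."""
    pinned.add(id(obj))
    view = memoryview(obj)
    try:
        yield view
    finally:
        view.release()
        pinned.discard(id(obj))


@contextlib.contextmanager
def suspension():
    global suspended
    previous, suspended = suspended, True
    try:
        yield
    finally:
        suspended = previous


def read_into(stream, buffer: bytearray) -> int:
    # Pin first, so the buffer can't move once we stop synchronizing.
    with pin(buffer) as pbuf:
        # Let the collector run while we're blocked.
        with suspension():
            n = stream.readinto(pbuf)
        # Suspension is over; collections wait for us again.
    # The buffer is unpinned here.
    return n
```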

Of course, all of this stuff will get buried in the i/o classes and never actually be seen by most users.

Tuesday, May 10, 2011

Yak shaving and plans

So I'm trying to fix a compilation bug, and as part of this I'm once again attempting to build the tart executables with clang, since it tends to be much stricter than gcc. Moreover, I'm attempting to do this on my iMac instead of my Ubuntu/ThinkPad, since I want to make sure that everything is working on that platform.

But when I attempted to build clang and run it, it crashed on me. So I'm thinking there's something amiss with my llvm checkout - for one thing, it's supposed to automatically update the clang directory when I svn up on the llvm directory and that wasn't working. So I'll just go ahead and blow away the whole directory and get a fresh checkout.

But then I'm thinking that maybe svn is out of date - it's been like a year since I did 'port update', so I go and do that. Only 'port' tells me I should do a 'port selfupdate' first, which I proceed to do. When that finishes, it recommends that I do a 'port upgrade outdated', so, OK, fine, I'll do that - so now I'm sitting here waiting for 'boost' to compile, after having finished zlib, perl5, expat, libiconv, and a whole bunch of other software which really has nothing to do with the problem I'm trying to solve.

Yak Shaving indeed.

In the meantime, I've been sketching out some of the other tart.io classes that will eventually need to be written. Here's a rough sketch of Path and Directory:

/** Utility functions for operating on file paths. */
namespace Path {
  def absolute(path:String) -> String;
  def exists(path:String) -> bool;
  def normalize(path:String) -> String;
  def isReadable(path:String) -> bool;
  def isWritable(path:String) -> bool;
  def isAbsolute(path:String) -> bool;
  def isDirectory(path:String) -> bool;
  def filename(path:String) -> String;
  def parent(path:String) -> String;
  def extension(path:String) -> String;
  def changeExtension(path:String, ext:String) -> String;
  def combine(basepath:String, newpath:String) -> String;
  def toNative(path:String) -> String;
  def fromNative(path:String) -> String;
}

/** Utility functions for operating on directories. */
namespace Directory {
  def current:String { get; }
  def create(path:String) -> bool;
  def directories(pattern:String = "*") -> Iterator[String];
  def files(pattern:String = "*") -> Iterator[String];
  def entries(pattern:String = "*") -> Iterator[String];
}
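For comparison, here is what the Directory namespace might look like in Python. It is a sketch under stated assumptions: the optional base argument is hypothetical (the Tart signatures take only a pattern, presumably relative to the current directory), patterns are shell-style via fnmatch, and create() reports failure as False rather than throwing.

```python
import fnmatch
import os


class Directory:
    """Hypothetical Python stand-in for the Directory namespace."""

    @staticmethod
    def current() -> str:
        return os.getcwd()

    @staticmethod
    def create(path: str) -> bool:
        # Return False instead of raising if the directory can't be made.
        try:
            os.mkdir(path)
            return True
        except OSError:
            return False

    @staticmethod
    def _scan(pattern, base, keep):
        # Lazily yield names in `base` matching a shell-style pattern.
        with os.scandir(base) as it:
            for entry in it:
                if fnmatch.fnmatch(entry.name, pattern) and keep(entry):
                    yield entry.name

    @staticmethod
    def directories(pattern: str = "*", base: str = "."):
        return Directory._scan(pattern, base, lambda e: e.is_dir())

    @staticmethod
    def files(pattern: str = "*", base: str = "."):
        return Directory._scan(pattern, base, lambda e: e.is_file())

    @staticmethod
    def entries(pattern: str = "*", base: str = "."):
        return Directory._scan(pattern, base, lambda e: True)
```

Returning generators matches the Iterator[String] results in the sketch: callers can stop early without listing a huge directory.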

The first thing to note is that all of these methods operate on regular strings. About four years ago there was a huge argument on python-dev about introducing a "Path" object which would have special methods for concatenation, splitting, and so on, sort of like Java's File object. Although it was cool / cute in some ways, I think there's a lot of value in allowing all of the regular string operations to work on pathnames too, and there's likely to be confusion if we overload regular string operators like concatenation to work differently with paths than with strings. So paths are just strings, and the special operations that paths need, like extracting the file extension, are just functions in a namespace rather than operators.
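A few of the Path functions are easy to sketch on top of Python's posixpath, which takes the same "paths are plain strings" approach. The snake_case names are hypothetical translations of the Tart ones, '/' is assumed as the separator (toNative/fromNative would handle the rest), and extension() keeping the leading dot is an assumption borrowed from os.path.splitext.

```python
import posixpath


def filename(path: str) -> str:
    return posixpath.basename(path)


def parent(path: str) -> str:
    return posixpath.dirname(path)


def extension(path: str) -> str:
    # Includes the leading dot, like os.path.splitext (an assumption).
    return posixpath.splitext(path)[1]


def change_extension(path: str, ext: str) -> str:
    return posixpath.splitext(path)[0] + ext


def combine(basepath: str, newpath: str) -> str:
    # An absolute newpath replaces basepath entirely, as with os.path.join.
    return posixpath.join(basepath, newpath)


def is_absolute(path: str) -> bool:
    return posixpath.isabs(path)


def normalize(path: str) -> str:
    return posixpath.normpath(path)
```

Since every result is an ordinary string, something like filename(p).upper() or a plain string comparison needs no special path support.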

I'm not completely satisfied with the names. In general, I try to follow certain principles:
  • I like names that are short and succinct.
  • Method names should, in general, be verbs, unless they are merely getters which return a value and don't have side effects, in which case it's OK to use a noun.
  • I tend to avoid names like "getFilename()" in favor of just "filename". I think that having to put the word "get" in front of every method like you often see in Java is needless redundancy - any token or symbol which is repeated too many times eventually becomes mentally tuned out.
    • As a general rule, the expression 'foo(bar)' means either 'do foo to bar' or 'return property foo of bar'.
  • Use namespaces rather than long method names to disambiguate similar names. I prefer "Directory.current" to "currentDirectory" or "getCurrentDirectory". Bear in mind, though, that someone may import your symbols into their module as unqualified names, in which case they may collide with other unqualified names. This is mostly an issue with class names rather than with methods in a namespace, so the rule is: don't make class names too generic, even if they are namespaced.

One problem I have is that the word 'absolute' is neither a noun nor a verb; it's an adjective, which muddies what the method does. I suppose I really ought to name it "makeAbsolute".

Another issue to resolve is that there are a bunch of functions that operate equally well on either files or directories. The question I have is where they should go:

def remove(path:String, recursive:bool = false) -> bool;
def move(from:String, to:String);
def lastAccessTime(path:String) -> Time;
def lastModificationTime(path:String) -> Time;
def creationTime(path:String) -> Time;
def setLastAccessTime(path:String, time:Time);
def setLastModificationTime(path:String, time:Time);
def setCreationTime(path:String, time:Time);

These are all functions that actually touch the filesystem. One place to put them is in Path, although it's a little weird to be mixing functions that only do string manipulation with functions that actually hit the disk. I guess that's OK though. The way C# organizes things is to put "getCreationTime" and similar functions in two places, Directory and File, and they are basically the exact same functions, which work on both files and directories. That's kinda kooky.
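Most of these really do work identically on files and directories, which is easy to check with a Python sketch. Assumptions: since Time isn't defined yet, times are plain float seconds since the Unix epoch; `from` is a Python keyword, so move() takes src/dst; and the creation-time functions are left out because Python only exposes a birth time on some platforms (st_birthtime on macOS/BSD).

```python
import os
import shutil


def remove(path: str, recursive: bool = False) -> bool:
    """Remove a file or directory; returns False if nothing was removed."""
    try:
        if os.path.isdir(path) and not os.path.islink(path):
            if recursive:
                shutil.rmtree(path)
            else:
                os.rmdir(path)  # fails unless the directory is empty
        else:
            os.remove(path)
        return True
    except OSError:
        return False


def move(src: str, dst: str) -> None:
    shutil.move(src, dst)


def last_access_time(path: str) -> float:
    return os.stat(path).st_atime


def last_modification_time(path: str) -> float:
    return os.stat(path).st_mtime


def set_last_modification_time(path: str, time: float) -> None:
    # os.utime sets both times at once, so carry the access time over.
    os.utime(path, (os.stat(path).st_atime, time))
```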

Now, you may notice that some of these functions return a Time object, which is something else that isn't defined yet. I'm thinking that we need three types:

struct Time {
  let value:int64;
  let calendar:Calendar;
}

struct TimeSpan {
  let value:int64;
}

class Calendar;

I've omitted all of the methods for clarity.

The 'Calendar' is an object that defines the time base - such as what year is year zero, how many nanoseconds a tick is, and so on. All Time objects are defined with respect to some Calendar.

'Time' represents a particular moment in time, relative to a Calendar. It's a struct (which means it's passed by value) and it's immutable.

'TimeSpan' represents a duration. You can add TimeSpans to Times, and subtract TimeSpans from Times.

The actual meaning of the 'value' field in Time is probably platform-dependent; that is, it will use whatever representation the platform uses for time values. The Time struct will have lots of methods to convert that into more familiar units such as seconds, milliseconds, and so on.
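The relationships between the three types can be sketched with frozen dataclasses, which give the value semantics and immutability described above. Everything here is hypothetical: the Calendar fields, the unit properties, and in particular the assumption that a TimeSpan's ticks are the same length as those of the Time it is applied to (the Tart TimeSpan has only a value, so it doesn't say).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calendar:
    """Time base: what tick 0 means and how long one tick is."""
    name: str
    ticks_per_second: int


@dataclass(frozen=True)
class TimeSpan:
    value: int  # duration in ticks


@dataclass(frozen=True)
class Time:
    value: int  # ticks since the calendar's epoch
    calendar: Calendar

    def __add__(self, span: TimeSpan) -> "Time":
        return Time(self.value + span.value, self.calendar)

    def __sub__(self, other):
        if isinstance(other, TimeSpan):
            return Time(self.value - other.value, self.calendar)
        if isinstance(other, Time):
            # Differences only make sense within one time base.
            if other.calendar != self.calendar:
                raise ValueError("times use different calendars")
            return TimeSpan(self.value - other.value)
        return NotImplemented

    @property
    def seconds(self) -> float:
        return self.value / self.calendar.ticks_per_second

    @property
    def milliseconds(self) -> float:
        return self.value * 1000 / self.calendar.ticks_per_second
```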

One final thing to note about Time is that I really don't know much about date conversions and other time-related stuff. In other words, I'm totally unqualified to implement these classes.

Saturday, May 7, 2011

Tart status update

It's been a slow week for Tart development, due to a whole host of distractions, including work on the house (pretty much complete replacement of all of the old galvanized steel pipe under the house with new hard copper and PEX); there have been workmen running in and out for several days. Plus the heat wave has been making me sleepy :)

However, I did manage to get one thing done, which is that I wrote a new container class - Deque. I'm pretty pleased with the way it turned out. This is also the first container class that includes a fast-fail mutation guard - that is, it will detect if you've mutated the container while iterating, and throw an exception. Eventually I'll want to add this to some of the other containers as well.
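The fast-fail idea can be sketched in Python as a ring-buffer deque with a modification counter: each iterator remembers the counter's value when it started and throws as soon as it sees a different one. The method names and the ConcurrentModificationError type are hypothetical, not Tart's Deque API.

```python
class ConcurrentModificationError(RuntimeError):
    """Raised when a Deque is mutated while being iterated."""


class Deque:
    """Ring-buffer deque whose iterators fail fast on mutation."""

    def __init__(self, capacity: int = 8):
        self._items = [None] * max(1, capacity)
        self._head = 0       # index of the first element
        self._size = 0
        self._mod_count = 0  # bumped by every structural change

    def __len__(self):
        return self._size

    def _grow(self):
        # Unwrap into a list twice as large, with the first element at 0.
        n = len(self._items)
        old = [self._items[(self._head + i) % n] for i in range(self._size)]
        self._items = old + [None] * n
        self._head = 0

    def push_back(self, item):
        if self._size == len(self._items):
            self._grow()
        self._items[(self._head + self._size) % len(self._items)] = item
        self._size += 1
        self._mod_count += 1

    def push_front(self, item):
        if self._size == len(self._items):
            self._grow()
        self._head = (self._head - 1) % len(self._items)
        self._items[self._head] = item
        self._size += 1
        self._mod_count += 1

    def pop_front(self):
        if not self._size:
            raise IndexError("pop from empty Deque")
        item, self._items[self._head] = self._items[self._head], None
        self._head = (self._head + 1) % len(self._items)
        self._size -= 1
        self._mod_count += 1
        return item

    def pop_back(self):
        if not self._size:
            raise IndexError("pop from empty Deque")
        index = (self._head + self._size - 1) % len(self._items)
        item, self._items[index] = self._items[index], None
        self._size -= 1
        self._mod_count += 1
        return item

    def _check(self, expected):
        if self._mod_count != expected:
            raise ConcurrentModificationError("Deque mutated during iteration")

    def __iter__(self):
        expected = self._mod_count
        for i in range(self._size):
            self._check(expected)
            yield self._items[(self._head + i) % len(self._items)]
        self._check(expected)  # also catch a mutation on the last step
```

A single integer counter is cheap to maintain and is enough to detect any push or pop, which is why this style of guard is popular in container libraries.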

Also, the expression "typename?" is now a shortcut for "optional typename", which is a shortcut for "typename or Null". Originally I didn't like the way the question mark looked; however, I've realized that "optional" has a number of problems due to the fact that it's a prefix, and most type modifiers in Tart are suffixes. Mixing suffixes and prefixes is a bad idea, as witnessed by the continual confusion they create in C++ with expressions like "int * items[]". There's also the fact that "optional" really means "nullable", which is a slightly different concept. So the "optional" keyword is eventually going to go away.
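Python's type hints make a useful point of comparison, because they ran into the same naming issue: typing.Optional[T] is defined as Union[T, None], i.e. "nullable", and has nothing to do with optional arguments. Python 3.10 also added the suffix-style spelling `T | None`. The find_index function below is just a hypothetical example of a nullable return type.

```python
from typing import Optional, Union

# Optional[int] is exactly Union[int, None]: "int or None", much like
# Tart's "int?" / "int or Null".


def find_index(items: list, target) -> Optional[int]:
    """Return the index of target, or None when it is absent."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None
```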

Overall, we are really, really close to satisfying all my self-imposed criteria for an 0.1 release of Tart. Really, there's only a small number of tasks that need to be done:
  • Get the unit tests to run on 32-bit OS X (currently they all segfault).
  • Test "make install" and fix any bugs.
  • Finish revising the language intro doc (currently about 2/3 done.)
  • Get readLine() implemented.
In fact, I'd say that only the first two of those should really be release blockers.
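The tricky part of readLine() is universal newlines: a line may end in "\n", "\r\n", or a lone "\r", and fgets() only stops at "\n". Here is a minimal Python sketch of the lookahead that requires; the LineReader class is hypothetical, reads a byte at a time for clarity (a real implementation would scan a buffer), and assumes UTF-8 text.

```python
from typing import Optional


class LineReader:
    """Reads lines ending in "\\n", "\\r\\n" or "\\r" from a binary stream."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = b""  # one byte of lookahead after a lone "\r"

    def _next_byte(self) -> bytes:
        if self._pending:
            b, self._pending = self._pending, b""
            return b
        return self._stream.read(1)

    def read_line(self) -> Optional[str]:
        """Return the next line without its terminator, or None at EOF."""
        buf = bytearray()
        while True:
            b = self._next_byte()
            if not b:  # end of stream
                return buf.decode("utf-8") if buf else None
            if b == b"\n":
                return buf.decode("utf-8")
            if b == b"\r":
                nxt = self._next_byte()
                if nxt and nxt != b"\n":
                    self._pending = nxt  # lone "\r": save the byte we peeked
                return buf.decode("utf-8")
            buf += b
```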

Sunday, May 1, 2011

Tart status update

A few short items, since this has been a rather busy week and I have not been able to work on Tart as much as I would have liked:

1) I haven't forgotten about Liam's IOStream patch, and I'm still thinking about it. Just a few comments:
  • stdin/stdout aren't necessarily connected to files - they could be sockets or pipes - so calling them FileStreams is a bit of a misnomer. Also, I'm trying to consider how the various byte array input / output streams are going to fit into the picture.
  • I'm also thinking about how file read operations are going to work in a multi-threaded context. Reading from a file into a buffer is probably going to go like this: 1) pin the buffer in memory; 2) tell the GC that it's OK to do collections while we're waiting for the i/o to complete; 3) do the actual read; 4) tell the GC that the read is done, and that it should only do collections when we say it's OK; and finally 5) unpin the buffer. The main idea is that the garbage collector should be able to run collection cycles while we're waiting for the i/o to complete. Note that none of these issues should affect the current design at all; I'm just thinking about the future.
  • For reading lines terminated by newline, it's a shame that we can't use the underlying fgets() function, which is likely to be much more efficient and will take advantage of buffering in the FILE object. Unfortunately, fgets doesn't support universal newlines.
  • File paths: I'm in the Python / C# camp that believes that a file path is just a plain old string (with a bunch of utility functions for operating on them), vs. the Java camp that says that a path is a special kind of object (class File). Note that I feel somewhat differently when it comes to URLs, since they are much more complicated entities.
2) I have a fix for the UpCast bug, but I haven't pushed the changes out yet, mainly because I haven't written tests for it.

3) You may notice that there's a new option in the top-level CMakeLists.txt for building tartc and tartln with clang, if it's available. Everything compiles OK (and clang is both stricter and has better diagnostics than gcc.) Unfortunately, there's some weird runtime incompatibility - it crashes when the clang-compiled tartc tries to call into the gcc-compiled LLVM libraries. I haven't had too much time to think about this.

4) I still have the annoying problem of all Tart unit tests crashing on 32-bit OS X systems. This is difficult to debug because DWARF still isn't working on those platforms either.

5) I also have not checked in a new feature whereby the compiler knows where the standard libraries are installed, and automatically adds them to the list of module import paths unless -nostdlib is specified on the command line.
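The -nostdlib behavior in item 5 amounts to a small rule for building the module search path, sketched here in Python. Only -nostdlib comes from the post; the -i flag, the install location, and searching user paths before the standard library are hypothetical choices made for illustration.

```python
import argparse

# Hypothetical install location; a real build would use its configured prefix.
DEFAULT_STDLIB_DIR = "/usr/local/lib/tart/stdlib"


def module_search_path(argv):
    parser = argparse.ArgumentParser(prog="tartc")
    parser.add_argument("-i", dest="import_paths", action="append", default=[])
    parser.add_argument("-nostdlib", action="store_true")
    parser.add_argument("sources", nargs="*")
    args = parser.parse_args(argv)

    # User-supplied paths first, then the standard library unless disabled.
    paths = list(args.import_paths)
    if not args.nostdlib:
        paths.append(DEFAULT_STDLIB_DIR)
    return paths
```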

--
-- Talin