Monday, November 28, 2011

Mint progress

As I may have already mentioned, I decided to take a break from working on Tart, and spend a few weeks working on Mint. This should by no means be interpreted as a lack of interest or commitment in Tart. It's just that there are a number of difficult problems which have caused work on Tart to proceed with frustrating slowness. Most of these problems involve the complexities of generating DWARF debug info via LLVM's APIs, problems which I've already written about extensively and see no need to repeat here.

In any case, I decided to give myself a holiday and work on something fun. And in fact the progress on Mint has been nothing short of astounding - I think I've written several hundred source files over the course of the last two weeks. (I've also been using Eclipse + EGit to do the coding, which has been a remarkably pleasant experience.)

Those who want a quick overview of what Mint is can look here:


(Click the 'code' link at the top of that page if you want to browse the source.)

Of course, part of why the work on Mint is proceeding so rapidly is that unlike Tart, Mint has virtually no dependencies - it is a self-contained program that doesn't depend on complex libraries or other open-source projects in order to work.

At the present time, the "mint" program doesn't actually do anything useful. But it does do a number of things that are interesting.

For example, Mint is able to run simple 'configure' tests such as testing for the presence of a standard header file such as <malloc.h>. It does this by invoking the C preprocessor and then looking at the exit code. Note that these configuration tests are in no way "built in" - instead they are defined in the standard Mint library.

A Python-inspired 'import' statement is used to import the configuration tests into your project's build file:

from prelude:configtests import check_include_file, check_include_file_cpp

HAVE_STDIO_H   = check_include_file { header = "stdio.h" }
HAVE_STDBOOL_H = check_include_file { header = "stdbool.h" }
HAVE_STDDEF_H  = check_include_file { header = "stddef.h" }
HAVE_STDLIB_H  = check_include_file { header = "stdlib.h" }
HAVE_CPP_ALGORITHM  = check_include_file_cpp { header = "algorithm" }

The syntax "projectname:modulename" syntax in the import statement is used to import definitions from another source tree - this is how Mint supports multiple source trees in a single build configuration. The name "prelude" is a special pseudo-project that points to the location of the Mint standard library.

The Mint language tries to be as declarative as possible, describing the end result that you want to have accomplished, while hiding from view the steps needed to get to that end result. Although Mint does have functions, most actions are described by objects rather than function calls. The typical pattern in Mint is that instead of calling a function with N parameters, you would commonly define an object with N named properties, and then 'evaluate' that object to get a list of actions to perform.

The advantage of this approach is that we can take the graph of objects and interpret it in different ways. If we want to actually build an executable program we would evaluate the object graph into a set of runnable actions for invoking compilers or other tools. On the other hand, if we want to generate an Eclipse or XCode project file, we would translate the object graph into an XML file that could be read into those programs.

Going back to the above example, the "check_include_file" symbol refers to a prototype object. In Mint syntax, an expression of the form <prototype> { <properties...> } creates a new object with the given prototype. In this case, we're creating an object that inherits from the 'check_include_file' prototype, and specifying the name of the header file to test for.

The 'configtests' module in the Mint standard library contains the definition of 'check_include_file'. Here's the actual definition (with comments stripped for brevity):

exit_status_test = object {
  lazy param message : string = ""
  param program : string = undefined
  param args : list[string] = []
  lazy param input : string = undefined
  export lazy param value : bool =
      console.status(message)
      or shell(program, args ++ ["2>&1 > /dev/null"], input).status == 0
}

check_include_file = exit_status_test {
  param header : string = undefined
  message = "Checking for C header file ${header}..."
  program = "cpp"
  args    = ["-xc"]
  input   = "#include <${header}>\n"
}

Here's a blow-by-blow explanation of this code:

"exit_status_test" is a generic prototype that performs configuration tests by running some shell program and checking the exit status. "check_include_file" inherits from, and further specializes, this prototype.

A "param" declaration defines an object property. You can only assign values to properties that have been defined in a prototype - this is to avoid spelling errors.

A "lazy param" is one that is evaluated dynamically. That means that the value of the parameter is calculated each time the parameter is accessed (the normal behavior is to evaluate the value of the parameter at the time the value is assigned.) You can think of lazy params as equivalent to no-argument member functions.

Understanding lazy parameters requires a careful understanding of Mint's scoping rules. First, Mint uses lexical scoping, like most modern languages. Mint also uses object-oriented-style lookup rules, meaning that properties defined on an object take precedence over properties defined on that object's prototype. So for example in the code sample above, we can see that the property 'header' is used in the calculation of the 'message' property of 'check_include_file'. Thus, if you tried to evaluate 'HAVE_STDIO_H.message', you'd get the value 'Checking for C header file stdio.h...". Even though 'message' is a property of 'check_include_file', when it gets evaluated the value of 'header' is taken from HAVE_STDIO_H, not 'check_include_file'.

An "export" parameter is one that should be saved when writing out the build configuration. So for example, we want to save the result of our configuration test (so that we don't have to run it every time), thus the 'value' property is declared as 'export'.

The 'program' and 'args' parameters are the name of the program to run, and the arguments to pass to it, respectively. In the check_include_file prototype, we see that the program to run is "cpp" (The C preprocessor) and the argument is "-xc" (which forces C dialect, as opposed to C++ or Objective-C).

The "input" argument is used to specify text that is to be piped into the program's standard input. In this case, it's a single line which contains a #include of the header we're interested in.

Finally, the 'value' property of 'exit_status_test' is where all of the actual work is done: It first prints a message to the console, and then executes a shell command using the built-in 'shell' function. (Ignore the 'or' in there - that's a temporary hack due to the fact that there's no statement blocks implemented yet.) The 'shell' function returns an object, which has several properties, one of which is the exit status.

If you've gotten this far, I think you'll agree that the language is quite flexible, and that it should be relatively straightforward to create configuration tests of almost any imaginable sort.

I'll show one more code example from the Mint standard library. Although Mint does not yet have the ability to do anything useful with build targets, it does at least allow build targets to be declared. The following is the (unfinished) prototype for build targets that generate executables, and what's notable about it that it can figure out from the source file extension what compiler to invoke. Moreover, this map of file extensions can be extended by objects that inherit from this prototype, making it possible to support other languages.

object_builder = builder {
  param cflags : list[string] = []
  param cxxflags : list[string] = []
  param include_dirs : list[string] = []
  param library_dirs : list[string] = []
  param warnings_as_errors = false
  param builder_map : dict[string, builder] = {
    "c"   = c_builder,
    "cpp" = cpp_builder,
    "cxx" = cpp_builder,
    "cc"  = cpp_builder,
    "m"   = objective_c_builder,
    "mm"  = objective_cpp_builder,
    "h"   = null_builder,
    "hpp" = null_builder,
    "hxx" = null_builder,
    "lib" = identity_builder,
    "a"   = identity_builder,
    "o"   = identity_builder,
  }
  export lazy param implicit_depends : list[builder] = sources.map(
      src => builder_map[path.ext(src)] {
        sources = [ src ]
      })
  actions = implicit_depends.map(b => b.actions)
}

The interesting bit is in the calculation of the 'implicit_depends' property. This uses a Mint lambda function. The syntax for lambdas is "(args) => expr" or just "arg => expr" if there's only one argument. In this case, we're running a 'map' over the list of source files, and for each source file, we get the file extension, look it up in the dictionary, and return an object prototype. We then use that prototype to generate a new object using the "proto {}" syntax. So basically this generates a new builder object for each source file.