Machine Words: Thoughts on a build configuration language

Work on Tart is on hold for the moment while I am waiting to hear back from the LLVM folks about a bug in llc.

In the mean time, I want to write down a few notes about "mint", which is the name of the hypothetical future build tool that I want to create. Although my plan is to write mint in Tart, there's nothing Tart-specific about it, and it could be written either in C++ or Python or some other language.

I've been thinking about this a lot lately, especially because there's a big discussion on llvm-dev about the future of the build system, and part of that includes some very passionate opinions about the pros and cons of cmake.

Anyway, here are some basic requirements for mint:

It should be fast.
It should be extensible, especially with respect to 3rd-party tools. One should be able to write an extension module for yacc or antlr that makes it as easy to invoke those tools as it is to invoke gcc.
It should support hermetic builds, so that a particular binary can be produced on a given host machine that is identical to one produced on another machine, even if those machines have different environments.
It should support generating project files for popular IDEs such as Eclipse, XCode, Visual Studio, and so on.

I'd also like it to support automatic download of dependent packages, but unfortunately this is an area which I don't have a lot of experience in.

So the basic idea is that the build configuration language (BCL) is a hybrid functional / declarative language that supports prototype inheritance.

The language is declarative in the sense that what it primarily does is describe a graph of objects, the Build Dependency Graph (BDG). Objects in the graph include targets, files, tools, actions, and other entities which are used in the build. Making the language declarative means that a build configuration file describes the relationships between these objects, but doesn't say how these relationships are processed or in what order - that is determined by whatever tool is actually doing the build. This is what makes it feasible to generate a project file for an IDE.

The language is functional in that an object's properties can be expressions which are lazily evaluated. For example, the set of flags passed to a compiler is calculated from a number of different inputs, including the user's choices as to whether to enable debug symbols, assertions, and so on.

Prototype inheritance allows for objects in the graph to be re-used. For example, when you declare a build target, you start by inheriting from an abstract build target prototype, and then fill in the details about your specific target. The abstract build target prototype knows about source files, output files, and dependent targets; your target fills in the details of what specific files are involved. Or you may choose to inherit from a build target prototype that knows how to invoke the C++ compiler, which saves you the trouble of having to write the command-line flags for the compiler.

Because prototype inheritance is used so much in this language, the syntax is very simple:

<prototype> { <properties> }

Where 'prototype' is some previously defined object, and 'properties' is a list of key/value pairs. So for example:

myProgram = program.cpp {
sources = [ "main.cpp" ]

}

Here we are starting with the standard prototype "program.cpp", and filling in the 'sources' slot ('sources' is defined in the prototype, but it's empty). We are then assigning that to the name 'myProgram'.

We can then, if we wish, create a variation of 'myProgram' that is compiled with optimization:

myProgramOpt = myProgram {
opt.level = 2
}

The "opt.level" property was also defined in the program.cpp prototype. The program.cpp prototype has a tool definition for the cpp compiler. Here's a glimpse of what the program.cpp prototype might look like, with many details omitted.

program.cpp = program.base {

opt = dict {
level = 0
}

cflags = []
tool = compiler.cpp {
flags = [

match (opt.level,
0 => "-O0",

1 => "-O1",
2 => "-O2",

3 => "-O3",
* => raise("Invalid optimization level: " ++ opt.level))

] ++ cflags
}
}

The first few lines define a property called 'opt' whose value is a dictionary type - basically just a namespace. Within that is the property 'level', which defaults to zero. (I'm undecided as to whether 'level' should have a type as well as a default value - it wouldn't make sense to assign a string to 'level'.)

Next we have the tool that will get run when this target gets updated. The "flags" property is a list of flags that gets passed to the compiler on the command line. When the compiler is invoked, each element of 'flags' will get evaluated and transformed into a string. That string is then passed to the compiler as the shell arguments.

In the case of opt.level, we use a match expression to transform the optimization level numbers 0..3 into compiler flags of the form -On. The idea here is to abstract away the specific syntax of the compiler's command-line arguments, so that the same syntax can be used to control different compilers.

There is also a 'cflags' property that can be overridden by descendant objects, allowing flags specific to a particular compiler to be passed in.

The main idea here is that "builtin" compilers for C and C++ use the same descriptive syntax as for 3rd-party extension compilers. In other words, the built-in rules aren't special.

Another kind of prototype is used to define the command-line options to the build script itself:

enable-asserts = option {
type = bool
default = false
help = "Whether to enable assertion checks"

}

This means that the user can type "mint --enable-assertions" to generate a build configuration with assertions enabled. Or even type "mint --help" to print out the various options available.

BTW, one thing that would be neat is if "--enable-assertions" could change just that one option, while leaving other options set to their current values.

Anyway, I'd be curious to know what people think of this idea.

Machine Words

Sunday, October 30, 2011

Thoughts on a build configuration language

No comments:

Post a Comment