Machine Words: What's next?

I still have a long TODO list, but things are steadily progressing. Here are a few of the items that are up next:

* Compile under Visual Studio. I've set up my new thinkpad to dual-boot both Windows and Ubuntu (my normal development machine is a Mac), and installed Visual Studio 2008 Express on it. Using cmake to generate a VS project for LLVM worked beautifully, and compiled without a hitch.

Unfortunately, Tart's cmake config files aren't in such good shape. My FindLLVM.cmake module relies rather heavily on the llvm-config script to generate header and library dependencies - and llvm-config is not available under Windows. I did some code searches, and it appears that what many projects do is to specify the dependencies manually for Windows, and automatically for other platforms. I don't really want to have to maintain the dependencies manually, part of why I migrated over to using llvm-config was to get away from that. (LLVM has like 30 or so separate libs, keeping the dependencies straight is a pain.)

My current thought is to create a Python script that calls llvm-config when running under Unix, and then transforms the output to cmake syntax. This would then be used to keep FindLLVM.cmake up to date.

Note that all this has to be done before I can even *think* about trying to compile Tart. At this point, I have no idea if it can even be done - at the very least, I would assume that my exception personality function will have to be completely re-written.

* Get source-level debugging working. Most of the code to generate source-level debugging is done, but it's currently turned off because LLVM does wrong things when it's turned on.

* Finish up closures.

* Garbage collection. I started sketching out some Tart classes for the garbage collector (Originally I was going to write the GC in C++, but I think I can do it in Tart nearly as well.) The issue that needs to be solved is how to handle platform differences - I can call vmalloc() on linux to get raw blocks of aligned core memory, but on Windows I'll need to call something else I guess. The important thing is that the decision can't be made at runtime, since the program won't link if it contains calls to APIs that aren't present on that platform.

For swapping out code on different platforms, there are generally two strategies: Conditional compilation (#ifdef in C++), and Makefile conditionals (i.e. replace entire source files based on the current platform). Unfortunately in a language like Tart where there's a relationship between the scope hierarchy and the arrangement of files on the filesystem, you have to be extra tricky to support whole-file swapping, although it can be done. (The easiest way is to define a separate package root for each platform and put them in the class path.)

For source-level conditional compilation, I had planned to implement something like D's "static if" - an if-statement that can appear outside of a function (or inside), in which the test expression must be a constant, and which causes the code within the 'if' to not even be compiled if the test is false.

As far as the test expressions themselves, my current thinking is to define a package, tart.config, that contains the configuration options for the current host. One issue is how much do we care about cross-compilation - it would be much easier if we didn't have to think about that just yet. For cross compilation we need lots of command-line options to the compiler; for simple host-based compilation, we can let the config script do most of the work.

* Some sort of command-line options package. I'd kind of like to do some sort of annotation-based command-line options, where each module can define its own options. Ideally something Guice-like. I know this approach can lead to problems with low-level modules exporting options that are of no interest to the user, but I think that problem is manageable.

One thing I haven't quite figured out is how to collect all of the various command-line variables. The brute force approach is to do reflection on every single field of every class, see if it has a "CmdLineParameter" annotation, and if so, process it. Another approach would be to not use annotations at all, but instead have the command-line options that are statically-constructed instances which register themselves at construction time with the command-line arguments processor. This would of course require me to get static constructors working :)

The third approach is to have some sort of link-time process that collects all fields which are annotated with some particular annotation class and then creates a list, which the program can simply iterate through upon startup. This has to be done carefully, because we only want to include in that list fields which are actually used by the program. You see, the linker currently traces through all of the symbols reachable from "main" and discards any that aren't reachable - sort of a link-time garbage collector. But this hypothetical list of annotated symbols is itself a reference, which would prevent the symbol from being discarded, even if no other part of the program referred to that symbol. So we would want to construct such as list after the dead-global elimination phase.

* Tuple types. These are partly done, but we need to finish them so we can do multiple return values and unpacking assignment.

* Reflection work - currently the only thing reflected is global functions, we need to get other declaration types reflected as well.

* Stack dumps - need a way to print out the current call stack. This one is going to be way hard.

* Finish up constructors - lots of work on inheritance, chaining, final checking, etc.

* Compiled imports. Right now when the compiler sees an "import" statement, it actually loads the source file and parses it, although it only analyzes the AST tree lazily (i.e. only stuff that is actually referred to by the module being compiled). What I'd like to do is take all of the high-level info that is lost when creating a .bc file, and encode that using LLVM metadata nodes. The idea is that if a file was already compiled, instead of recompiling it, it would just load the bitcode file and read the metadata. Assuming that the file had not changed, of course.

* "finally" statement. Although exceptions are working great, I haven't done 'finally'. However, this will be much easier now that LLVM allows for local indirect branches.

* Enum.toString(), Enum.parse().

* String formatting functions.

* I haven't even started on the i/o libraries. (No "Hello World" yet, unless you call Debug.writeLn()).

* yield / with statements.

* Docs, docs, docs...

--
-- Talin

Machine Words

Sunday, November 29, 2009

What's next?

No comments:

Post a Comment