Sunday, April 18, 2010

Tart status

Things have been pretty quiet on this list lately, mostly because I've been taking somewhat of a break from working on Tart, hoping to get my creative batteries recharged. After two and a half years of working on it, I figure a vacation is in order. Also, I've found that stepping away from a project is sometimes an effective way to deal with intractable problems, as you often get a new insight by getting some distance from the problem. In fact, I can already see a couple of minor things I'd like to change.

In terms of garbage collection, I did make a some progress on tracing of stack roots. I was able to write a garbage collection "strategy" plugin for LLVM that lets me generate a stack frame descriptor for each function. For example, take the following test function:

def f3 {
  var a = "S3a";
  var b = "S3b";
  var c = b;
  f4();
}

The stack frame descriptor for this function looks like this (in assembly):

        .align  8
.Lgc_stack1:
        .quad   -8
        .quad   -16
        .quad   -24
        .quad   1

This contains three entries, followed by a end-of-list entry that has the low bit set. It says that there are 3 stack roots at offsets -8, -16 and -24 from the base pointer.

I wrote a test program that simply prints out the stack roots instead of doing actual collections, and here's what I see for this function:

Found frame 0x7fffd6826f50, return 0x40110d
Found var in frame 0x7fffd6826f50, offset -008 addr 0x7fffd6826f48 value 0x408010
Found var in frame 0x7fffd6826f50, offset -016 addr 0x7fffd6826f40 value 0x408040
Found var in frame 0x7fffd6826f50, offset -024 addr 0x7fffd6826f38 value 0x408040

Here's what the assembly for the function looks like:

        .type   GCTraceTest.f3,@function
GCTraceTest.f3:                         # @GCTraceTest.f3
.Leh_func_begin6:
.Lfunc_begin4:
.Llabel48:
# BB#0:                                 # %entry
        pushq   %rbp
.Llabel44:
        movq    %rsp, %rbp
.Llabel45:
        subq    $32, %rsp
.Llabel46:
        movq    $0, -8(%rbp)
        movq    $0, -16(%rbp)
        movq    $0, -24(%rbp)
.Llabel49:
        movq    $.string3, -8(%rbp)     # GCTraceTest.tart:24:8
.Llabel50:
        movq    $.string4, -16(%rbp)    # GCTraceTest.tart:25:8
.Llabel51:
        movq    $.string4, -24(%rbp)    # GCTraceTest.tart:26:8
.Llabel52:
        movq    (%rdi), %rax            # GCTraceTest.tart:27:4
        callq   *88(%rax)               # GCTraceTest.tart:27:4
.Llabel47:                              # GCTraceTest.tart:27:4
.Llabel53:
        addq    $32, %rsp               # GCTraceTest.tart:28:2
        popq    %rbp                    # GCTraceTest.tart:28:2
        ret                             # GCTraceTest.tart:28:2

In the above assembly, the label "Llabel47" defines a post-call safe point, meaning that from this function's point of view, a garbage collection could occur during the call that immediately precedes that label. There is a static map that contains an entry for "Llabel47" and maps it to the stack frame descriptor shown earlier. The stack walker uses the frame pointer to find the return address of the calling function, and looks that up in the map. It then gets the caller's frame pointer, and uses the offsets in the stack frame descriptor to find the locations of the variables containing stack roots.