Tuesday, February 9, 2010

Some new ideas on garbage collection

Up to now, I haven't been making very much progress on garbage collection, because there are a number of problems that I haven't been able to solve. For example, if each mutator thread has its own private thread-local allocation space (for the young generation), what happens when an object belonging to one thread points to an object in another thread's allocation area? It means that in order to do a collection, both threads would have to be stopped - but how would you know which threads to stop without scanning all of the objects to see which ones refer to other spaces?

So lately I've been researching what other collectors do, and one place I looked was OpenJDK, which led me to a couple of whitepapers on HotSpot garbage collection, from which I learned a few things.

First off, threads in HotSpot don't each have separate young generations. Rather, there is a single young generation which is divided up into TLABs (thread-local allocation buffers). The key difference is that *all* of the threads are stopped during a young-generation (YG) collection.

You see, in my original design, each thread had a separate YG allocation space for young objects, and when that space got full, only that one thread would be paused and a collection cycle would be run on that one space. Of course, there might be a few pointers from inside the space pointing to objects outside the YG space, and vice versa, but I figured that those could be handled as special cases - perhaps by keeping a separate list of such references. However, I never came up with a way to do this efficiently.

However, in the HotSpot JVM, there is only a single young generation which is divided into TLABs. Each thread has an assigned TLAB, which is a linear region of memory that it can use to do efficient, lockless "bump-pointer" allocation. When a TLAB runs out of space, the thread gets a new TLAB from the available pool in the young generation (this action requires a lock). Only when we run out of TLABs do we need to actually perform a YG collection. At this point all threads have to be stopped.
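Roughly, the allocation path might look like the sketch below. The names here (`Tlab`, `Heap`, `kTlabSize`) are illustrative placeholders, not HotSpot's actual code, and error handling is omitted:

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>

constexpr size_t kTlabSize = 64 * 1024;   // arbitrary TLAB size for the sketch

struct Tlab {
  uint8_t* top = nullptr;   // next free byte in this thread's buffer
  uint8_t* end = nullptr;   // one past the last usable byte
};

struct Heap {
  std::mutex lock;          // taken only when handing out a new TLAB
  uint8_t* youngTop;        // bump pointer over the shared young generation
  uint8_t* youngEnd;        // (set up elsewhere to cover the young gen)

  // Carve a fresh TLAB out of the young generation. Returns false when the
  // young generation is exhausted - the signal to do a YG collection.
  bool refill(Tlab& tlab) {
    std::lock_guard<std::mutex> guard(lock);
    if (youngEnd - youngTop < static_cast<ptrdiff_t>(kTlabSize))
      return false;
    tlab.top = youngTop;
    tlab.end = youngTop + kTlabSize;
    youngTop += kTlabSize;
    return true;
  }
};

// The fast path is lockless: each thread just bumps a pointer inside its own
// TLAB. The lock is only touched on the slow path, when the TLAB runs out.
void* allocate(Tlab& tlab, Heap& heap, size_t size) {
  if (size > static_cast<size_t>(tlab.end - tlab.top)) {
    if (size > kTlabSize || !heap.refill(tlab))
      return nullptr;       // caller must trigger a YG collection (or use a
                            // separate path for oversized objects)
  }
  void* result = tlab.top;
  tlab.top += size;
  return result;
}
```

The point is that the common case is just a pointer comparison and an add; the lock and the collection only come into play on the slow path.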

According to the whitepaper, YG collections are fast enough that the pause times created by stopping all threads and doing the collection aren't an issue - it's only the old-generation (OG) collection that creates problematic pauses. And because all threads are stopped at once, the collector is free to move objects around as needed, which completely solves my problem.

I already came up with a fairly efficient and portable mechanism for stopping threads as part of Scarcity. It basically involves inserting sync points into the generated code, where each sync point tests a flag that says whether or not we are ready to do a collection. If the flag is true, then we use the conventional synchronization primitives to stop the thread.
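A minimal sketch of the idea (the names here are hypothetical, not Scarcity's actual API): the collector raises a flag, and a sync point that sees it parks the thread on an ordinary mutex and condition variable until the collection finishes.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

std::atomic<bool> stopRequested{false};   // set by the collector thread
std::mutex gcMutex;
std::condition_variable gcDone;
bool collectionInProgress = false;

// Emitted by the compiler at each sync point in the generated code.
void syncPoint() {
  if (stopRequested.load(std::memory_order_acquire)) {
    std::unique_lock<std::mutex> lock(gcMutex);
    // A real runtime would also record this thread's stack roots here and
    // signal the collector that the thread has reached a sync point.
    gcDone.wait(lock, [] { return !collectionInProgress; });
  }
}

// Called by the collector to stop the mutators before a YG collection.
void requestStop() {
  {
    std::lock_guard<std::mutex> lock(gcMutex);
    collectionInProgress = true;
  }
  stopRequested.store(true, std::memory_order_release);
}

// Called by the collector when the collection is finished.
void resumeThreads() {
  {
    std::lock_guard<std::mutex> lock(gcMutex);
    collectionInProgress = false;
    stopRequested.store(false, std::memory_order_release);
  }
  gcDone.notify_all();
}
```

In the real thing the collector would also wait until every mutator has actually parked (or is blocked in a system call) before it starts moving objects; the sketch leaves that handshake out.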

One issue with sync points is whether testing the flag will incur an unacceptable overhead, especially if it has to be synchronized between processors. My take on this is that there are two kinds of sync points: "mandatory" sync points (as in "you need to stop right now"), which occur before heavyweight operations such as memory allocation or I/O, and "advisory" or "probabilistic" sync points (as in "we'd like you to stop sometime soon"), which occur in inner loops. In other words, for the inner-loop case it's ok if testing the flag is non-deterministic, as long as it's also low-overhead. (Well, it has to be guaranteed that *eventually* the flag will be read. I don't know whether any of the processors with weak memory semantics provide that as an option.)

So for now the solution is to use double-checked locking, sort of. I realize that's a hazardous course, but it's the only technique I know of that won't cause unacceptable overhead.
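Concretely, a double-checked version of the inner-loop sync point might look like the following, building on the sketch above (again, hypothetical names). The first check is a relaxed atomic load, which compiles to an ordinary read on most processors; only if the flag appears set does the thread fall into the heavyweight path, which re-checks the flag with proper ordering and takes the lock before actually stopping.

```cpp
#include <atomic>

extern std::atomic<bool> stopRequested;   // from the earlier sketch
void syncPoint();                         // the heavyweight path above

// Advisory sync point for inner loops: a cheap, unordered first check, then
// the full check (with acquire ordering and the lock) only if it looks set.
inline void advisorySyncPoint() {
  if (stopRequested.load(std::memory_order_relaxed))
    syncPoint();
}

// Mandatory sync point before heavyweight operations (allocation, I/O):
// always does the full check so the thread cannot sail past a pending stop.
inline void mandatorySyncPoint() {
  syncPoint();
}
```

On common hardware the flag write propagates quickly, so in practice the relaxed read in the loop will see it soon enough; whether that's formally guaranteed on every processor is exactly the open question above, which is why the mandatory sync points before allocation and I/O still do the full check.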
