More Garbage Collection Questions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

More Garbage Collection Questions

Talin-3
I'm still (slowly) working on the project of creating a concurrent
garbage collector that works with LLVM. I want to ask a little bit more
about object tags and write barriers and so on.

Let's start with the assumption that a particular language does not use
per-object type tags. The code generator knows the types of all objects,
however, and can pass along that type information to llvm.gcroot. And
since the type of each pointer field is also known, the type information
can be determined for each traced object, as long as there is no
polymorphism.

However, what I don't understand is how this works in the presence of
write barriers. The llvm_gc_write intrinsic takes no type information,
so there's no way to determine the type of the object except with a tag.
So I'm not sure what exactly the write barrier can accomplish.

Note that in a real language, most objects would have type tags, but
there would be a significant number of objects that did not. For
example, a 'string' class might have a fixed-length header with a type
tag, which in turn points to a flat array of characters with no tag. The
character array would not be visible to the outside world, but it would
still need to be visible to the collector.

-- Talin
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Gordon Henriksen-3
On 2007-09-15, at 18:01, Talin wrote:

> I'm still (slowly) working on the project of creating a concurrent  
> garbage collector that works with LLVM. I want to ask a little bit  
> more about object tags and write barriers and so on.
>
> Let's start with the assumption that a particular language does not  
> use per-object type tags. The code generator knows the types of all  
> objects, however, and can pass along that type information to  
> llvm.gcroot. And since the type of each pointer field is also  
> known, the type information can be determined for each traced  
> object, as long as there is no polymorphism.
>
> However, what I don't understand is how this works in the presence  
> of write barriers. The llvm_gc_write intrinsic takes no type  
> information, so there's no way to determine the type of the object  
> except with a tag. So I'm not sure what exactly the write barrier  
> can accomplish.
>
> Note that in a real language, most objects would have type tags,  
> but there would be a significant number of objects that did not.  
> For example, a 'string' class might have a fixed-length header with  
> a type tag, which in turn points to a flat array of characters with  
> no tag. The character array would not be visible to the outside  
> world, but it would still need to be visible to the collector.

Talin,

Can you be more specific the algorithm for which you need type  
metadata in a write barrier? No algorithms I am aware of perform any  
tracing from a write barrier.

Write barriers are commonly used to record references from old-
generation objects to new-generation ones, either by recording the  
referencing object, the referencing field, or using a card table. For  
these purposes, the addresses are sufficient.

In concurrent collectors—by which I mean those that run  
asynchronously of the mutator—write barriers may be used to mark the  
written object pointer. Even byte arrays as you describe require an  
object header in which to store mark bits, else the collector cannot  
know whether the object has been marked (mark-sweep collectors) or  
copied (copying collectors).

— Gordon


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Talin-3
Gordon Henriksen wrote:
> Can you be more specific the algorithm for which you need type  
> metadata in a write barrier? No algorithms I am aware of perform any  
> tracing from a write barrier.
>  
This one does:

http://citeseer.ist.psu.edu/cache/papers/cs2/442/http:zSzzSzwww.cs.technion.ac.ilzSz~erezzSzPaperszSzms-sliding-views.pdf/an-on-the-fly.pdf
> Write barriers are commonly used to record references from old-
> generation objects to new-generation ones, either by recording the  
> referencing object, the referencing field, or using a card table. For  
> these purposes, the addresses are sufficient.
>  
In the paper cited above, the write barrier checks to see if the object
being mutated has been traced; If it hasn't, then it records the values
of all pointers contained in the object into a buffer - for which you
need the location of the pointers.
> In concurrent collectors—by which I mean those that run  
> asynchronously of the mutator—write barriers may be used to mark the  
> written object pointer. Even byte arrays as you describe require an  
> object header in which to store mark bits, else the collector cannot  
> know whether the object has been marked (mark-sweep collectors) or  
> copied (copying collectors).
>  
In my design, the mark bits are maintained by the allocator as part of
the allocation header of each memory block.

In other words, I'm trying to maintain a strict separation between the
allocator/collector and the actual objects stored in the heap. The
allocator/collector "owns" all of the bits that are before the start
address of the object, while the language-specific type system owns all
of the bits after the start address. As a result the collector isn't
tied to a specific language.

-- Talin

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Gordon Henriksen-3
On 2007-09-15, at 23:55, Talin wrote:

Gordon Henriksen wrote:

Can you be more specific the algorithm for which you need type metadata in a write barrier? No algorithms I am aware of perform any tracing from a write barrier.

This one does:


Write barriers are commonly used to record references from old-generation objects to new-generation ones, either by recording the referencing object, the referencing field, or using a card table. For these purposes, the addresses are sufficient.

In the paper cited above, the write barrier checks to see if the object being mutated has been traced; If it hasn't, then it records the values of all pointers contained in the object into a buffer - for which you need the location of the pointers.

Okay. If you're implementing this algorithm with a tagless collector, we may need to extend the intrinsics.

In concurrent collectors—by which I mean those that run asynchronously of the mutator—write barriers may be used to mark the written object pointer. Even byte arrays as you describe require an object header in which to store mark bits, else the collector cannot know whether the object has been marked (mark-sweep collectors) or copied (copying collectors).

In my design, the mark bits are maintained by the allocator as part of the allocation header of each memory block.

In other words, I'm trying to maintain a strict separation between the allocator/collector and the actual objects stored in the heap. The allocator/collector "owns" all of the bits that are before the start address of the object, while the language-specific type system owns all of the bits after the start address. As a result the collector isn't tied to a specific language.

Sure. So by design, type metadata is never necessary to access the mark bits.

— Gordon


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Talin-3
Gordon Henriksen wrote:

> On 2007-09-15, at 23:55, Talin wrote:
>
>> Gordon Henriksen wrote:
>>
>>> Can you be more specific the algorithm for which you need type
>>> metadata in a write barrier? No algorithms I am aware of perform any
>>> tracing from a write barrier.
>>
>> This one does:
>>
>> http://citeseer.ist.psu.edu/cache/papers/cs2/442/http:zSzzSzwww.cs.technion.ac.ilzSz~erezzSzPaperszSzms-sliding-views.pdf/an-on-the-fly.pdf 
>>
>>
>>> Write barriers are commonly used to record references from
>>> old-generation objects to new-generation ones, either by recording
>>> the referencing object, the referencing field, or using a card
>>> table. For these purposes, the addresses are sufficient.
>>
>> In the paper cited above, the write barrier checks to see if the
>> object being mutated has been traced; If it hasn't, then it records
>> the values of all pointers contained in the object into a buffer -
>> for which you need the location of the pointers.
>
> Okay. If you're implementing this algorithm with a tagless collector,
> we may need to extend the intrinsics.
More accurately, what I am trying to do is not only create a collector,
but also create a framework for implementing different collector
algorithms. I don't intend to be the person that creates the
best/fastest collector, I just want to create something that people can
build on.

There are a lot of algorithms out there that require similar underlying
data structures - an efficient heap, lock-free work queues, and so on. I
figure if I can provide those along with a working example, it may help
to stimulate further development.

I chose this particular algorithm to start with because it seemed fairly
easy to implement.

Eventually I want to do a multi-generation design - most of the research
papers out there seem to settle on a per-thread copying collector for
young generations, and a heap-based mark and sweep for older
generations. However,  tracking references to young generation objects
belonging to a different thread is quite involved, and I wanted to start
off with something less challenging.

One thing I am missing is a way to test this in an LLVM context. Right
now all my tests are just unit tests. I'd like to do a full integration
test - to be able to actually write LLVM programs that work with my heap
- but I don't want to have to write a whole language compiler to do it.

-- Talin

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Gordon Henriksen-3
Hi Talin,

On 2007-09-16, at 13:28, Talin wrote:

More accurately, what I am trying to do is not only create a collector, but also create a framework for implementing different collector algorithms. I don't intend to be the person that creates the best/fastest collector, I just want to create something that people can build on.

You might want to read this, which documents the work I've recently done[*] to LLVM's compile-time GC support (pending review):


However, I am explicitly not authoring a collector runtime, so our work may not be overlapping at all. My immediate goal is to support code generation for an existing collector.

There are a lot of algorithms out there that require similar underlying data structures - an efficient heap, lock-free work queues, and so on. I figure if I can provide those along with a working example, it may help to stimulate further development.

Sounds good.

I chose this particular algorithm to start with because it seemed fairly easy to implement.

This section might be interesting to you:


Threaded programs and concurrent collectors are the most challenging for the compiler, and LLVM does not (even with my patches) provide sufficient compile-time support for them, so they may not be the best starting point.

Two of the more tricky aspects which pose a challenge to getting started with a GC algorithm are the stack walker and "stop-the-world" algorithms. Libraries to support those would be very cool.

Eventually I want to do a multi-generation design - most of the research papers out there seem to settle on a per-thread copying collector for young generations, and a heap-based mark and sweep for older generations. However,  tracking references to young generation objects belonging to a different thread is quite involved, and I wanted to start off with something less challenging.

One thing I am missing is a way to test this in an LLVM context. Right now all my tests are just unit tests. I'd like to do a full integration test - to be able to actually write LLVM programs that work with my heap - but I don't want to have to write a whole language compiler to do it.

I'm not sure there's another way to usefully create large codes for test.

— Gordon


[*] If you want the code, a recent copy of the patch set is here: http://homepage.mac.com/malichus/gc.3.zip




— Gordon


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: More Garbage Collection Questions

Chris Lattner
In reply to this post by Talin-3

I'm way behind, but slowly catching up on email,

On Sun, 16 Sep 2007, Talin wrote:
>> Okay. If you're implementing this algorithm with a tagless collector,
>> we may need to extend the intrinsics.

> More accurately, what I am trying to do is not only create a collector,
> but also create a framework for implementing different collector
> algorithms. I don't intend to be the person that creates the
> best/fastest collector, I just want to create something that people can
> build on.
>
> There are a lot of algorithms out there that require similar underlying
> data structures - an efficient heap, lock-free work queues, and so on. I
> figure if I can provide those along with a working example, it may help
> to stimulate further development.

This is very exciting stuff, thank you for working on this!  I strongly
suggest taking a look at the work Gordon has done recently, with luck I
will get time to review it and merge it in this week.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev