Interfacing llvm with a precise, relocating GC


Interfacing llvm with a precise, relocating GC

Sanjoy Das-2
Hello llvm-dev!

My colleagues and I are currently evaluating llvm's suitability as a
JIT compiler interfacing with a precise, relocating garbage collector.
While we couldn't find code or writeups that deal with the issues
specific to this design goal, it is entirely possible that we may have
missed something; we would appreciate references to relevant code or
writeups that people on this list may be aware of.

As an example, one issue that makes this non-trivial is that llvm (as
far as we know) is free to manufacture pointers to locations _inside_
objects, something referred to as a "derived pointer" in some places.
Since these pointers need to be updated in sync with the objects they
point into, a relocating GC needs to be aware of them; and the runtime
needs to be able to read off which registers and stack slots hold such
pointers at every safepoint.  We've looked into llvm's existing GC
support, and the mechanism it provides does not seem to help in this
use case.
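
To make the derived-pointer hazard concrete, here is a hedged sketch in
(2013-era) LLVM IR; the @gc_safepoint callee is an invented placeholder
for any point at which the collector might run:

  declare void @gc_safepoint()

  define i64 @read_field(i8* %obj) {
    ;; %field is a derived pointer into %obj.
    %field = getelementptr i8* %obj, i64 16
    ;; If the GC moves %obj across this call, the register or stack
    ;; slot holding %field must be adjusted by the same delta, so the
    ;; runtime has to know that %field is derived from %obj.
    call void @gc_safepoint()
    %typed = bitcast i8* %field to i64*
    %val = load i64* %typed
    ret i64 %val
  }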

This email is deliberately terse, but we are more than happy to get
into details about the approaches we've considered as the discussion
progresses.  Pointers to existing work related to this or similar
issues are especially welcome.

Thanks!

-- Sanjoy

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Interfacing llvm with a precise, relocating GC

Rafael Espíndola
On 24 October 2013 17:32, Sanjoy Das <[hidden email]> wrote:
> Hello llvm-dev!
>
> My colleagues and I are currently evaluating llvm's suitability as a
> JIT compiler interfacing with a precise, relocating garbage collector.
> While we couldn't find code or writeups that deal with the issues
> specific to this design goal, it is entirely possible that we may have
> missed something; we would appreciate references to relevant code or
> writeups that people on this list may be aware of.


This would be hard. What we currently support is a non-moving
GC where all the roots are in memory. Adding support for a non-moving
GC with register roots would not be too hard, and it might be possible
to reuse some of the recent stackmap work.

For a moving GC you would probably have to change how we represent
pointer arithmetic in the selection dag and MI. It would be quite a
big change. CCing Andy and Patrick since they might have an idea of
how much work that would be and what the costs and benefits for LLVM
are.

Also of note: there are plans to move away from selection dag,
so it might be good to sync this work with whatever we end up using
instead.

Cheers,
Rafael

Re: Interfacing llvm with a precise, relocating GC

Andrew Trick

On Oct 24, 2013, at 2:50 PM, Rafael Espíndola <[hidden email]> wrote:

> On 24 October 2013 17:32, Sanjoy Das <[hidden email]> wrote:
>> Hello llvm-dev!
>>
>> My colleagues and I are currently evaluating llvm's suitability as a
>> JIT compiler interfacing with a precise, relocating garbage collector.
>> While we couldn't find code or writeups that deal with the issues
>> specific to this design goal, it is entirely possible that we may have
>> missed something; we would appreciate references to relevant code or
>> writeups that people on this list may be aware of.
>
>
> This would be hard. Currently what we have support for is a non-moving
> GC where all the roots are in memory. Adding support for a non-moving
> gc with register roots would not be too hard and might be possible to
> reuse some of the recent stackmap work.
>
> For a moving GC you would probably have to change how we represent
> pointer arithmetic in the selection dag and MI. It would be quite a
> big change. CCing Andy and Patrick since they might have an idea of
> how much work that would be and what the costs and benefits for LLVM
> are.

100% agreement.

> Also to note is that there are plans to move away from selection dag,
> so it might be good to sync this work with whatever we end up using
> instead.

FYI: when this was talked about, I heard mention that GEPs should be lowered early in the IR->MI pipeline. I didn’t hear any ideas that would make derived pointer tracking easier.

-Andy

Re: Interfacing llvm with a precise, relocating GC

Sanjoy Das-2
Hi Rafael, Andrew,

Thank you for the prompt reply.

One approach we've been considering involves representing the
constraint "pointers to heap objects are invalidated at every
safepoint" somehow in the IR itself.  So, if %a and %b are values the
GC is interested in, the safepoint at the back-edge of a loop might
look like:

  ; <label>: body
    %a = phi i32 [ %a.relocated, %body ], [ %a.initial_value, %pred ]
    %b = phi i32 [ %b.relocated, %body ], [ %b.initial_value, %pred ]
    ;; Use %a and %b

    ;; The safepoint starts here
    %a.relocated = @llvm.gc_relocate(%a)
    %b.relocated = @llvm.gc_relocate(%b)
    br label %body

This allows us to not bother with relocating derived pointers pointing
inside %a and %b, since it is semantically incorrect for llvm to reuse
them in the next iteration of the loop.  We lower gc_relocate to a
pseudo opcode which is lowered into nothing after register allocation.

The problem is, of course, the performance penalty.  Does it make
sense to have the register allocator "see" the gc_relocate instruction
as a copy so that both values get the same register / slot?  Would that
violate the intended semantics of gc_relocate (for example, after
assigning the same register / slot to %a and %a.relocated, are there
passes that will try to cache derived pointers across loop
iterations)?

Thanks,
-- Sanjoy



Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Rafael Espíndola
On 10/24/13 2:50 PM, Rafael Espíndola wrote:

> On 24 October 2013 17:32, Sanjoy Das <[hidden email]> wrote:
>> Hello llvm-dev!
>>
>> My colleagues and I are currently evaluating llvm's suitability as a
>> JIT compiler interfacing with a precise, relocating garbage collector.
>> While we couldn't find code or writeups that deal with the issues
>> specific to this design goal, it is entirely possible that we may have
>> missed something; we would appreciate references to relevant code or
>> writeups that people on this list may be aware of.
>
> This would be hard. Currently what we have support for is a non-moving
> GC where all the roots are in memory. Adding support for a non-moving
> gc with register roots would not be too hard and might be possible to
> reuse some of the recent stackmap work.
Agreed.  I think all the mechanisms are either in tree already or soon
will be.  There's still a fair amount of work required to make
it all work together, but the task seems approachable.
> For a moving GC you would probably have to change how we represent
> pointer arithmetic in the selection dag and MI. It would be quite a
> big change. CCing Andy and Patrick since they might have an idea of
> how much work that would be and what the costs and benefits for LLVM
> are.
>
> Also to note is that there are plans to move away from selection dag,
> so it might be good to sync this work with whatever we end up using
> instead.
Ouch.  I hadn't realized that GEPs were desugared to integer arithmetic
that early. That does seem like it would be a problem. Thank you for
pointing this out.

Assuming we had a scheme to avoid/solve this specific issue, are you
aware of any other ones?

Can you point me to any previous discussion of the selection dag
migration?  This isn't a topic I usually follow on the list.  A quick
search didn't find the conversations you're referencing.

Philip


Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Sanjoy Das-2
On 10/24/13 3:42 PM, Sanjoy Das wrote:
> Hi Rafael, Andrew,
>
> Thank you for the prompt reply.
>
> One approach we've been considering involves representing the
> constraint "pointers to heap objects are invalidated at every
> safepoint" somehow in the IR itself.
To say this differently, every heap pointer either needs to be remapped
in place or invalidated and reloaded on next use.  In principle, you
could use a "trap on access to unmapped page" style read barrier which
wouldn't require this invariant, but that scheme is undesirable for
other reasons (performance, unbounded pinning, etc.).

> So, if %a and %b are values the
> GC is interested in, the safepoint at the back-edge of a loop might
> look like:
>
>    ; <label>: body
>      %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
>      %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
>      ;; Use %a and %b
>
>      ;; The safepoint starts here
>      %a.relocated = @llvm.gc_relocate(%a)
>      %b.relocated = @llvm.gc_relocate(%b)
>      br %body
>
> This allows us to not bother with relocating derived pointers pointing
> inside %a and %b, since it is semantically incorrect for llvm to reuse
> them in the next iteration of the loop.  We lower gc_relocate to a
> pseudo opcode which is lowered into nothing after register allocation.
>
> The problem is, of course, the performance penalty.  Does it make
> sense to get the register allocator "see" the gc_relocate instruction
> as a copy so that they get the same register / slot?  Will that
> violate the intended semantics of gc_relocate (for example, after
> assigning the same register / slot to %a and %a.relocated, are there
> passes that will try to cache derived pointers across loop
> iterations)?
Not a direct response, but building on the idea...

It seems like there might be an entire spectrum of interesting designs
here.  An initial correct, but slow implementation might introduce
explicit redefinitions during the initial construction of the IR.  
Alternatively, a separate pass could add these explicit relocations
before a problematic point in the pipeline if the set of "interesting
pointers" could be reliably identified.  This would potentially allow
incremental performance improvement over time with incremental effort.  
I haven't spent the time to establish whether identifying the set of
"interesting pointers" could be done reliably yet, but I suspect it
probably could.

Philip



Re: Interfacing llvm with a precise, relocating GC

Ben Karel
In reply to this post by Sanjoy Das-2



On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
> Hi Rafael, Andrew,
>
> Thank you for the prompt reply.
>
> One approach we've been considering involves representing the
> constraint "pointers to heap objects are invalidated at every
> safepoint" somehow in the IR itself.  So, if %a and %b are values the
> GC is interested in, the safepoint at the back-edge of a loop might
> look like:
>
>   ; <label>: body
>     %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
>     %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
>     ;; Use %a and %b
>
>     ;; The safepoint starts here
>     %a.relocated = @llvm.gc_relocate(%a)
>     %b.relocated = @llvm.gc_relocate(%b)
>     br %body
>
> This allows us to not bother with relocating derived pointers pointing
> inside %a and %b, since it is semantically incorrect for llvm to reuse
> them in the next iteration of the loop.

This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.

The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
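
A sketch of that reload discipline using the existing intrinsic follows;
note that @alloc_obj, @do_safepoint, @use, and the "example" GC strategy
name are stand-ins, not real LLVM symbols:

  declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
  declare i8* @alloc_obj()
  declare void @do_safepoint()
  declare void @use(i8*)

  define void @f() gc "example" {
  entry:
    %root = alloca i8*                ; the mutable slot the GC traces
    call void @llvm.gcroot(i8** %root, i8* null)
    %obj = call i8* @alloc_obj()
    store i8* %obj, i8** %root        ; publish the current value
    call void @do_safepoint()         ; a GC here may rewrite *%root
    %obj.new = load i8** %root        ; reload: a new name for the
                                      ; (possibly relocated) pointer
    call void @use(i8* %obj.new)
    ret void
  }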
 

Re: Interfacing llvm with a precise, relocating GC

Rafael Espíndola
In reply to this post by Philip Reames-4
> Can you point me to any previous discussion of the selection dag migration?
> This isn't a topic I usually follow on the list.  A quick search didn't find
> the conversations you're referencing.

I think this was the last one:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-August/064727.html

Cheers,
Rafael

Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Ben Karel
On 10/25/13 1:10 PM, Ben Karel wrote:



> On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
>> Hi Rafael, Andrew,
>>
>> Thank you for the prompt reply.
>>
>> One approach we've been considering involves representing the
>> constraint "pointers to heap objects are invalidated at every
>> safepoint" somehow in the IR itself.  So, if %a and %b are values the
>> GC is interested in, the safepoint at the back-edge of a loop might
>> look like:
>>
>>   ; <label>: body
>>     %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
>>     %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
>>     ;; Use %a and %b
>>
>>     ;; The safepoint starts here
>>     %a.relocated = @llvm.gc_relocate(%a)
>>     %b.relocated = @llvm.gc_relocate(%b)
>>     br %body
>>
>> This allows us to not bother with relocating derived pointers pointing
>> inside %a and %b, since it is semantically incorrect for llvm to reuse
>> them in the next iteration of the loop.
>
> This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.
To rephrase, you're saying that the current gcroot mechanism is analogous to a boxed object pointer.  We could model a safepoint by storing the unboxed value into the box, applying an opaque operation to the box, and reloading the raw object pointer from the box.  (The opaque operator is key to prevent the store/load pair from being removed.  It also implements the actual safepoint operation.)  Is this a correct restatement?

Here's an example:
gcroot a = ...; b = ...;
object* ax = *a, bx = *b;
repeat 500000 iterations {
  spin for 50 instructions
  *a = ax;
  *b = bx;
  safepoint(a, b)
  ax = *a;
  bx = *b;
}
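
In IR terms, that pseudocode could look roughly like the following
hedged sketch; @gc_safepoint and the "example" GC strategy name are
invented placeholders:

  declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
  declare void @gc_safepoint()

  define void @loop(i8* %a0, i8* %b0) gc "example" {
  entry:
    %a.root = alloca i8*
    %b.root = alloca i8*
    call void @llvm.gcroot(i8** %a.root, i8* null)
    call void @llvm.gcroot(i8** %b.root, i8* null)
    store i8* %a0, i8** %a.root
    store i8* %b0, i8** %b.root
    br label %body

  body:
    %i = phi i32 [ 0, %entry ], [ %i.next, %body ]
    %ax = load i8** %a.root        ; unbox
    %bx = load i8** %b.root
    ;; ... ~50 instructions of work on %ax / %bx ...
    store i8* %ax, i8** %a.root    ; re-box before the safepoint
    store i8* %bx, i8** %b.root
    call void @gc_safepoint()      ; opaque: may rewrite both slots
    %i.next = add i32 %i, 1
    %cmp = icmp slt i32 %i.next, 500000
    br i1 %cmp, label %body, label %exit

  exit:
    ret void
  }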

If so, I would agree with you that this is a correct encoding of object relocation.  I think what got me confused was using the in-tree examples to reverse engineer the semantics.  Both the Erlang and Ocaml examples insert safepoints after all the LLVM IR passes have run and derived pointers have been inserted.  This was the part that got me thinking that the gcroot semantics were unsuitable for a precise relocating collector.  If you completely ignored these implementations and inserted the full box/safepoint-call/unbox implementation at each safepoint before invoking any optimizations, I think this would be correct.

One problem with this encoding is that there is currently no obvious mechanism for describing a safepoint in the IR itself (i.e. there is no llvm.gcsafepoint).  You could model this with a "well known" function for which a custom GCStrategy records a stackmap at every call site.  It would be good to extend the existing intrinsics with such a mechanism.

At least if I'm reading things right, the current *implementation* does not correctly implement the intended semantics.  In particular, the safepoint operation which provides the opaque manipulation of the boxed gcroots is not preserved into the SelectionDAG.  As a result, optimizations on the SelectionDAG could eliminate the now-adjacent box/unbox pairs.  This would break a relocating collector.

Leaving this aside, there are also a number of performance problems with the current implementation.  I'll summarize them here for now, but will expand into approaches for resolving each if this looks like the most likely implementation strategy. 
1) Explicitly adding box/safepoint/unbox prevents creation of derived pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd rather let the derived pointers be created and capture them for updating at the safepoint. 
2) The redundant loads and stores required for box/unbox.  (These might be partially eliminated by other optimization passes.)
3) The inability to allocate GC roots to registers. 
4) Frame sizes are implicitly doubled since both an unboxed and boxed representation of every object pointer is required. 

Frankly, I think that (3) is likely solvable, but that (1) and (4) could be fatal for a high performance implementation.  I don't have hard numbers on that; I'm simply stating my intuition. 


> The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
While in principle I agree with you here, in practice the difference is actually quite significant.  I am not convinced that the implementation would be straightforward.  Let's set this aside for the moment and focus on the more fundamental questions.
 

Re: Interfacing llvm with a precise, relocating GC

Michael Lewis
I'm also highly interested in relocating-GC support from LLVM. Up until now my GC implementation has been non-relocating which is obviously kind of a bummer given that it inhibits certain classes of memory allocation/deallocation tricks.


I wrote up a bunch of my findings on the implementation of my GC here: https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme


Frankly I haven't had time to get much further than idle pondering of how I'd go about implementing relocation, but it seems to me like the existing read/write barrier intrinsics might be sufficient if abused properly by the front-end and lowered carefully. My operating hypothesis has been to stash barriers at key points - identified by the front-end itself - and possibly elide them during lowering if it's safe to do so. If my understanding is correct, it should be possible to lower the barriers into exactly the kind of boxing/unboxing procedure described in this thread.

Based on my experimentation so far, which is admittedly minimal, I think the overhead of updating relocated pointers is actually not as bad as it sounds. It isn't strictly necessary to store both a boxed *and* unboxed pointer in every frame. In fact, in the current incarnation of gcroot, marking an alloca as a gcroot effectively forces a stack spill for that alloca; this means that the relocating collector just updates the single existing pointer on the stack when it does its magic, and you're done. With proper barriers in place, and/or careful location of safepoints by the front-end, it's not that hard to make sure that a relocated pointer gets updated.

The real trick, as near as I can tell, would be updating registers that have live roots during a collection. But again based on my past investigations this should just be a matter of ensuring that post-collection you have a barrier that is lowered into a register refresh based on the on-stack relocated pointer value.


One thing I've been meaning to try and do is use this barrier-abuse scheme to work around the existing lack of support for tracing roots in machine registers, by effectively setting up an artificial stack spill just prior to a collection, and otherwise operating on registers as much as possible instead of gcroot'ed allocas. Again, I've only considered this as a thought exercise until now, so I apologize if there's some obvious flaw I'm not aware of in my reasoning. I haven't actually done any of this stuff yet so it's more speculative than anything - just trying to creatively engineer around the existing limitations :-)



 - Mike





Re: Interfacing llvm with a precise, relocating GC

Filip Pizlo


On Oct 26, 2013, at 12:37 AM, Michael Lewis <[hidden email]> wrote:

> I'm also highly interested in relocating-GC support from LLVM. Up until now my GC implementation has been non-relocating which is obviously kind of a bummer given that it inhibits certain classes of memory allocation/deallocation tricks.

You can implement a copying GC (what the kids these days call relocating) without accurate roots. 



> I wrote up a bunch of my findings on the implementation of my GC here: https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme


Why aren't you just using the well-known Bartlett style of GC, which allows for relocation even in the presence of conservative roots (or accurate roots that don't allow relocation)?


> Frankly I haven't had time to get much further than idle pondering of how I'd go about implementing relocation, but it seems to me like the existing read/write barrier intrinsics might be sufficient if abused properly by the front-end and lowered carefully. My operating hypothesis has been to stash barriers at key points - identified by the front-end itself - and possibly elide them during lowering if it's safe to do so. If my understanding is correct, it should be possible to lower the barriers into exactly the kind of boxing/unboxing procedure described in this thread.

Based on my experimentation so far, which is admittedly minimal, I think the overhead of updating relocated pointers is actually not as bad as it sounds. It isn't strictly necessary to store both a boxed *and* unboxed pointer in every frame. In fact, in the current incarnation of gcroot, marking an alloca as a gcroot effectively forces a stack spill for that alloca; this means that the relocating collector just updates the single existing pointer on the stack when it does its magic, and you're done. With proper barriers in place, and/or careful location of safepoints by the front-end, it's not that hard to make sure that a relocated pointer gets updated.

Bartlett collectors will give you relocation with zero overhead. You won't even need any help from LLVM to do it. 


The real trick, as near as I can tell, would be updating registers that have live roots during a collection. But again based on my past investigations this should just be a matter of ensuring that post-collection you have a barrier that is lowered into a register refresh based on the on-stack relocated pointer value.


One thing I've been meaning to try and do is use this barrier-abuse scheme to work around the existing lack of support for tracing roots in machine registers, by effectively setting up an artificial stack spill just prior to a collection, and otherwise operating on registers as much as possible instead of gcroot'ed allocas. Again, I've only considered this as a thought exercise until now, so I apologize if there's some obvious flaw I'm not aware of in my reasoning. I haven't actually done any of this stuff yet so it's more speculative than anything - just trying to creatively engineer around the existing limitations :-)
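The artificial-spill idea above lends itself to a small executable sketch. This is a toy simulation only (Python standing in for pseudocode; `Heap`, `mutator`, and every other name here are hypothetical, not LLVM APIs): values live in fast locals between safepoints and are spilled into collector-scanned root slots only when a collection may occur, then reloaded afterwards.

```python
class Heap:
    """Toy relocating heap: objects are dicts, and 'relocation' copies them."""
    def __init__(self):
        self.objects = []

    def alloc(self, payload):
        obj = {"payload": payload}
        self.objects.append(obj)
        return obj

    def collect(self, root_slots):
        # Relocate every rooted object and update the scanned slot in place.
        for i, old in enumerate(root_slots):
            root_slots[i] = {"payload": old["payload"]}  # copy == relocate

def mutator(heap):
    root_slots = [None, None]      # stack slots the collector scans
    a = heap.alloc("a")            # held in a "register" (plain local)
    b = heap.alloc("b")
    for _ in range(3):
        # --- safepoint: artificial spill of registers into scanned slots ---
        root_slots[0], root_slots[1] = a, b
        heap.collect(root_slots)   # may move a and b
        # --- reload registers from the (possibly rewritten) slots ---
        a, b = root_slots[0], root_slots[1]
    return a["payload"], b["payload"]

result = mutator(Heap())
```

The point of the sketch is only the shape of the protocol: between safepoints nothing constrains where the values live, and the collector only ever sees the spill slots.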



 - Mike




On Fri, Oct 25, 2013 at 5:35 PM, Philip Reames <[hidden email]> wrote:
On 10/25/13 1:10 PM, Ben Karel wrote:



On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
Hi Rafael, Andrew,

Thank you for the prompt reply.

One approach we've been considering involves representing the
constraint "pointers to heap objects are invalidated at every
safepoint" somehow in the IR itself.  So, if %a and %b are values the
GC is interested in, the safepoint at the back-edge of a loop might
look like:

  ; <label>: body
    %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
    %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
    ;; Use %a and %b

    ;; The safepoint starts here
    %a.relocated = @llvm.gc_relocate(%a)
    %b.relocated = @llvm.gc_relocate(%b)
    br %body

This allows us to not bother with relocating derived pointers pointing
inside %a and %b, since it is semantically incorrect for llvm to reuse
them in the next iteration of the loop.

This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.
To rephrase, you're saying that the current gcroot mechanism is analogous to a boxed object pointer.  We could model a safepoint by storing the unboxed value into the box, applying an opaque operation to the box, and reloading the raw object pointer from the box.  (The opaque operator is key to prevent the store/load pair from being removed.  It also implements the actual safepoint operation.)  Is this a correct restatement?

Here's an example:
gcroot a = ...; b = ...;
object *ax = *a, *bx = *b;
repeat 500000 iterations {
  spin for 50 instructions
  *a = ax;
  *b = bx;
  safepoint(a, b)
  ax = *a;
  bx = *b;
}

If so, I would agree with you that this is a correct encoding of object relocation.  I think what got me confused was using the in-tree examples to reverse engineer the semantics.  Both the Erlang and Ocaml examples insert safepoints after all the LLVM IR passes are run and derived pointers were inserted.  This was the part that got me thinking that the gcroot semantics were unsuitable for a precise relocating collector.  If you completely ignored these implementations and inserted the full box/safepoint call/unbox implementation at each safepoint before invoking any optimizations, I think this would be correct. 
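The box/safepoint/unbox encoding above can be made concrete with a toy simulation (Python as neutral pseudocode; `RelocatingGC`, `gcroot`, and `safepoint` are hypothetical stand-ins, not the real intrinsics). It shows why the raw pointer cached before the opaque safepoint is stale afterwards and must be re-loaded from the box:

```python
class RelocatingGC:
    def __init__(self):
        self.boxes = []            # every gcroot box, scanned precisely

    def gcroot(self, obj):
        box = [obj]                # the box: one mutable root slot
        self.boxes.append(box)
        return box

    def safepoint(self):
        # Opaque to the optimizer: may relocate every rooted object.
        for box in self.boxes:
            box[0] = dict(box[0])  # copy == move to a new "address"

gc = RelocatingGC()
a = gc.gcroot({"v": 1})
ax = a[0]                          # unboxed raw pointer
for _ in range(2):
    a[0] = ax                      # box: store raw pointer before safepoint
    gc.safepoint()
    cached_still_valid = ax is a[0]   # the cached pointer is now stale
    ax = a[0]                      # unbox: reload the relocated pointer
```

If the final reload were optimized away and `ax` reused, the mutator would keep operating on the pre-relocation copy, which is exactly the hazard the opaque safepoint operation is meant to prevent.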

One problem with this encoding is that there does not currently exist an obvious mechanism to describe a safepoint in the IR itself (i.e. there is no llvm.gcsafepoint).  You could model this with a "well-known" function for which a custom GCStrategy records a stackmap at every call site.  It would be good to extend the existing intrinsics with such a mechanism. 

At least if I'm reading things right, the current *implementation* is not a correct implementation of the intended semantics.  In particular, the safepoint operation which provides the opaque manipulation of the boxed gcroots is not preserved into the SelectionDAG.  As a result, optimizations on the SelectionDAG could eliminate the now adjacent box/unbox pairs.  This would break a relocating collector. 

Leaving this aside, there are also a number of performance problems with the current implementation.  I'll summarize them here for now, but will expand into approaches for resolving each if this looks like the most likely implementation strategy. 
1) Explicitly adding box/safepoint/unbox prevents creation of derived pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd rather let the derived pointers be created and capture them for updating at the safepoint. 
2) The redundant loads and stores required for box/unbox.  (These might be partially eliminated by other optimization passes.)
3) The inability to allocate GC roots to registers. 
4) Frame sizes are implicitly doubled since both an unboxed and boxed representation of every object pointer is required. 

Frankly, I think that (3) is likely solvable, but that (1) and (4) could be fatal for a high performance implementation.  I don't have hard numbers on that; I'm simply stating my intuition. 
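For point (1), the alternative being hinted at — let derived pointers exist, record them in the stack map, and fix them up at the safepoint — can be sketched as a toy model (hypothetical names; integers stand in for machine addresses). The stack map records, for each frame slot, which slot holds its base pointer; the collector then rewrites each derived slot as relocated-base plus the original offset:

```python
def fixup(slots, base_of, forwarding):
    """slots: frame slot values (toy integer addresses).
    base_of[i]: index of the slot holding slot i's base pointer
    (i itself for a base pointer).  forwarding: old base -> new base."""
    old = list(slots)
    for i, b in enumerate(base_of):
        offset = old[i] - old[b]          # 0 for base pointers
        slots[i] = forwarding[old[b]] + offset

# An array object at address 0x4000; slot 1 is a derived pointer into it
# (say, element 3 at byte offset 24).  The GC moves the object to 0x9000.
slots = [0x4000, 0x4000 + 24]
base_of = [0, 0]          # both slots share the base held in slot 0
fixup(slots, base_of, {0x4000: 0x9000})
```

Under this scheme the optimizer is free to materialize interior pointers, because the safepoint metadata, not the IR encoding, carries the base/derived pairing the collector needs.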


The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
While in principle I agree with you here, in practice the difference is actually quite significant.  I am not convinced that the implementation would be straightforward.  Let's set this aside for the moment and focus on the more fundamental questions. 
 
We lower gc_relocate to a
pseudo opcode which is lowered into nothing after register allocation.

The problem is, of course, the performance penalty.  Does it make
sense to get the register allocator "see" the gc_relocate instruction
as a copy so that they get the same register / slot?  Will that
violate the intended semantics of gc_relocate (for example, after
assigning the same register / slot to %a and %a.relocated, are there
passes that will try to cache derived pointers across loop
iterations)?

Thanks,
-- Sanjoy



Re: Interfacing llvm with a precise, relocating GC

Ben Karel
In reply to this post by Philip Reames-4



On Fri, Oct 25, 2013 at 8:35 PM, Philip Reames <[hidden email]> wrote:
On 10/25/13 1:10 PM, Ben Karel wrote:



On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
Hi Rafael, Andrew,

Thank you for the prompt reply.

One approach we've been considering involves representing the
constraint "pointers to heap objects are invalidated at every
safepoint" somehow in the IR itself.  So, if %a and %b are values the
GC is interested in, the safepoint at the back-edge of a loop might
look like:

  ; <label>: body
    %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
    %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
    ;; Use %a and %b

    ;; The safepoint starts here
    %a.relocated = @llvm.gc_relocate(%a)
    %b.relocated = @llvm.gc_relocate(%b)
    br %body

This allows us to not bother with relocating derived pointers pointing
inside %a and %b, since it is semantically incorrect for llvm to reuse
them in the next iteration of the loop.

This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.
To rephrase, you're saying that the current gcroot mechanism is analogous to a boxed object pointer.  We could model a safepoint by storing the unboxed value into the box, applying an opaque operation to the box, and reloading the raw object pointer from the box.  (The opaque operator is key to prevent the store/load pair from being removed.  It also implements the actual safepoint operation.)  Is this a correct restatement?

What does it mean to model a safepoint?
 
Here's an example:
gcroot a = ...; b = ...;
object *ax = *a, *bx = *b;
repeat 500000 iterations {
  spin for 50 instructions
  *a = ax;
  *b = bx;
  safepoint(a, b)
  ax = *a;
  bx = *b;
}

If so, I would agree with you that this is a correct encoding of object relocation.  I think what got me confused was using the in-tree examples to reverse engineer the semantics.  Both the Erlang and Ocaml examples insert safepoints after all the LLVM IR passes are run and derived pointers were inserted.  This was the part that got me thinking that the gcroot semantics were unsuitable for a precise relocating collector.  If you completely ignored these implementations and inserted the full box/safepoint call/unbox implementation at each safepoint before invoking any optimizations, I think this would be correct.

One problem with this encoding is that there does not currently exist an obvious mechanism to describe a safepoint in the IR itself (i.e. there is no llvm.gcsafepoint).  You could model this with a "well-known" function for which a custom GCStrategy records a stackmap at every call site.  It would be good to extend the existing intrinsics with such a mechanism. 

At least if I'm reading things right, the current *implementation* is not a correct implementation of the intended semantics.  In particular, the safepoint operation which provides the opaque manipulation of the boxed gcroots is not preserved into the SelectionDAG.  As a result, optimizations on the SelectionDAG could eliminate the now adjacent box/unbox pairs.  This would break a relocating collector. 

Since GC safepoints aren't explicitly represented at the IR level, either, SelectionDAG is a red herring: regular IR-level optimizations need the same scrutiny. But safepoints won't be inserted arbitrarily. Because there are only four places where safe points can occur (pre/post call, loop, return), there are essentially only two optimizations that might break a relocating collector: a call being reordered with the stores and loads surrounding it, or constant propagation of an alloca's contents across a loop backedge without eliminating the backedge (iff the GC plugin requests loop safepoints). Neither of these are performed by LLVM, AFAICT, which implies that the implementation is not broken in the way you suggested.

But if you can prove me wrong by producing an example that LLVM optimizations break, that would be cool :-)
 
Leaving this aside, there are also a number of performance problems with the current implementation.  I'll summarize them here for now, but will expand into approaches for resolving each if this looks like the most likely implementation strategy. 
1) Explicitly adding box/safepoint/unbox prevents creation of derived pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd rather let the derived pointers be created and capture them for updating at the safepoint. 
2) The redundant loads and stores required for box/unbox.  (These might be partially eliminated by other optimization passes.)
3) The inability to allocate GC roots to registers. 
4) Frame sizes are implicitly doubled since both an unboxed and boxed representation of every object pointer is required. 

Frankly, I think that (3) is likely solvable, but that (1) and (4) could be fatal for a high performance implementation.  I don't have hard numbers on that; I'm simply stating my intuition. 


The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
While in principle I agree with you here, in practice the difference is actually quite significant.  I am not convinced that the implementation would be straightforward.  Let's set this aside for the moment and focus on the more fundamental questions. 
 
We lower gc_relocate to a
pseudo opcode which is lowered into nothing after register allocation.

The problem is, of course, the performance penalty.  Does it make
sense to get the register allocator "see" the gc_relocate instruction
as a copy so that they get the same register / slot?  Will that
violate the intended semantics of gc_relocate (for example, after
assigning the same register / slot to %a and %a.relocated, are there
passes that will try to cache derived pointers across loop
iterations)?

Thanks,
-- Sanjoy



Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Filip Pizlo
On 10/26/13 7:40 AM, Filip Pizlo wrote:
> You can implement a copying GC (what the kids these days call
> relocating) without accurate roots.
I use "relocating" to brush over the distinction between "copying" and
"compacting" collectors.  For the purposes of our discussions, the two
are interchangeable though.
> Why aren't you just using the well-known Bartlett style of GC, which
> allows for relocation even in the presence of conservative roots (or
> accurate roots that don't allow relocation)?
You've brought this up a few times, so I want to go ahead and address
your point in detail.

Background for others following this discussion: A Bartlett style
garbage collector conservatively scans the stack and promotes* any
object reachable from a possible heap pointer in place.  Objects
directly reachable from the stack are not relocated.  Objects reachable
from these objects are treated normally by the relocating collector and
may be moved.  To support in-place promotion, a small amount of extra
heap metadata is generally required.

* "Promote" is used to indicate the object is conceptually moved into
the "to space".  This is not the same promotion used when discussing
generational collectors.
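The promote-in-place behaviour described above can be illustrated with a toy collector (Python as pseudocode; all names hypothetical, and the copying logic is deliberately simplistic). Objects named by an ambiguous (conservative) root are pinned at their current "address"; everything else reachable is copied to a fresh one:

```python
def bartlett_collect(heap, conservative_roots, precise_roots):
    """heap: {addr: [child addrs]}.  Objects named by an ambiguous
    (conservative) root are promoted in place; the rest are copied."""
    pinned = {r for r in conservative_roots if r in heap}  # maybe-pointers
    forwarding, new_heap = {}, {}
    next_addr = max(heap) + 1

    def visit(addr):
        nonlocal next_addr
        if addr in forwarding:
            return forwarding[addr]
        if addr in pinned:
            new = addr                 # promoted in place: address unchanged
        else:
            new, next_addr = next_addr, next_addr + 1
        forwarding[addr] = new
        new_heap[new] = [visit(child) for child in heap[addr]]
        return new

    for root in pinned | set(precise_roots):
        visit(root)
    return new_heap, forwarding

heap = {1: [2], 2: [], 3: []}
new_heap, fwd = bartlett_collect(heap, conservative_roots=[1, 99],
                                 precise_roots=[3])
```

Note that 99, an ambiguous root value that is not actually an object address, is simply ignored, while object 1 is retained and pinned whether or not it was truly live, which is the false-retention cost discussed below.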

I agree that Bartlett collectors have significant advantages with
regards to compatibility with legacy code and compilers.  A Bartlett
collector does not require derived pointer tracking inside the
compiler.  A Bartlett collector simplifies interfacing with code in
other languages (i.e. C/C++) since it has far fewer assumptions about
stack layout.

Unfortunately, Bartlett collectors also have a number of serious
disadvantages.  To name a few:
- False Retention - Due to conservative stack scanning, objects which
are not live may be falsely retained.  (i.e. false roots or dead roots)
- External Fragmentation - Promoted-in-place objects can be anywhere in
the heap.  This introduces fragmentation and complicates both allocation
and collection.
- Internal Fragmentation - Promoted-in-place objects cause the retention
of the entire "page" containing them.  Unless using a free-list style
allocator, all of this space is inaccessible.
- Garbage Drag - Since falsely retained data can point to other heap
objects, an unknown fraction of the heap may be falsely kept live. This
is particularly problematic for programs with extreme heap usage.  
Consider a program which maintains a large B-tree in memory with
updates.  Or a program manipulating large graphs.  In either case, a
single stray pointer can force the retention of large portions of the
heap graph.  (In practice, this does happen.)

I do acknowledge that each of these is individually surmountable.
However, doing so requires significant work beyond the basic design
described in Bartlett's initial work.  Given the complexity of
engineering a production grade garbage collector, this additional
complexity is highly undesirable.

Taking a step back from the technical merits, I have one other
overriding reason for not pursuing a Bartlett style approach: we already
have a fully precise relocating collector.  We are not interested in
abandoning this.  From our side, the only question is whether LLVM is
suitable for use with our existing collector.

I'm happy to keep debating the merits of various approaches with you
(this is fun!), but from a practical perspective, using an alternate
approach to garbage collection is simply a non-starter for us.

Yours,
Philip

Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Ben Karel
On 10/26/13 9:08 AM, Ben Karel wrote:



On Fri, Oct 25, 2013 at 8:35 PM, Philip Reames <[hidden email]> wrote:
On 10/25/13 1:10 PM, Ben Karel wrote:



On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
Hi Rafael, Andrew,

Thank you for the prompt reply.

One approach we've been considering involves representing the
constraint "pointers to heap objects are invalidated at every
safepoint" somehow in the IR itself.  So, if %a and %b are values the
GC is interested in, the safepoint at the back-edge of a loop might
look like:

  ; <label>: body
    %a = phi i32 [ %a.relocated, %body ] [ %a.initial_value, %pred ]
    %b = phi i32 [ %b.relocated, %body ] [ %b.initial_value, %pred ]
    ;; Use %a and %b

    ;; The safepoint starts here
    %a.relocated = @llvm.gc_relocate(%a)
    %b.relocated = @llvm.gc_relocate(%b)
    br %body

This allows us to not bother with relocating derived pointers pointing
inside %a and %b, since it is semantically incorrect for llvm to reuse
them in the next iteration of the loop.

This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.
To rephrase, you're saying that the current gcroot mechanism is analogous to a boxed object pointer.  We could model a safepoint by storing the unboxed value into the box, applying an opaque operation to the box, and reloading the raw object pointer from the box.  (The opaque operator is key to prevent the store/load pair from being removed.  It also implements the actual safepoint operation.)  Is this a correct restatement?

What does it mean to model a safepoint?
Er, not quite sure what your question is.

My intent was to describe an abstract model which provides the correct semantics for a safepoint.  Is your confusion about my definition of "safepoint"?  Or my desire for an abstract model?  Let me know and I'll try to explain.
 
Here's an example:
gcroot a = ...; b = ...;
object *ax = *a, *bx = *b;
repeat 500000 iterations {
  spin for 50 instructions
  *a = ax;
  *b = bx;
  safepoint(a, b)
  ax = *a;
  bx = *b;
}

If so, I would agree with you that this is a correct encoding of object relocation.  I think what got me confused was using the in-tree examples to reverse engineer the semantics.  Both the Erlang and Ocaml examples insert safepoints after all the LLVM IR passes are run and derived pointers were inserted.  This was the part that got me thinking that the gcroot semantics were unsuitable for a precise relocating collector.  If you completely ignored these implementations and inserted the full box/safepoint call/unbox implementation at each safepoint before invoking any optimizations, I think this would be correct.

One problem with this encoding is that there does not currently exist an obvious mechanism to describe a safepoint in the IR itself (i.e. there is no llvm.gcsafepoint).  You could model this with a "well-known" function for which a custom GCStrategy records a stackmap at every call site.  It would be good to extend the existing intrinsics with such a mechanism. 

At least if I'm reading things right, the current *implementation* is not a correct implementation of the intended semantics.  In particular, the safepoint operation which provides the opaque manipulation of the boxed gcroots is not preserved into the SelectionDAG.  As a result, optimizations on the SelectionDAG could eliminate the now adjacent box/unbox pairs.  This would break a relocating collector. 

Since GC safepoints aren't explicitly represented at the IR level, either, SelectionDAG is a red herring: regular IR-level optimizations need the same scrutiny.  But safepoints won't be inserted arbitrarily. Because there are only four places where safe points can occur (pre/post call, loop, return), there are essentially only two optimizations that might break a relocating collector: a call being reordered with the stores and loads surrounding it, or constant propagation of an alloca's contents across a loop backedge without eliminating the backedge (iff the GC plugin requests loop safepoints). Neither of these are performed by LLVM, AFAICT, which implies that the implementation is not broken in the way you suggested.
I agree that the SelectionDAG bit was a red herring.  It turned out I had misread the code involved with selecting safepoints.  I had been under the impression that it ran between LLVM IR opts and SelectionDAG opts.  It doesn't; instead, it runs after machine-dependent optimization and register allocation.  Sorry for the resulting confusion.  It's definitely time for me to transition to actually trying out examples rather than simply relying on code inspection.

On a different note, your response was based on two assumptions that seem problematic.  I'm not trying to address your argument directly, just pointing out possible issues for future discussions.
1) You're assuming that these are the only places a reasonable GC might want to place safepoints.  This is untrue.  As an example, you'd like the optimizer to be able to move a safepoint outside a bounded loop with a small trip count.  Not sure how this might affect things; just want to get it out there. 
2) You're basing your arguments on what LLVM currently does, not what would be semantically correct for it to do.  Given how quickly LLVM is evolving, this is problematic.  (I'm not saying your reasoning is otherwise right or wrong.  I haven't thought it through yet.)


But if you can prove me wrong by producing an example that LLVM optimizations break, that would be cool :-)
I'm sure I'll have broken examples in the future, but not just yet.  :)

 
Leaving this aside, there are also a number of performance problems with the current implementation.  I'll summarize them here for now, but will expand into approaches for resolving each if this looks like the most likely implementation strategy. 
1) Explicitly adding box/safepoint/unbox prevents creation of derived pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd rather let the derived pointers be created and capture them for updating at the safepoint. 
2) The redundant loads and stores required for box/unbox.  (These might be partially eliminated by other optimization passes.)
3) The inability to allocate GC roots to registers. 
4) Frame sizes are implicitly doubled since both an unboxed and boxed representation of every object pointer is required. 

Frankly, I think that (3) is likely solvable, but that (1) and (4) could be fatal for a high performance implementation.  I don't have hard numbers on that; I'm simply stating my intuition. 


The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
While in principle I agree with you here, in practice the difference is actually quite significant.  I am not convinced that the implementation would be straightforward.  Let's set this aside for the moment and focus on the more fundamental questions. 
 
We lower gc_relocate to a
pseudo opcode which is lowered into nothing after register allocation.

The problem is, of course, the performance penalty.  Does it make
sense to get the register allocator "see" the gc_relocate instruction
as a copy so that they get the same register / slot?  Will that
violate the intended semantics of gc_relocate (for example, after
assigning the same register / slot to %a and %a.relocated, are there
passes that will try to cache derived pointers across loop
iterations)?

Thanks,
-- Sanjoy



Re: Interfacing llvm with a precise, relocating GC

Philip Reames-4
In reply to this post by Michael Lewis
On 10/25/13 9:37 PM, Michael Lewis wrote:
I'm also highly interested in relocating-GC support from LLVM. Up until now my GC implementation has been non-relocating which is obviously kind of a bummer given that it inhibits certain classes of memory allocation/deallocation tricks.
Out of idle curiosity, which optimizations were you most interested in?
I wrote up a bunch of my findings on the implementation of my GC here: https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme
Thanks for sharing your experiences.  I had actually run across this a while ago and read through it.  It's always nice to learn from others rather than repeating the same lessons. For example, the stack coloring problem you mention was one I'm glad to know I don't have to discover myself.  :)


Frankly I haven't had time to get much further than idle pondering of how I'd go about implementing relocation, but it seems to me like the existing read/write barrier intrinsics might be sufficient if abused properly by the front-end and lowered carefully. My operating hypothesis has been to stash barriers at key points - identified by the front-end itself - and possibly elide them during lowering if it's safe to do so. If my understanding is correct, it should be possible to lower the barriers into exactly the kind of boxing/unboxing procedure described in this thread.

Based on my experimentation so far, which is admittedly minimal, I think the overhead of updating relocated pointers is actually not as bad as it sounds. It isn't strictly necessary to store both a boxed *and* unboxed pointer in every frame. In fact, in the current incarnation of gcroot, marking an alloca as a gcroot effectively forces a stack spill for that alloca; this means that the relocating collector just updates the single existing pointer on the stack when it does its magic, and you're done. With proper barriers in place, and/or careful location of safepoints by the front-end, it's not that hard to make sure that a relocated pointer gets updated.
I'm not concerned about the correctness of such an implementation.  I am concerned about the performance.  I think it's time for us to do some prototyping on our side to see how things work out in practice.  I can't promise to share the detailed results, but I will try to share as much as I can. 

The real trick, as near as I can tell, would be updating registers that have live roots during a collection. But again based on my past investigations this should just be a matter of ensuring that post-collection you have a barrier that is lowered into a register refresh based on the on-stack relocated pointer value.


One thing I've been meaning to try and do is use this barrier-abuse scheme to work around the existing lack of support for tracing roots in machine registers, by effectively setting up an artificial stack spill just prior to a collection, and otherwise operating on registers as much as possible instead of gcroot'ed allocas. Again, I've only considered this as a thought exercise until now, so I apologize if there's some obvious flaw I'm not aware of in my reasoning. I haven't actually done any of this stuff yet so it's more speculative than anything - just trying to creatively engineer around the existing limitations :-)
Hey, speculation is fine!  That's what I'm doing at the moment.  You have to speculate while trying to figure out if something is worth actually implementing.  :)

This is conceptually close to the scheme I'm considering, though different in details.  If you get a chance, let me know how it works out. 



 - Mike




On Fri, Oct 25, 2013 at 5:35 PM, Philip Reames <[hidden email]> wrote:
On 10/25/13 1:10 PM, Ben Karel wrote:



On Thu, Oct 24, 2013 at 6:42 PM, Sanjoy Das <[hidden email]> wrote:
Hi Rafael, Andrew,

Thank you for the prompt reply.

One approach we've been considering involves representing the
constraint "pointers to heap objects are invalidated at every
safepoint" somehow in the IR itself.  So, if %a and %b are values the
GC is interested in, the safepoint at the back-edge of a loop might
look like:

  ; <label>: body
    %a = phi i32 [ %a.relocated, %body ], [ %a.initial_value, %pred ]
    %b = phi i32 [ %b.relocated, %body ], [ %b.initial_value, %pred ]
    ;; Use %a and %b

    ;; The safepoint starts here
    %a.relocated = @llvm.gc_relocate(%a)
    %b.relocated = @llvm.gc_relocate(%b)
    br %body

This allows us to not bother with relocating derived pointers pointing
inside %a and %b, since it is semantically incorrect for llvm to reuse
them in the next iteration of the loop.

This is the right general idea, but you can already express this constraint in LLVM as it exists today, by using llvm.gcroot().  As you noted, this also solves the interior-pointer problem by making it the front end's job to convey to LLVM when it would/would not be safe to cache interior pointers across loop iterations. The invariant that a front-end must maintain is that any pointer which is live across a given potential-GC-point must be reloaded from its root after a (relocating) GC might have occurred. This falls naturally out of the viewpoint that %a is not an object pointer, it's the name of an object pointer's value at a given point in time. So, of course, whenever that object pointer's value might change, there must be a new name.
To rephrase, you're saying that the current gcroot mechanism is analogous to a boxed object pointer.  We could model a safepoint by storing the unboxed value into the box, applying an opaque operation to the box, and reloading the raw object pointer from the box.  (The opaque operator is key to prevent the store/load pair from being removed.  It also implements the actual safepoint operation.)  Is this a correct restatement?

Here's an example:
gcroot a = ...; b = ...;
object* ax = *a, bx = *b;
repeat 500000 iterations {
  spin for 50 instructions
  *a = ax;
  *b = bx;
  safepoint(a, b)
  ax = *a;
  bx = *b;
}
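For concreteness, this box/safepoint/unbox encoding might look roughly like the following in LLVM IR of this era. This is only a sketch: @do_safepoint is an invented runtime function (there is no such intrinsic), and the GC strategy name and root metadata are placeholders.

```llvm
define void @loop(i8* %a.init, i8* %b.init) gc "mygc" {
entry:
  ; The "boxes": stack slots registered as GC roots.
  %a.box = alloca i8*
  %b.box = alloca i8*
  call void @llvm.gcroot(i8** %a.box, i8* null)
  call void @llvm.gcroot(i8** %b.box, i8* null)
  store i8* %a.init, i8** %a.box
  store i8* %b.init, i8** %b.box
  br label %body

body:
  %a = load i8** %a.box                ; unbox after the last safepoint
  %b = load i8** %b.box
  ; ... use %a and %b ...
  call void @do_safepoint()            ; opaque call: may rewrite the boxes
  br label %body
}

declare void @llvm.gcroot(i8**, i8*)
declare void @do_safepoint()
```

Because @do_safepoint is opaque to the optimizer, LLVM cannot forward the boxed value past the call, so each iteration reloads the possibly-relocated pointer from its box.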

If so, I would agree with you that this is a correct encoding of object relocation.  I think what got me confused was using the in-tree examples to reverse-engineer the semantics.  Both the Erlang and OCaml examples insert safepoints after all the LLVM IR passes have run and derived pointers have been inserted.  This was the part that got me thinking that the gcroot semantics were unsuitable for a precise relocating collector.  If you completely ignored these implementations and inserted the full box/safepoint call/unbox implementation at each safepoint before invoking any optimizations, I think this would be correct. 

One problem with this encoding is that there does not currently exist an obvious mechanism to describe a safepoint in the IR itself.  (i.e. there is no llvm.gcsafepoint)  You could model this with a "well known" function which a custom GCStrategy recorded a stackmap on every call.  It would be good to extend the existing intrinsics with such a mechanism. 

At least if I'm reading things right, the current *implementation* is not a correct implementation of the intended semantics.  In particular, the safepoint operation which provides the opaque manipulation of the boxed gcroots is not preserved into the SelectionDAG.  As a result, optimizations on the SelectionDAG could eliminate the now adjacent box/unbox pairs.  This would break a relocating collector. 

Leaving this aside, there are also a number of performance problems with the current implementation.  I'll summarize them here for now, but will expand into approaches for resolving each if this looks like the most likely implementation strategy. 
1) Explicitly adding box/safepoint/unbox prevents creation of derived pointers.  This prevents optimizations such as loop-blocking.  Ideally, we'd rather let the derived pointers be created and capture them for updating at the safepoint. 
2) The redundant loads and stores required for box/unbox.  (These might be partially eliminated by other optimization passes.)
3) The inability to allocate GC roots to registers. 
4) Frame sizes are implicitly doubled since both an unboxed and boxed representation of every object pointer is required. 

Frankly, I think that (3) is likely solvable, but that (1) and (4) could be fatal for a high performance implementation.  I don't have hard numbers on that; I'm simply stating my intuition. 


The fact that the mutable memory associated with a gcroot() is allocated on the stack (rather than, say, a machine register) is an implementation detail; fixing it doesn't require altering the (conceptual) interface for LLVM's existing GC support, AFAICT.
While in principle I agree with you here, in practice the difference is actually quite significant.  I am not convinced that the implementation would be straightforward.  Let's set this aside for the moment and focus on the more fundamental questions. 
 
We lower gc_relocate to a
pseudo opcode which is lowered into nothing after register allocation.

The problem is, of course, the performance penalty.  Does it make
sense to get the register allocator "see" the gc_relocate instruction
as a copy so that they get the same register / slot?  Will that
violate the intended semantics of gc_relocate (for example, after
assigning the same register / slot to %a and %a.relocated, are there
passes that will try to cache derived pointers across loop
iterations)?

Thanks,
-- Sanjoy


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev










Re: Interfacing llvm with a precise, relocating GC

Filip Pizlo
In reply to this post by Philip Reames-4

On Oct 28, 2013, at 11:03 AM, Philip Reames <[hidden email]> wrote:

> On 10/26/13 7:40 AM, Filip Pizlo wrote:
>> You can implement a copying GC (what the kids these days call relocating) without accurate roots.
> I use "relocating" to brush over the distinction between "copying" and "compacting" collectors.  For the purposes of our discussions, the two are interchangeable though.
>> Why aren't you just using the well-known Bartlett style of GC, which allows for relocation even in the presence of conservative roots (or accurate roots that don't allow relocation)?
> You've brought this up a few times, so I want to go ahead and address your point in detail.
>
> Background for others following this discussion: A Bartlett style garbage collector conservatively scans the stack and promotes* any object reachable from a possible heap pointer in place.  Objects directly reachable from the stack are not relocated.  Objects reachable from these objects are treated normally by the relocating collector and may be moved.   To support in place promotion, a small amount of extra heap metadata is generally required.
>
> * "Promote" is used to indicate the object is conceptually moved into the "to space".  This is not the same promotion used when discussing generational collectors.
>
> I agree that Bartlett collectors have significant advantages with regards to compatibility with legacy code and compilers.  A Bartlett collector does not require derived pointer tracking inside the compiler.  A Bartlett collector simplifies interfacing with code in other languages (i.e. C/C++) since it has far fewer assumptions about stack layout.
>
> Unfortunately, Bartlett collectors also have a number of serious disadvantages.  To name a few:
> - False Retention - Due to conservative stack scanning, objects which are not live may be falsely retained.  (i.e. false roots or dead roots)

I don't know of evidence that this is a problem in practice.  Bartlett collectors have been used (and are being used) in shipping products.  The good ole' IBM production VM used it, and we use it in WebKit.  Some other - lesser known, less widely used - projects have also used it, like the Modula 3 runtime.

> - External Fragmentation - Promoted-in-place objects can be anywhere in the heap.  This introduces fragmentation and complicates both allocation and collection.

I only know of one example of this being a problem: if you're trying to build a 32-bit VM that is intended to be used with 4GB heaps (i.e. you are maxing out your virtual address space) and the mutator intends to use most of the available memory for a single huge array.

In a 64-bit (err, 48-bit) address space that is common these days, this ain't gonna happen.

> - Internal Fragmentation - Promoted-in-place objects cause the retention of the entire "page" containing them.  Unless using a free-list style allocator, all of this space is inaccessible.

I don't know of evidence that this is a problem in practice.  The heap is much huger than the stack(s) - often 3 orders of magnitude.  Bartlett collectors usually run with small-ish "pages" - 4KB or not much bigger.  In order to waste a significant amount of heap space due to internal fragmentation, you'd have to find yourself in a situation where each slot on your stacks points to a distinct heap page.  The probability of such a thing happening is probably on the same order of magnitude as the probability of a cosmic ray frying your RAM.  Anyway, the empirical evidence appears to suggest that false retention is negligible.

And yes, I have often considered implementing free-listing for those pages.  But right now I view this as a solution looking for a problem.

> - Garbage Drag - Since falsely retained data can point to other heap objects, an unknown fraction of the heap may be falsely kept live. This is particularly problematic for programs with extreme heap usage.  Consider a program which maintains a large B-tree in memory with updates.  Or a program manipulating large graphs.  In either case, a single stray pointer can force the retention of large portions of the heap graph.  (In practice, this does happen.)

I don't know of evidence that this ever happens.

>
> I do acknowledge that each of these is individually surmountable. However, doing so requires significant work beyond the basic design described in Bartlett's initial work.  Given the complexity of engineering a production grade garbage collector, this additional complexity is highly undesirable.

Are you suggesting that others who have shipped Bartlett-based collectors were in fact failing to ship a production-grade collector, or that they actually shipped something more sophisticated than a Bartlett collector?

WebKit doesn't attempt to get around the disadvantages that you listed, because these disadvantages are purely theoretical.  We track memory use bugs all the time and we haven't found any such bug that can be traced to conservative roots.

I don't know to what extent IBM's production VM used Bartlett "to the letter", or if they made some changes to it.

You've brought up some of the theoretical disadvantages of Bartlett collectors and you refer to them as being "serious".  I am curious what empirical evidence you are using to support the claim that these disadvantages are anything but philosophical?  Remember, GCs are not a theoretician's game.  This is an empirical science, and production collectors tend to be based on techniques that arose through experiments.

>
> Taking a step back from the technical merits, I have one other overriding reason for not pursuing a Bartlett style approach: we already have a fully precise relocating collector.  We are not interested in abandoning this.  From our side, the only question is whether LLVM is suitable for use with our existing collector.

I can appreciate this and I do not mean to imply that accurate GC shouldn't be considered at all.  It's an interesting compiler feature.

But I want to make sure that such a discussion is grounded in reality: adding accurate GC to a compiler is hard, and you don't need it to do copying.  You can build a production-quality garbage collector without it.

>
> I'm happy to keep debating the merits of various approaches with you (this is fun!), but from a practical perspective, using an alternate approach to garbage collection is simply a non-starter for us.
>
> Yours,
> Philip


Re: Interfacing llvm with a precise, relocating GC

Talin-3
In reply to this post by Sanjoy Das-2
Sanjoy: This document which I wrote several years ago may be of some use to you:


I have successfully implemented a copying collector using LLVM. I did not implement support for interior pointers, however I have a number of ideas on how to approach it. The language that I was implementing was similar to C# in that classes were divided into "reference" types and "value" types. A pointer to a reference type on the heap always pointed to the start of an allocation, whereas a pointer to a value type was always an interior pointer. (In other words, a value type could never exist by itself in the heap, it had to be embedded within some reference type). This means that the compiler could always know whether a pointer was an internal pointer or not. Thus, pointers to reference types were just machine-level pointers, whereas pointers to value types consisted of a pointer+offset, with the pointer part pointing to the start of a heap allocation. Combined with the fact that pointers to value types were relatively rare, this allowed internal pointers with a minimum of overhead.

Now, having said all that, I feel compelled to give a few warnings. Part of the reason I abandoned this project was because of limitations in LLVM's garbage collection intrinsics, which I have written about extensively on this list. The current llvm.gcroot strategy requires the frontend to be very complex and to generate highly inefficient code, and that code is mostly unoptimizable, since LLVM's optimizers generally won't touch a value that has been marked as a GC root.

Worse, the support for GC in the LLVM community is fairly low - the garbage collection intrinsics in LLVM have not been updated or improved in the 7 years of my following the project, and there's been very little discussion of GC on the mailing list (I do a search for the word "collect" in the LLVM archives about once a month, which is how this thread came to my attention.) Most of the people working on/with LLVM are working with non-GC languages, or with languages that have simple enough memory models (e.g. "everything is an atom") that the existing intrinsics are sufficient. There are also a few people who have gotten around the problems by defining their own stack frames instead of using the LLVM intrinsics.

There have been numerous proposals over the years for better GC intrinsics, but nothing has come out of these discussions so far. There's a good reason for this: improving the GC support would require a major commitment, since all of the backend code generators and optimizers would potentially be affected.

My current favorite GC proposal involves annotating types - that is, to define a new kind of derived type that is essentially a 2-tuple consisting of a base type + GC metadata (the second argument to llvm.gcroot). This means that "root-ness" would be a property of a type rather than a value, which means that the rootness could automatically be propagated to intermediate values or SSA values through optimization without the frontend having to do a lot of spilling and reloading of values. (Plus having the ability to associate metadata with types might be useful for other things besides GC.)
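Purely as a strawman, such a type annotation might read something like the following; none of this syntax exists in LLVM, today or otherwise:

```llvm
; Hypothetical, non-existent syntax: attach the GC metadata (today the
; second argument to llvm.gcroot) to the pointer *type* itself:
;
;   %T = gctype { i32, i8* }, i8* @my_gc_meta
;
; Any SSA value of type %T would then be a root, with no alloca or
; explicit spill required from the front-end.
```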



On Thu, Oct 24, 2013 at 2:32 PM, Sanjoy Das <[hidden email]> wrote:
Hello llvm-dev!

My colleagues and I are currently evaluating llvm's suitability as a
JIT compiler interfacing with a precise, relocating garbage collector.
While we couldn't find code or writeups that deal with the issues
specific to this design goal, it is entirely possible that we may have
missed something; we would appreciate references to relevant code or
writeups that people on this list may be aware of.

As an example, one issue that makes this non-trivial is that llvm (as
far as we know) is free to manufacture pointers to locations _inside_
objects, something referred to as a "derived pointer" in some places.
Since these pointers need to be updated in sync with the objects they
point into, a relocating GC needs to be aware of them; and the runtime
needs to be able to read off which registers and stack slots hold such
pointers at every safepoint.  We've looked into llvm's existing GC
support, and the mechanism it provides does not seem to help in this
use case.

This email is deliberately terse, but we are more than happy to get
into details about the approaches we've considered as the discussion
progresses.  Pointers to existing work related to this or similar
issues is especially welcome.

Thanks!

-- Sanjoy




--
-- Talin


Re: Interfacing llvm with a precise, relocating GC

Sanjoy Das-2
Hi Talin,

Thank you for your response!

Since you mention that you don't implement derived pointers, I'm a
little confused about how your approach solves the issue I brought up.
It seems to me that unless you plan on pinning some objects, support
for derived pointers isn't optional.  I'm not talking about derived
pointers that the front-end introduces (which can be tracked) but the
ones that are introduced by llvm during IR level optimizations or,
worse, during and after instruction selection.

To be sure that we're talking about the same thing, as an example,
consider the loop (in pseudo llvm IR):

for (int i = 0 to (length - 1)) {
 total += a[i];
 safepoint(); // This thread may be stopped here, and `a'
              // may be relocated.
}

llvm can transform this loop to

for (int *i = &a[0] to &a[length - 1]) {
 total += *i;
 safepoint(); // llvm has now introduced an additional
              // derived pointer, `i'.
}

From llvm's point of view this is a valid transformation; but it ends
up creating a new pointer the GC has to be aware of, and needs to be
relocated in sync with a.

Thanks!
-- Sanjoy


Re: Interfacing llvm with a precise, relocating GC

Reid Kleckner-2
On Tue, Oct 29, 2013 at 1:17 PM, Sanjoy Das <[hidden email]> wrote:
Hi Talin,

Thank you for your response!

Since you mention that you don't implement derived pointers, I'm a
little confused about how your approach solves the issue I brought up.
It seems to me that unless you plan on pinning some objects, support
for derived pointers isn't optional.  I'm not talking about derived
pointers that the front-end introduces (which can be tracked) but the
ones that are introduced by llvm during IR level optimizations or,
worse, during and after instruction selection.

To be sure that we're talking about the same thing, as an example,
consider the loop (in pseudo llvm IR):

for (int i = 0 to (length - 1)) {
 total += a[i];
 safepoint(); // This thread may be stopped here, and `a'
              // may be relocated.
}

llvm can transform this loop to

for (int *i = &a[0] to &a[length - 1]) {
 total += *i;
 safepoint(); // llvm has now introduced an additional
              // derived pointer, `i'.
}

From llvm's point of view this is a valid transformation; but it ends
up creating a new pointer the GC has to be aware of, and needs to be
relocated in sync with a.

LLVM shouldn't do this if you're using the gcroot intrinsic.  Your example would be:

llvm.gcroot(a) // a escapes
for (int i = 0 to length - 1) {
  total += a[i];
  safepoint() // a could be modified since it escaped
}

See, like Talin said, gcroot blocks a lot of optimization.  :)


Re: Interfacing llvm with a precise, relocating GC

Rayiner Hashem
In reply to this post by Sanjoy Das-2
With regard to Bartlett-style collectors, they are also used in CMUCL/SBCL, OpenDylan, and various products based on Ravenbrook's Memory Pool System. Also, while Mono doesn't use a Bartlett-style collector, it does support pinning objects referenced from ambiguous stack roots, in an otherwise copying collector. While ITA has expressed concerns with SBCL's GC, they seem to involve the kind of problems copying collectors generally face with huge heaps, not anything particular to conservative stack scanning. In any case, people interested in the trade-offs involved might want to talk to people involved with these projects about their experiences.

Also, it is probably possible to support a more-precise method of stack scanning without additional support from LLVM. I bet the method of this paper (http://www.cs.kent.ac.uk/pubs/2009/3128/content.pdf) could be implemented more directly in LLVM using the exception handling framework and a custom stack walker. The basic idea would be to generate code in a landing pad to save the values (but not the addresses) of live roots to some memory location. Instead of call instructions, the front-end emits invoke instructions that can branch to that landing pad. During GC, a custom stack walker can walk up the stack and call the code in the landing pads to gather roots, without unwinding the stack. This approach allows LLVM to promote roots to SSA values, and encodes the right semantics: it's okay to create derived roots across a call (safepoint) so long as the original value is available for the landing pad. It does not allow the values to be changed during GC, but at least addresses the concern of false retention from ambiguous roots.

It should be possible, if somewhat more difficult, to generate landing pads that merge control flow back to the non-exceptional case, after reloading the values of roots from some memory location. This approach expresses the data-flow semantic that the values of roots may be reloaded from memory across a call. The tricky part with this approach is handling the fact that LLVM's invoke instruction only defines a value along the non-exceptional edge. But I'm sure there is some cleverness that could address that.
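As a rough sketch of that idea in circa-2013 LLVM IR. Everything here is assumed rather than real: @maybe_gc, @gc_root_slot, and @__gc_personality are invented names, the stack walker's protocol is hypothetical, and resume is shown only to indicate handing control back to the walker rather than genuine unwinding.

```llvm
define i8* @f(i8* %obj) {
entry:
  ; %obj stays an SSA value (a register); no gcroot alloca required.
  invoke void @maybe_gc()
          to label %cont unwind label %dump_roots

cont:
  ret i8* %obj

dump_roots:
  %lp = landingpad { i8*, i32 }
          personality i8* bitcast (i32 (...)* @__gc_personality to i8*)
          cleanup
  ; Run by the custom stack walker without actually unwinding: record
  ; the *values* of this frame's live roots in a side table.
  %slot = call i8** @gc_root_slot()
  store i8* %obj, i8** %slot
  resume { i8*, i32 } %lp
}

declare void @maybe_gc()
declare i8** @gc_root_slot()
declare i32 @__gc_personality(...)
```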
