Future plans for GC in LLVM


Philip Reames
Now that the statepoint changes have landed, I wanted to start a
discussion about what's next for GC support in LLVM.  I'm going to
sketch out a strawman proposal, but I'm not set on any of this.  I
mostly just want to draw interested parties out of the woodwork.  :)

Overall Direction:
In the short term, my intent is to preserve the functionality of the
existing code, but migrate towards a position where the gcroot specific
pieces are optional and well separated.  I also plan to start updating
the documentation to reflect a separation between the general support
for garbage collection (function attributes, identifying references,
load and store barrier lowering, generating stack maps) and the
implementation choices (gcroot & its lowering vs statepoints & addr
spaces for identifying references).

Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering
code and in-tree GCStrategies unless an interested party speaks up.  I
have no problem with retaining some of the existing pieces for legacy
support or helping users to migrate, but as of right now, I don't know
of any such active users.  The only exception to this might be the
shadow stack GC.  Eventually in this context is at least six months from
now, but likely less than 18 months.  Hopefully, that's vague enough.  :)

HELP - If anyone knows which OCaml implementation and which Erlang
implementation triggered the in-tree GC strategies, please let me know!


Near Term Changes:
- Migrate ownership of GCStrategy objects from GCModuleInfo to
LLVMContext.  In theory, this loses the ability for two different
Modules to have the same collector with different state, but I know of
no use case for this.
- Modify the primary Function::getGC/setGC interface to return a
reference to the GCStrategy object, not a string.  I will provide
Function::setGCString and Function::getGCString.
- Extend the GCStrategy class to include a notion of which compilation
strategy is being used.  The two choices right now will be Legacy and
Statepoint.  (Longer term, this will likely become a more fine-grained
choice.)
- Separate GCStrategy and related pieces from the
GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At first,
this will simply mean clarifying documentation and rearranging code a bit.
- Document/clarify the callbacks used to customize the lowering. Decide
which of these make sense to preserve and document.
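A toy Python model may help make the proposed ownership change concrete. Everything in it is hypothetical: `Context`, `get_gc_strategy`, and the snake_case accessors merely stand in for LLVMContext and Function::getGC/setGCString/getGCString and are not LLVM's C++ API, and `my-moving-gc` is an invented collector name. It shows one strategy object per GC name per context, functions handing back that object rather than a string, and each strategy carrying a Legacy/Statepoint compilation choice.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class CompilationStrategy(Enum):
    # The two choices named in the proposal.
    LEGACY = "legacy"          # gcroot/alloca-based lowering
    STATEPOINT = "statepoint"  # gc.statepoint-based lowering


class GCStrategy:
    """Toy collector description; only carries a name and a compilation strategy."""

    def __init__(self, name: str, compilation: CompilationStrategy) -> None:
        self.name = name
        self.compilation = compilation


class Context:
    """Toy stand-in for LLVMContext: owns at most one strategy per GC name."""

    def __init__(self, factories: Dict[str, Callable[[str], GCStrategy]]) -> None:
        self._factories = factories
        self._strategies: Dict[str, GCStrategy] = {}

    def get_gc_strategy(self, name: str) -> GCStrategy:
        # Created on first use and cached, so every module in this context
        # that names the same collector shares one (stateful) object.
        if name not in self._strategies:
            self._strategies[name] = self._factories[name](name)
        return self._strategies[name]


class Function:
    def __init__(self, ctx: Context) -> None:
        self._ctx = ctx
        self._gc_name: Optional[str] = None

    # String-based accessors, analogous to the proposed setGCString/getGCString.
    def set_gc_string(self, name: str) -> None:
        self._gc_name = name

    def get_gc_string(self) -> Optional[str]:
        return self._gc_name

    # Object-based accessor, analogous to the proposed getGC.
    def get_gc(self) -> GCStrategy:
        if self._gc_name is None:
            raise LookupError("function has no GC set")
        return self._ctx.get_gc_strategy(self._gc_name)
```

The trade-off mentioned above is visible here: two modules in the same context cannot hold differently configured instances of the same collector, because the cache is keyed only by name.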

(Lest anyone get the wrong idea, the above changes are intended to be
minor cleanup.  I'm not looking to do anything controversial yet.)

Questions:
- Is providing the ability to generate a custom binary stack map format a
valuable feature?  Adapting the new statepoint infrastructure to work
with the existing GCMetadataPrinter classes wouldn't be particularly hard.
- Are there any GCs out there that need gcroot's single stack slot per
value implementation?   By default, statepoints may generate a different
stackmap for every safepoint in a function.
- Is using gcroot and allocas to mark pointers as garbage collected
references valuable?  (As opposed to using address spaces on the SSA
values themselves?)  Long term, should we retain the gcroot marker
intrinsics at all?
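The difference behind the second question can be sketched in a few lines of Python. The map layout (safepoint return address to a list of stack offsets) and both function names are hypothetical, chosen only to contrast a map per safepoint with gcroot's single mapping per function.

```python
from typing import Dict, List

# Hypothetical per-safepoint stack map for one function:
# safepoint return address -> stack offsets that hold live GC references.
PerSafepointMap = Dict[int, List[int]]


def roots_at(stack_map: PerSafepointMap, return_address: int) -> List[int]:
    """Slots the collector scans for a frame stopped at this safepoint."""
    return stack_map[return_address]


def single_slot_map(stack_map: PerSafepointMap) -> List[int]:
    """gcroot-style view: one slot set shared by every safepoint.

    It has to be the union of the per-safepoint sets, so a slot that is dead
    at a particular safepoint is still reported there and must therefore
    hold something the collector can safely scan.
    """
    slots = set()
    for live in stack_map.values():
        slots.update(live)
    return sorted(slots)
```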


Philip

Appendix: The Current Implementation's Key Classes:

GCStrategy - Provides a configurable description of the collector. The
strategy can also override parts of the default GC root lowering
strategy.  The concept of such a collector description is very valuable,
but the current implementation could use some cleanup.  In particular,
the custom lowering hooks are a bit of a mess.

GCMetadataPrinter - Provides a means to dump a custom binary format
describing each function's safepoints.  All safepoints in a function must
share a single root Value to stack slot mapping.

GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved
to enable GCMetadataPrinter.
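As a rough picture of what a GCMetadataPrinter is for, here is a Python sketch that serializes one function's roots into a made-up binary layout and reads it back. The layout and the names `FN_HEADER`, `encode_function_map`, and `decode_function_map` are invented for illustration; this is not the OCaml frametable or any in-tree format. It does reflect the constraint described above: a single list of root offsets is shared by all of the function's safepoints.

```python
import struct
from typing import List, Tuple

# Hypothetical per-function record, little-endian:
#   u32 num_safepoints, u32 frame_size, u32 num_roots
#   i32 root frame offset   x num_roots       (shared by every safepoint)
#   u64 safepoint address   x num_safepoints
FN_HEADER = struct.Struct("<III")


def encode_function_map(frame_size: int, root_offsets: List[int],
                        safepoints: List[int]) -> bytes:
    out = FN_HEADER.pack(len(safepoints), frame_size, len(root_offsets))
    out += struct.pack(f"<{len(root_offsets)}i", *root_offsets)
    out += struct.pack(f"<{len(safepoints)}Q", *safepoints)
    return out


def decode_function_map(blob: bytes) -> Tuple[int, List[int], List[int]]:
    num_safepoints, frame_size, num_roots = FN_HEADER.unpack_from(blob, 0)
    pos = FN_HEADER.size
    roots = list(struct.unpack_from(f"<{num_roots}i", blob, pos))
    pos += 4 * num_roots
    safepoints = list(struct.unpack_from(f"<{num_safepoints}Q", blob, pos))
    return frame_size, roots, safepoints
```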

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Future plans for GC in LLVM

Gordon Henriksen
Excellent direction. My input, although I'm verrrrry far removed from the project at this point:

• I vehemently support replacement of the gcroot intrinsic. It forces the system to make unnecessarily conservative decisions by default, which is incompatible with producing a high-quality compiler for garbage-collected languages. I just never got ambitious enough to replace it.
• The OCaml-specific code can be jettisoned. OCaml was hostile to contributors from outside France, and outside INRIA in particular, so no published callers use it that I'm aware of. (The OCaml runtime also requires an alternative calling convention, so it's highly unlikely that there's a stray caller!)
• Shadow-stack is probably useful for bootstrapping up new runtimes/new languages because it's crazy simple to interoperate with. It's not threadsafe or anything tho, so don't let it stand in the way of a high quality implementation.
• Generating a different stack map at every safepoint is highly desirable. Anything that depends on the contrary deserves to be broken!
• It was my experience that porting a compiler was simplified by not concurrently porting the runtime. So the ability to customize serialization of the stack map was valuable, particularly if it is inexpensive to preserve that behavior. (If LLVM has gotten into the business of providing collector runtimes and memory allocators and …, then dump it!)
• Interior pointers pose a particular challenge once the gcroot intrinsic is removed. I was totally punting on that problem domain.
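The point about the shadow stack being simple to interoperate with shows in how little a runtime needs. As I understand LLVM's garbage collection documentation, the shadow-stack runtime is a linked chain of frames headed by a global `llvm_gc_root_chain`, where each entry points at a constant per-function frame map and holds that frame's roots, and a `visitGCRoots`-style walk visits them. The Python model below mirrors that shape only loosely: field names are adapted, memory layout is ignored, and the `ShadowStack` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class FrameMap:
    """Per-function constant data: number of roots plus optional metadata."""
    num_roots: int
    meta: List[object] = field(default_factory=list)


@dataclass
class StackEntry:
    """One activation record on the shadow stack."""
    next: Optional["StackEntry"]
    map: FrameMap
    roots: List[object]


class ShadowStack:
    def __init__(self) -> None:
        # Plays the role of the global llvm_gc_root_chain (one per process,
        # which is why the scheme is not thread-safe as-is).
        self.head: Optional[StackEntry] = None

    def push_frame(self, frame_map: FrameMap) -> StackEntry:
        # Function entry: link a new entry whose roots start out null.
        entry = StackEntry(self.head, frame_map, [None] * frame_map.num_roots)
        self.head = entry
        return entry

    def pop_frame(self) -> None:
        # Function exit (or unwinding): unlink the innermost entry.
        assert self.head is not None, "shadow stack underflow"
        self.head = self.head.next

    def visit_roots(self, visitor: Callable[[StackEntry, int, Optional[object]], None]) -> None:
        # Walk from the innermost frame outward. The visitor gets the entry and
        # slot index so a moving collector could overwrite the root in place.
        entry = self.head
        while entry is not None:
            for i in range(entry.map.num_roots):
                meta = entry.map.meta[i] if i < len(entry.map.meta) else None
                visitor(entry, i, meta)
            entry = entry.next
```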

Psyched to see movement here. :)

- Gordon

P.S. Please keep tagged pointers in mind somehow. *coughcoughlispmachinecough*

> On Dec 4, 2014, at 20:50, Philip Reames <[hidden email]> wrote:

Re: Future plans for GC in LLVM

Philip Reames

On 12/04/2014 07:38 PM, Gordon Henriksen wrote:
> Excellent direction. My input, although I'm verrrrry far removed from the project at this point:
>
> • I vehemently support replacement of the gcroot intrinsic. It makes the system make unnecessarily conservative decisions by default, which is incompatible with producing a high quality compiler for garbage collected languages. I  just never got ambitious enough to replace it.
> • The Ocaml-specific code can be jettisoned. Ocaml was hostile to contributors outside of France and INRIA in specific, so no published callers use it that I'm aware of. (The ocaml runtime also requires an alternative calling convention, so it's highly unlikely that there's a stray caller!)
Great!  Glad to hear.  I suspected this was dead, but I wasn't sure.
> • Shadow-stack is probably useful for bootstrapping up new runtimes/new languages because it's crazy simple to interoperate with. It's not threadsafe or anything tho, so don't let it stand in the way of a high quality implementation.
So far, I see no reason not to keep it.  It's fairly separate from the
problematic bits of gcroot and should be easy-ish to keep working.
> • Generating a different stack map at every safepoint is highly desirable. Anything that depends on the contrary deserves to be broken!
> • It was my experience that porting a compiler was simplified by not concurrently porting the runtime. So the ability to customize serialization of the stack map was valuable, particularly if it is inexpensive to preserve that behavior. (If LLVM has gotten into the business of providing collector runtimes and memory allocators and …, then dump it!)
We haven't.  But we might be in the business of shipping a parser for
the stack map format.  Would that solve the problem?
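If LLVM did ship such a parser, its first job would be walking the section layout documented at http://llvm.org/docs/StackMaps.html#stackmap-format. The Python sketch below decodes only the fixed prefix of a version-1 stack map as I understand that layout (a 4-byte header, three 32-bit counts, per-function address/stack-size pairs, then 64-bit large constants), assuming a little-endian target. It deliberately stops at the call-site records, whose exact layout should be taken from that document rather than from this sketch; the function name and the returned dictionary are hypothetical.

```python
import struct
from typing import Dict, List, Tuple

# Assumed version-1 prefix (little-endian):
#   u8 version, u8 reserved, u16 reserved, u32 #functions, u32 #constants, u32 #records
SM_HEADER = struct.Struct("<BBHIII")
SM_FUNC = struct.Struct("<QQ")   # function address, stack size
SM_CONST = struct.Struct("<Q")   # large constant


def parse_stackmap_prefix(blob: bytes) -> Dict[str, object]:
    version, _res0, _res1, n_funcs, n_consts, n_records = SM_HEADER.unpack_from(blob, 0)
    if version != 1:
        raise ValueError(f"sketch only handles version 1, got {version}")
    pos = SM_HEADER.size
    functions: List[Tuple[int, int]] = []
    for _ in range(n_funcs):
        functions.append(SM_FUNC.unpack_from(blob, pos))
        pos += SM_FUNC.size
    constants: List[int] = []
    for _ in range(n_consts):
        constants.append(SM_CONST.unpack_from(blob, pos)[0])
        pos += SM_CONST.size
    # Call-site records start at `pos`; decoding them is left out on purpose.
    return {"version": version, "functions": functions, "constants": constants,
            "num_records": n_records, "records_offset": pos}
```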
> • Interior pointers pose a particular challenge once the gcroot intrinsic is removed. I was totally punting on that problem domain.
Interior pointers (what I've been calling derived pointers) have first
class support with gc.statepoint.  I have no plans to try to make them
work with gcroot.
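Why a base pointer travels with each derived pointer is easiest to see in the arithmetic: once the collector moves the base object, the only way to recompute an interior pointer is to keep its offset, i.e. new_derived = new_base + (old_derived - old_base). A minimal Python sketch, with addresses as plain integers and a hypothetical `moved` table standing in for the collector's forwarding information:

```python
from typing import Dict, List, Tuple


def relocate(pairs: List[Tuple[int, int]], moved: Dict[int, int]) -> List[Tuple[int, int]]:
    """Relocate (base, derived) pointer pairs after a moving collection.

    `moved` maps each moved object's old base address to its new one.
    The derived (interior) pointer keeps its offset from the base.
    """
    result = []
    for base, derived in pairs:
        new_base = moved.get(base, base)  # objects that did not move keep their address
        result.append((new_base, new_base + (derived - base)))
    return result
```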
>
> Psyched to see movement here. :)
Thanks for the feedback.
>
> - Gordon
>
> P.S. Please keep tagged pointers in mind somehow. *coughcoughlispmachinecough*
So far, I haven't deliberately designed them in (or out).  Now that
gc.statepoint has landed, I'm open to ideas and proposals if you want
to extend them.  I've given them some thought before, but I have no
current plans to make a proposal in this space.

>
>> On Dec 4, 2014, at 20:50, Philip Reames <[hidden email]> wrote:

Re: Future plans for GC in LLVM

Russell Hadley
In reply to this post by Philip Reames
Hi,

This looks great.  I'm excited to see this come in, and the direction looks right to me.  I've started to look at how to use LLVM to target the MS CLR GC (now being open sourced), and statepoints are much closer to the current implementation, so that could make my life easier.  On your questions, add my vote for the ability to implement a custom binary format for safepoints, since it means I can just target the CLR's format.  I've been trying to keep up with your progress, but since I'm coming out of lurk mode I have a few questions:

- From the documentation it looks like you're using the patchpoint stackmap format (http://llvm.org/docs/StackMaps.html#stackmap-format).  In that format you can describe register locations, but the overview (http://llvm.org/docs/Statepoints.html#overview) implies that all gc pointers are spilled to the stack.  Is the spilling to memory required?  Or is the plan to allow gc pointers to reside in registers as well?  (I'm hoping that a store/load at safepoints won't be required and that they can stay register resident.)
- I'm still fuzzy on how code motion is blocked from moving SSA uses past the safepoint once they've been inserted.  I'm likely just missing some invariant in LLVM or the design, since I can't seem to noodle it out from what I've seen.
- In the CLR GC we don't require the base object pointer to be kept alive for a derived managed pointer (interior pointer), but your design requires maintaining a base/derived pairing.  (If I remember right, this is a Java requirement.)  Is this a hard requirement?  Or is there the potential for other collectors to deal with just managed pointers?

Thanks,

-R

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Philip Reames
Sent: Thursday, December 4, 2014 5:50 PM
To: LLVM Developers Mailing List
Cc: [hidden email]; [hidden email]
Subject: [LLVMdev] Future plans for GC in LLVM


Re: Future plans for GC in LLVM

Ben Karel
In reply to this post by Philip Reames
Hello Philip,

I am an active user of LLVM's GC infrastructure. Here are some notes on how I currently use gcroot():

1) I use several IRs. A high-level (CPS/SSA hybrid) IR is the target of inlining and other such high-level optimization; most allocation is implicit in the high level IR. Closure conversion/lambda-lifting produces a variant of that IR, extended with explicit allocation constructs. A separate pass introduces the allocations themselves.
2) Next, a dataflow-based root insertion pass inserts the minimal(ish) set of roots and root reloads to preserve the invariant that every live GCable pointer is in a root at each potential GC point. The root insertion pass also computes liveness and uses it to minimize the set of roots via slot coloring. The result of root insertion is (very close to) LLVM IR.
3) Potential GC points are determined inter-procedurally, and call sites which are statically known to (transitively) not GC do not break GCable pointer live ranges.
4) I use a lightly-modified variant of the OCaml plugin's stackmap format.
5) I don't currently use load or store barriers, but do eventually plan to.
6) I do support GCing through global variables.
7) I wrote a custom LLVM pass to verify that values loaded from GC roots aren't used across GC points.
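Step 2's use of liveness to minimize roots via slot coloring can be illustrated with the textbook version of the idea. This is not Ben's pass: the Python sketch below assumes each root's liveness has already been summarized as one half-open interval of program points, which is a simplification (liveness in a real CFG is generally not a single interval, so a production pass would color an interference graph). Under that assumption, a linear scan that reuses freed slots gives roots with disjoint live ranges the same stack slot. The function name and interval encoding are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def color_root_slots(intervals: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Assign stack slots so roots with non-overlapping [start, end) ranges share slots."""
    active: List[Tuple[int, int]] = []  # min-heap of (end, slot) for roots still live
    free: List[int] = []                # min-heap of slots released by dead roots
    next_slot = 0
    assignment: Dict[str, int] = {}
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release the slots of roots whose live range ended at or before `start`.
        while active and active[0][0] <= start:
            _, slot = heapq.heappop(active)
            heapq.heappush(free, slot)
        if free:
            slot = heapq.heappop(free)
        else:
            slot = next_slot
            next_slot += 1
        assignment[name] = slot
        heapq.heappush(active, (end, slot))
    return assignment
```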

When I first started using gcroot(), I used the naïve approach of reloading roots after every call, and encountered significant overhead. The optimizations sketched above significantly reduced that overhead.  Unfortunately, I don't know of any meaningful language-independent benchmark suites for measuring that sort of overhead, and of course the overhead on any given program will strongly depend on details of how that program is written...

Also, FWIW when I first started, I got up and running with the shadow stack, just as Gordon described, before transitioning to the "real" infrastructure. As long as it doesn't carry a significant burden on the implementation side, I think it's worth having, because it significantly improves the learning curve for new users.

OK, direct answers to some of your questions:

* I think custom stack map formats have small but non-zero value. There have been a few papers (mostly from the early '90s, I think) showing that stack maps can make up a non-trivial fraction of a binary's size, and thus are increasingly desirable to optimize as programs grow larger. My verdict: if support for custom formats is ever actively impeding forward progress, toss 'em; otherwise, there should (eventually) be a more detailed look at the costs and benefits.

* As above, I think the primary benefit of one-stackmap-per-safepoint is saving space at the cost of time (and root traffic). AFAIK the primary virtue of the one-stack-slot-per-gcroot() implementation is implementation simplicity, nothing more.

* The ultimate strength & weakness of gcroot() is that basically everything is left to the frontend. There's very little unavoidable overhead imposed on a frontend that is willing to go the distance to generate non-naïve code -- any knowledge that the frontend has can be used to generate better code. Unfortunately, this also places a rather heavy burden on the frontend to generate valid & efficient code.


Since I haven't had the chance to look beyond mailing list & blog posts on statepoints, I can't comment much on their tradeoffs vs gcroots. The high-level impression I get is that statepoints will allow a less-sophisticated frontend to get better results than naïve usage of gcroot(). It also looks like statepoints will do a better job of steering frontends away from the pitfalls of using GC-invalidated pointer values.  I have no idea whether there will be any codegen benefit for more sophisticated frontends, but I'll be happy to port my implementation to statepoints once they've gotten a chance to settle down somewhat, and provide more detailed feedback to help determine what to eventually do with gcroot(). I can currently only do ad-hoc benchmarking, but hopefully by the time gcroot() is on the chopping block, I'll have a more extensive automated performance regression test suite ;-)


One quick question: I think I understand why address spaces are needed to distinguish GCable pointers for late safepoint placement, but are they also needed if the frontend is inserting all safepoints itself?


Finally, thank you for taking on the burden of improving LLVM's GC functionality! 


On Thu, Dec 4, 2014 at 8:50 PM, Philip Reames <[hidden email]> wrote:

Re: Future plans for GC in LLVM

Sanjoy Das
In reply to this post by Russell Hadley
> - From the documentation it looks like you're using the patchpoint
>   stackmap format
>   (http://llvm.org/docs/StackMaps.html#stackmap-format).  In that
>   format you can describe register locations - but from the overview
>   (http://llvm.org/docs/Statepoints.html#overview) it implies that all
>   gc pointers are spilled to the stack.  Is the spilling to memory
>   required?  Or is the plan to allow gc pointers to reside in registers
>   as well?  (I'm hoping that a store/load at safepoints won't be
>   required and that they can stay register resident.)

We're currently spilling to the stack to keep the implementation simple.
Ideally we should be able to lower the complete gc.statepoint
construct to a no-op; and have the GC deal with whatever decision the
register allocator made.

> - I'm still fuzzy how code motion is blocked from moving SSA uses past
>   the safepoint once they've been inserted?  I'm likely just missing
>   some invariant in LLVM or the design since I can't seem to noodle it
>   out from what I've seen.

The representation only prevents "observable" uses of the GC pointers
from being moved across safepoints.  The semantics of gc.statepoint is
not that all uses of `%ptr' automatically become uses of the latest,
most relocated value of the object `%ptr' points to; but that the
gc.statepoint *explicitly* returns a `%ptr.reloc' that you're supposed
to use instead of `%ptr' once you've dynamically passed the
gc.statepoint.  Ensuring this can involve inserting phi nodes.

So, the two following pieces of code are semantically equivalent (in
pseudo-llvm):

  %cmp = (%ptr == null)
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)

Vs.

  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  %cmp = (%ptr == null)

In both code segments, `%cmp' holds true if the *unrelocated* %ptr
is null, and in neither does anything look at where %ptr was
relocated to.

Since gc.statepoint is specified to possibly have arbitrary
side-effects and can read/write arbitrary memory, the following two
are *not* equivalent:

  %cmp = (%ptr == null)
  if (%cmp) *global = 42;
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)

Vs.

  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  %cmp = (%ptr == null)
  if (%cmp) *global = 42;

but the second one is equivalent to

  %cmp = (%ptr == null)
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  if (%cmp) *global = 42;
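To make the reordering argument concrete, here is a toy Python model (an illustration only, not LLVM's semantics; the function names are made up). `statepoint` stands in for gc.statepoint: it may read arbitrary memory (here it records the value of `global`) and returns a relocated copy of `%ptr`. The three functions correspond to the last three listings: `original` (compare, conditional store, then statepoint), `store_sunk` (statepoint, then compare and store) and `cmp_hoisted` (compare, statepoint, store). The comparison always tests the unrelocated pointer, so `cmp` agrees everywhere; what changes is whether the statepoint can observe the store.

```python
from typing import List, Optional, Tuple

Result = Tuple[bool, int, List[int]]  # (cmp, final value of global, what the statepoint saw)


def statepoint(mem: dict, ptr: Optional[int], seen: List[int]) -> Optional[int]:
    """Toy gc.statepoint: may read arbitrary memory, returns the relocated ptr."""
    seen.append(mem["global"])  # the "GC" observes memory at the safepoint
    return None if ptr is None else ptr + 0x1000  # toy relocation by a fixed delta


def original(ptr: Optional[int]) -> Result:
    # %cmp = (%ptr == null); if (%cmp) *global = 42; statepoint; relocate
    mem, seen = {"global": 0}, []
    cmp = ptr is None
    if cmp:
        mem["global"] = 42
    statepoint(mem, ptr, seen)
    return cmp, mem["global"], seen


def store_sunk(ptr: Optional[int]) -> Result:
    # statepoint; relocate; %cmp = (%ptr == null); if (%cmp) *global = 42
    mem, seen = {"global": 0}, []
    statepoint(mem, ptr, seen)
    cmp = ptr is None  # still tests the *unrelocated* %ptr
    if cmp:
        mem["global"] = 42
    return cmp, mem["global"], seen


def cmp_hoisted(ptr: Optional[int]) -> Result:
    # %cmp = (%ptr == null); statepoint; relocate; if (%cmp) *global = 42
    mem, seen = {"global": 0}, []
    cmp = ptr is None
    statepoint(mem, ptr, seen)
    if cmp:
        mem["global"] = 42
    return cmp, mem["global"], seen


for p in (None, 0x10):
    print(p, original(p), store_sunk(p), cmp_hoisted(p))
```

With a null pointer, `original` and `store_sunk` end with the same memory but differ in what the statepoint saw, which is why they are not interchangeable; `store_sunk` and `cmp_hoisted` agree for every input.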

> - In the CLR GC we don't require the base object pointer to be kept
>   alive for a derived managed pointer (interior pointer) but in your
>   design there is the requirement to maintain a base, derived pairing.
>   (If I remember right this is a Java requirement) Is this a hard
>   requirement?  Or is there the potential for other collectors to deal
>   just with managed pointers?

I'm not familiar with the CLR GC, but won't you need base pointers to
be able to relocate stack roots?  In any case, if you don't need
derived pointers, you can just have the identity map as the
base-derived relationship (i.e. every pointer is a base pointer for
itself).

-- Sanjoy

Re: Future plans for GC in LLVM

Philip Reames-4
In reply to this post by Russell Hadley

On 12/08/2014 04:02 PM, Russell Hadley wrote:
> Hi,
>
> This looks great.  I'm excited to see this come in and the direction looks right to me.  I've started to look at how to use LLVM to target the MS CLR GC (now being open sourced) and the statepoints are much closer to the current implementation so that could make my life easier.  On your questions, add my vote to the ability to implement a custom binary format for safepoints since it means I can just target the CLRs format.
It seems like there's a strong desire to preserve the custom binary
format mechanism.  I wasn't really expecting that, but I see no real
downsides other than some minor code complexity.

>   I've been trying to keep up with your progress but since I'm coming out of lurk mode I have a few questions:
>
> - From the documentation it looks like you're using the patchpoint stackmap format (http://llvm.org/docs/StackMaps.html#stackmap-format).  In that format you can describe register locations - but from the overview (http://llvm.org/docs/Statepoints.html#overview) it implies that all gc pointers are spilled to the stack.  Is the spilling to memory required?  Or is the plan to allow gc pointers to reside in registers as well?  (I'm hoping that a store/load at safepoints won't be required and that they can stay register resident.)
At the moment, we will eagerly spill and the stack map will only contain
stack slots.  My hope is in the not too distant future to extend the
backend infrastructure to allow accurate reporting of gc pointers in
registers.  The format specification already supports this, we're just
not able to actually exploit that in the backend yet.
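As a rough picture of what reporting gc pointers in registers means for a runtime reading the stack map: each recorded location says where a live GC pointer can be found, and a moving collector must write the relocated value back to the same place. The sketch below uses two of the location kinds listed in LLVM's StackMaps documentation (Register = 0x1, Indirect = 0x3, i.e. a value spilled at [Reg + Offset]); `Frame`, `Location`, `read_gc_pointer` and `write_gc_pointer` are hypothetical names, and the frame is a flat dictionary snapshot that ignores how a real runtime recovers callee-saved registers while unwinding. Today's eager spilling corresponds to every GC pointer being an Indirect location.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class LocKind(Enum):
    # Two of the location kinds in LLVM's stack map format.
    REGISTER = 0x1  # the value itself is in a register
    INDIRECT = 0x3  # the value is spilled to memory at [Reg + Offset]


@dataclass
class Location:
    kind: LocKind
    dwarf_reg: int
    offset: int = 0


@dataclass
class Frame:
    regs: Dict[int, int]    # DWARF register number -> register contents
    memory: Dict[int, int]  # address -> word stored there (toy memory model)


def read_gc_pointer(frame: Frame, loc: Location) -> int:
    if loc.kind is LocKind.REGISTER:
        return frame.regs[loc.dwarf_reg]
    return frame.memory[frame.regs[loc.dwarf_reg] + loc.offset]


def write_gc_pointer(frame: Frame, loc: Location, new_value: int) -> None:
    # A moving collector must store the relocated pointer back where it found it.
    if loc.kind is LocKind.REGISTER:
        frame.regs[loc.dwarf_reg] = new_value
    else:
        frame.memory[frame.regs[loc.dwarf_reg] + loc.offset] = new_value


# x86-64 DWARF numbering: 3 = rbx, 7 = rsp.
frame = Frame(regs={3: 0xBEEF00, 7: 0x7000}, memory={0x7010: 0xCAFE00})
spilled = Location(LocKind.INDIRECT, dwarf_reg=7, offset=0x10)  # what eager spilling records
in_reg = Location(LocKind.REGISTER, dwarf_reg=3)                # what register reporting adds
write_gc_pointer(frame, spilled, read_gc_pointer(frame, spilled) + 0x100)
```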
> - I'm still fuzzy how code motion is blocked from moving SSA uses past the safepoint once they've been inserted?   I'm likely just missing some invariant in LLVM or the design since I can't seem to noodle it out from what I've seen.
I think Sanjoy's response did a pretty good job on this one.  If it's
still unclear, let me know.
> - In the CLR GC we don't require the base object pointer to be kept alive for a derived managed pointer (interior pointer) but in your design there is the requirement to maintain a base, derived pairing.  (If I remember right this is a Java requirement) Is this a hard requirement?  Or is there the potential for other collectors to deal just with managed pointers?
I'm a little unclear on what you're trying to ask here.  A pointer
associated with an object (say, the address of a field) must keep the
object alive; doing otherwise would create use-after-free errors.  I'm
guessing that you simply don't keep interior pointers live across a
safepoint?  That's fine and everything should work normally.

If, at the safepoint, all of your pointers are base pointers, then you
can simply list each pointer as both the base and the derived argument
of its gc.relocate.  This mechanism definitely works; it's a pretty common case
for any compiled code.
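The base/derived pairing matters because a moving collector relocates whole objects, and an interior (derived) pointer has to move by the same amount as its object. A minimal Python sketch, assuming a copying collector that preserves object layout; `relocate` and the `forwarding` map are hypothetical, not part of LLVM. When every live pointer is a base pointer, each pair is just (p, p) and the offset is zero, which is the identity pairing described above.

```python
from typing import Dict, List, Tuple


def relocate(pairs: List[Tuple[int, int]], forwarding: Dict[int, int]) -> List[int]:
    """New value of each derived pointer after its base object has moved.

    pairs: (base, derived) addresses as recorded at the safepoint.
    forwarding: old object start address -> new object start address.
    """
    relocated = []
    for base, derived in pairs:
        offset = derived - base  # position of the derived pointer inside the object
        relocated.append(forwarding[base] + offset)
    return relocated


forwarding = {0x1000: 0x8000}  # the collector moved one object
# A pointer to a field 0x18 bytes in, and the base itself as an identity pair.
new_ptrs = relocate([(0x1000, 0x1018), (0x1000, 0x1000)], forwarding)
```

A collector that can map an arbitrary interior address back to its enclosing object would not need the explicit base; the pairing is what lets the statepoint design avoid assuming such a lookup.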

The other case you might be referring to is so-called 'contained
objects'.  (i.e. one gc managed object embedded within another, but
whose lifetimes are distinct)  This is a more complicated case, so
unless this is actually what you're getting at, I'm going to avoid
explaining all the complexities.  :)

>
> Thanks,
>
> -R
>
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Philip Reames
> Sent: Thursday, December 4, 2014 5:50 PM
> To: LLVM Developers Mailing List
> Cc: [hidden email]; [hidden email]
> Subject: [LLVMdev] Future plans for GC in LLVM

Re: Future plans for GC in LLVM

Philip Reames-4
In reply to this post by Ben Karel

On 12/09/2014 03:12 AM, Ben Karel wrote:
> Hello Philip,
>
> I am an active user of LLVM's GC infrastructure. Here are some notes on how I currently use gcroot():
Wow, thank you for sharing!  Your usage is far beyond anything else I've heard of.  Can I ask which project/language this is?  Or is that proprietary?

> 1) I use several IRs. A high-level (CPS/SSA hybrid) IR is the target of inlining and other such high-level optimization; most allocation is implicit in the high level IR. Closure conversion/lambda-lifting produces a variant of that IR, extended with explicit allocation constructs. A separate pass introduces the allocations themselves.
> 2) Next, a dataflow-based root insertion pass inserts the minimal(ish) set of roots and root reloads to preserve the invariant that every live GCable pointer is in a root at each potential GC point. The root insertion pass also computes liveness and uses it to minimize the set of roots via slot coloring. The result of root insertion is (very close to) LLVM IR.
I would be really interested in hearing more about your implementation for this part.  We need to solve a similar problem in the statepoint lowering code.  What we have to date is a simple greedy scheme that works reasonably well in practice, but leaves lots of room for improvement.
> 3) Potential GC points are determined inter-procedurally, and call sites which are statically known to (transitively) not GC do not break GCable pointer live ranges.
I was planning something like this for LLVM at some point.  It's relatively low on my priority list since IPO tends to be of minimal interest when JITing, which is my primary use case.
> 4) I use a lightly-modified variant of the OCaml plugin's stackmap format.
> 5) I don't currently use load or store barriers, but do eventually plan to.
> 6) I do support GCing through global variables.
> 7) I wrote a custom LLVM pass to verify that values loaded from GC roots aren't used across GC points.
Cool.  We have a similar one for statepoints.  (If you can share, getting both into the common tree would be nice.)
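Item 2 and the reply to it both concern packing live GC pointers into as few stack slots as possible. One generic way to do greedy slot reuse is to walk values in order of the first safepoint at which they are live and hand each one a slot whose previous occupant is already dead. The Python sketch below shows that general technique (greedy interval assignment), not the statepoint lowering code or Ben's root insertion pass; it assumes each value's liveness is a single inclusive range of safepoint indices, which is a simplification of liveness over a real CFG.

```python
import heapq
from typing import Dict, List, Tuple


def assign_slots(live: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Greedily map each GC value to a stack slot.

    live: value name -> (first, last) safepoint index at which it is live,
    inclusive. Two values may share a slot only if their ranges do not overlap.
    """
    free: List[int] = []                # min-heap of reusable slot numbers
    active: List[Tuple[int, int]] = []  # min-heap of (last, slot) for occupied slots
    next_slot = 0
    assignment: Dict[str, int] = {}
    for name, (first, last) in sorted(live.items(), key=lambda kv: kv[1]):
        # Release every slot whose occupant died before this value becomes live.
        while active and active[0][0] < first:
            _, slot = heapq.heappop(active)
            heapq.heappush(free, slot)
        if free:
            slot = heapq.heappop(free)
        else:
            slot = next_slot
            next_slot += 1
        assignment[name] = slot
        heapq.heappush(active, (last, slot))
    return assignment


live = {"a": (0, 2), "b": (1, 3), "c": (3, 4), "d": (4, 5)}
slots = assign_slots(live)
```

For interval-shaped live ranges, processing in order of start point uses no more slots than the maximum number of simultaneously live values; branches and loops are where a real pass needs more care.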

> When I first started using gcroot(), I used the naïve approach of reloading roots after every call, and encountered significant overhead. The optimizations sketched above significantly reduced that overhead.  Unfortunately, I don't know of any meaningful language-independent benchmark suites for measuring that sort of overhead, and of course the overhead on any given program will strongly depend on details of how that program is written...
>
> Also, FWIW when I first started, I got up and running with the shadow stack, just as Gordon described, before transitioning to the "real" infrastructure. As long as it doesn't carry a significant burden on the implementation side, I think it's worth having, because it significantly improves the learning curve for new users.
I'm planning on retaining the shadow stack mechanism.

> OK, direct answers to some of your questions:
>
> * I think custom stack map formats have small but non-zero value. There have been a few papers (most in the early '90s, I think) which showed that stack maps can make up a non-trivial fraction of a binary's size, and thus are increasingly desirable to optimize as programs grow larger. My verdict: if support for custom formats is ever actively impeding forward progress, toss 'em; otherwise, there should (eventually) be a more detailed look at the costs and benefits.
My preferred usage model would be: LLVM generates standard format, runtime parses and saves in custom binary format.

Having said that, retaining the capability doesn't seem to involve too much complexity.  I see no reason to kill it since multiple folks seem to want it. 
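The preferred usage model (LLVM emits the standard format; the runtime re-encodes it) can be sketched as a post-processing step. Here the standard records are assumed to be already parsed into (instruction offset, list of stack offsets) pairs, with non-negative offsets. The output format is invented purely for illustration: identical root sets are stored once in a shared table, instruction offsets are delta-encoded, and every integer is written as an unsigned LEB128 varint. It is not an existing LLVM or runtime format, just one way to exploit the observation that many safepoints repeat the same layout.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, List[int]]  # (instruction offset, stack offsets of live GC pointers)


def uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("only non-negative values are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos


def encode(records: List[Record]) -> bytes:
    table: Dict[Tuple[int, ...], int] = {}  # distinct slot sets -> table index
    refs = []
    for pc, slots in records:
        key = tuple(sorted(slots))
        refs.append((pc, table.setdefault(key, len(table))))
    out = bytearray(uleb128(len(table)))
    for key in table:  # dicts keep insertion order, matching the indices
        out += uleb128(len(key))
        for off in key:
            out += uleb128(off)
    out += uleb128(len(refs))
    prev = 0
    for pc, idx in sorted(refs):
        out += uleb128(pc - prev) + uleb128(idx)  # delta-encoded instruction offsets
        prev = pc
    return bytes(out)


def decode(buf: bytes) -> List[Record]:
    pos = 0
    n_sets, pos = read_uleb128(buf, pos)
    sets = []
    for _ in range(n_sets):
        count, pos = read_uleb128(buf, pos)
        offs = []
        for _ in range(count):
            off, pos = read_uleb128(buf, pos)
            offs.append(off)
        sets.append(offs)
    n_refs, pos = read_uleb128(buf, pos)
    records, pc = [], 0
    for _ in range(n_refs):
        delta, pos = read_uleb128(buf, pos)
        idx, pos = read_uleb128(buf, pos)
        pc += delta
        records.append((pc, sets[idx]))
    return records
```

Round-tripping through `decode` returns the records sorted by instruction offset with each root set sorted.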

> * As above, I think the primary benefit of one-stackmap-per-safepoint is saving space at the cost of time (and root traffic). AFAIK the primary virtue of the one-stack-slot-per-gcroot() implementation is implementation simplicity, nothing more.
>
> * The ultimate strength & weakness of gcroot() is that basically everything is left to the frontend. There's very little unavoidable overhead imposed on a frontend that is willing to go the distance to generate non-naïve code -- any knowledge that the frontend has can be used to generate better code. Unfortunately, this also places a rather heavy burden on the frontend to generate valid & efficient code.
>
> Since I haven't had the chance to look beyond mailing list & blog posts on statepoints, I can't comment much on their tradeoffs vs gcroots. The high-level impression I get is that statepoints will allow a less-sophisticated frontend to get better results than naïve usage of gcroot(). It also looks like statepoints will do a better job of steering frontends away from the pitfalls of using GC-invalidated pointer values.  I have no idea whether there will be any codegen benefit for more sophisticated frontends, but I'll be happy to port my implementation to statepoints once they've gotten a chance to settle down somewhat, and provide more detailed feedback to help determine what to eventually do with gcroot(). I can currently only do ad-hoc benchmarking, but hopefully by the time gcroot() is on the chopping block, I'll have a more extensive automated performance regression test suite ;-)
I'll be very interested in your results.  I'm quite sure you'll find stability and performance bugs.  The existing code 'works' but has really only been hammered in one particular use case (source language, virtual machine).  Every new consumer will help to make the code more robust.  Let me know when you're ready to start prototyping this.  I'm happy to answer questions and help guide you around any bugs you might find.

I suspect from what you've said above that we might want to extend the mechanism slightly to allow you to retain your preassigned stack slots.  You may be getting better spilling code than the current implementation would give you by default.  This seems like it might be a generally useful mechanism.  We could use a call attribute on the gc.relocate, a bit of metadata, or possibly an optional third argument that specifies an 'abstract slot'.  It would depend on your initial results and how much we've managed to improve the lowering code by then.  :)

> One quick question: I think I understand why address spaces are needed to distinguish GCable pointers for late safepoint placement, but are they also needed if the frontend is inserting all safepoints itself?
Nope.  They might allow some additional sanity checks, but nothing that is currently checked in relies on address spaces in any way.  My plan is to make the pointer distinction mechanism (addrspace, gcroot, or custom) a property of the GCStrategy with easily controlled extension points.


> Finally, thank you for taking on the burden of improving LLVM's GC functionality!
Thank you for sharing your experience!
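The plan to make the pointer-identification mechanism (addrspace, gcroot, or custom) a property of the GCStrategy could look roughly like the Python sketch below. Everything in it is hypothetical: `PointerKind`, `Value`, `Strategy` and `is_gc_pointer` are invented names, and address space 1 is only an example value the strategy might choose. Note that under gcroot the thing marked is the alloca holding the root, not the SSA pointer value itself.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class PointerKind(Enum):
    ADDRSPACE = auto()  # GC pointers are SSA values in a designated address space
    GCROOT = auto()     # GC pointers are stored in allocas marked with llvm.gcroot
    CUSTOM = auto()     # the frontend supplies its own predicate


@dataclass
class Value:
    # Toy description of an IR value; only the fields the predicates need.
    name: str
    addrspace: int = 0
    is_gcroot_alloca: bool = False


@dataclass
class Strategy:
    kind: PointerKind
    gc_addrspace: int = 1  # example choice; the strategy would decide this
    predicate: Optional[Callable[[Value], bool]] = None

    def is_gc_pointer(self, v: Value) -> bool:
        if self.kind is PointerKind.ADDRSPACE:
            return v.addrspace == self.gc_addrspace
        if self.kind is PointerKind.GCROOT:
            # The alloca (the root), not the loaded SSA value, is what is marked.
            return v.is_gcroot_alloca
        if self.predicate is None:
            raise ValueError("a CUSTOM strategy needs a predicate")
        return self.predicate(v)


by_addrspace = Strategy(PointerKind.ADDRSPACE)
by_gcroot = Strategy(PointerKind.GCROOT)
by_name = Strategy(PointerKind.CUSTOM, predicate=lambda v: v.name.startswith("gc."))
```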



Re: Future plans for GC in LLVM

Ben Karel


On Tue, Dec 9, 2014 at 7:27 PM, Philip Reames <[hidden email]> wrote:

> On 12/09/2014 03:12 AM, Ben Karel wrote:
>> Hello Philip,
>>
>> I am an active user of LLVM's GC infrastructure. Here are some notes on how I currently use gcroot():
> Wow, thank you for sharing!  Your usage is far beyond anything else I've heard of.  Can I ask which project/language this is?  Or is that proprietary?

The interesting bits are in Haskell, so you probably won't be able to reuse much code apart from, maybe, the custom LLVM passes.

>> 1) I use several IRs. A high-level (CPS/SSA hybrid) IR is the target of inlining and other such high-level optimization; most allocation is implicit in the high level IR. Closure conversion/lambda-lifting produces a variant of that IR, extended with explicit allocation constructs. A separate pass introduces the allocations themselves.
>> 2) Next, a dataflow-based root insertion pass inserts the minimal(ish) set of roots and root reloads to preserve the invariant that every live GCable pointer is in a root at each potential GC point. The root insertion pass also computes liveness and uses it to minimize the set of roots via slot coloring. The result of root insertion is (very close to) LLVM IR.
> I would be really interested in hearing more about your implementation for this part.  We need to solve a similar problem in the statepoint lowering code.  What we have to date is a simple greedy scheme that works reasonably well in practice, but leaves lots of room for improvement.

Ack, I was tired when I wrote that -- ignore the "via slot coloring". I'm also doing greedy reuse of dead slots with no backtracking or anything. The implementation is rather hairy, in part because the underlying problem goes against the grain of the dataflow library I'm using (Hoopl).

>> 3) Potential GC points are determined inter-procedurally, and call sites which are statically known to (transitively) not GC do not break GCable pointer live ranges.
> I was planning something like this for LLVM at some point.  It's relatively low on my priority list since IPO tends to be of minimal interest when JITing, which is my primary use case.

Ah, that makes sense. I think it's a much, much bigger deal for a static compiler.

>> 4) I use a lightly-modified variant of the OCaml plugin's stackmap format.
>> 5) I don't currently use load or store barriers, but do eventually plan to.
>> 6) I do support GCing through global variables.
>> 7) I wrote a custom LLVM pass to verify that values loaded from GC roots aren't used across GC points.
> Cool.  We have a similar one for statepoints.  (If you can share, getting both into the common tree would be nice.)

My code's at https://code.google.com/p/foster/source/browse/compiler/llvm/passes/GCRootSafetyChecker.cpp -- I'm not sure how useful it would be in the common tree, though, because it relies on the frontend adding metadata reflecting interprocedural may-GC information.
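The checker described here depends on frontend-supplied may-GC information. The Python sketch below shows both halves in miniature, under simplified assumptions: the call graph is a dict listing every defined function, callees that are not defined there are treated conservatively as possibly collecting, and the checked code is a straight-line instruction list rather than a CFG. It is an illustration of the idea, not a port of GCRootSafetyChecker.cpp; `may_gc`, `Instr` and `check_block` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


def may_gc(calls: Dict[str, Set[str]], allocators: Set[str]) -> Set[str]:
    """Functions that may (transitively) trigger a collection.

    calls: every defined function -> the functions it calls directly.
    Callees that are not keys of `calls` are external and conservatively may GC.
    """
    external = {c for callees in calls.values() for c in callees if c not in calls}
    result = set(allocators) | external
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for fn, callees in calls.items():
            if fn not in result and callees & result:
                result.add(fn)
                changed = True
    return result


@dataclass
class Instr:
    op: str                         # "load" (from a GC root), "call", or "use"
    name: str = ""                  # value defined by a load, or callee of a call
    operands: Tuple[str, ...] = ()


def check_block(block: List[Instr], gc_fns: Set[str]) -> List[str]:
    """Flag uses of root-loaded values after a potential GC point."""
    fresh: Set[str] = set()  # loaded from a root since the last potential GC
    stale: Set[str] = set()  # loaded before a potential GC; the object may have moved
    errors = []
    for i, ins in enumerate(block):
        for v in ins.operands:
            if v in stale:
                errors.append(f"instr {i}: {v} used across a GC point")
        if ins.op == "load":
            fresh.add(ins.name)
        elif ins.op == "call" and ins.name in gc_fns:
            stale |= fresh
            fresh.clear()
    return errors


calls = {"main": {"helper", "print_int"}, "helper": {"alloc"},
         "alloc": set(), "print_int": set()}
gc_fns = may_gc(calls, allocators={"alloc"})
block = [
    Instr("load", "%p", ("root0",)),
    Instr("call", "print_int", ("%p",)),  # cannot GC, so %p stays valid
    Instr("use", operands=("%p",)),
    Instr("call", "helper"),              # may GC via alloc
    Instr("use", operands=("%p",)),       # flagged: %p may be stale
    Instr("load", "%p.reload", ("root0",)),
    Instr("use", operands=("%p.reload",)),
]
errors = check_block(block, gc_fns)
```

Because `print_int` provably never reaches an allocation, the call to it does not end `%p`'s validity, which is the same live-range benefit described in item 3.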

>> When I first started using gcroot(), I used the naïve approach of reloading roots after every call, and encountered significant overhead. The optimizations sketched above significantly reduced that overhead.  Unfortunately, I don't know of any meaningful language-independent benchmark suites for measuring that sort of overhead, and of course the overhead on any given program will strongly depend on details of how that program is written...
>>
>> Also, FWIW when I first started, I got up and running with the shadow stack, just as Gordon described, before transitioning to the "real" infrastructure. As long as it doesn't carry a significant burden on the implementation side, I think it's worth having, because it significantly improves the learning curve for new users.
> I'm planning on retaining the shadow stack mechanism.

>> OK, direct answers to some of your questions:
>>
>> * I think custom stack map formats have small but non-zero value. There have been a few papers (most in the early '90s, I think) which showed that stack maps can make up a non-trivial fraction of a binary's size, and thus are increasingly desirable to optimize as programs grow larger. My verdict: if support for custom formats is ever actively impeding forward progress, toss 'em; otherwise, there should (eventually) be a more detailed look at the costs and benefits.
> My preferred usage model would be: LLVM generates standard format, runtime parses and saves in custom binary format.
>
> Having said that, retaining the capability doesn't seem to involve too much complexity.  I see no reason to kill it since multiple folks seem to want it.

Ah, yeah -- again, this is more important for a static compiler, since the stackmaps are embedded into the compiled binary.

>> * As above, I think the primary benefit of one-stackmap-per-safepoint is saving space at the cost of time (and root traffic). AFAIK the primary virtue of the one-stack-slot-per-gcroot() implementation is implementation simplicity, nothing more.
>>
>> * The ultimate strength & weakness of gcroot() is that basically everything is left to the frontend. There's very little unavoidable overhead imposed on a frontend that is willing to go the distance to generate non-naïve code -- any knowledge that the frontend has can be used to generate better code. Unfortunately, this also places a rather heavy burden on the frontend to generate valid & efficient code.
>>
>> Since I haven't had the chance to look beyond mailing list & blog posts on statepoints, I can't comment much on their tradeoffs vs gcroots. The high-level impression I get is that statepoints will allow a less-sophisticated frontend to get better results than naïve usage of gcroot(). It also looks like statepoints will do a better job of steering frontends away from the pitfalls of using GC-invalidated pointer values.  I have no idea whether there will be any codegen benefit for more sophisticated frontends, but I'll be happy to port my implementation to statepoints once they've gotten a chance to settle down somewhat, and provide more detailed feedback to help determine what to eventually do with gcroot(). I can currently only do ad-hoc benchmarking, but hopefully by the time gcroot() is on the chopping block, I'll have a more extensive automated performance regression test suite ;-)
> I'll be very interested in your results.  I'm quite sure you'll find stability and performance bugs.  The existing code 'works' but has really only been hammered in one particular use case (source language, virtual machine).  Every new consumer will help to make the code more robust.  Let me know when you're ready to start prototyping this.  I'm happy to answer questions and help guide you around any bugs you might find.
>
> I suspect from what you've said above that we might want to extend the mechanism slightly to allow you to retain your preassigned stack slots.  You may be getting better spilling code than the current implementation would give you by default.  This seems like it might be a generally useful mechanism.  We could use a call attribute on the gc.relocate, a bit of metadata, or possibly an optional third argument that specifies an 'abstract slot'.  It would depend on your initial results and how much we've managed to improve the lowering code by then.  :)

Will have to measure to find out :-)   No rush, though; I probably won't be able to get around to it for several months.

>> One quick question: I think I understand why address spaces are needed to distinguish GCable pointers for late safepoint placement, but are they also needed if the frontend is inserting all safepoints itself?
> Nope.  They might allow some additional sanity checks, but nothing that is currently checked in relies on address spaces in any way.  My plan is to make the pointer distinction mechanism (addrspace, gcroot, or custom) a property of the GCStrategy with easily controlled extension points.

Sounds like a solid plan!

>> Finally, thank you for taking on the burden of improving LLVM's GC functionality!
> Thank you for sharing your experience!



On Thu, Dec 4, 2014 at 8:50 PM, Philip Reames <[hidden email]> wrote:
Now that the statepoint changes have landed, I wanted to start a discussion about what's next for GC support in LLVM.  I'm going to sketch out a strawman proposal, but I'm not set on any of this.  I mostly just want to draw interested parties out of the woodwork.  :)

Overall Direction:
In the short term, my intent is to preserve the functionality of the existing code, but migrate towards a position where the gcroot specific pieces are optional and well separated.  I also plan to start updating the documentation to reflect a separation between the general support for garbage collection (function attributes, identifying references, load and store barrier lowering, generating stack maps) and the implementation choices (gcroot & it's lowering vs statepoints & addr spaces for identifying references).

Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering code and in tree GCStrategies unless an interesting party speaks up.  I have no problem with retaining some of the existing pieces for legacy support or helping users to migrate, but as of right now, I don't know of any such active users.  The only exception to this might be the shadow stack GC.  Eventually in this context is at least six months from now, but likely less than 18 months.  Hopefully, that's vague enough.  :)

HELP - If anyone knows which OCaml implementation and which Erlang implementation motivated the in-tree GC strategies, please let me know!


Near Term Changes:
- Migrate ownership of GCStrategy objects from GCModuleInfo to LLVMContext.  In theory, this loses the ability for two different Modules to have the same collector with different state, but I know of no use case for this.
- Modify the primary Function::getGC/setGC interface to return a reference to the GCStrategy object, not a string.  I will provide a Function::setGCString and getGCString.
- Extend the GCStrategy class to include a notion of which compilation strategy is being used.  The two choices right now will be Legacy and Statepoint.  (Longer term, this will likely become a more fine-grained choice.)
- Separate GCStrategy and related pieces from the GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At first, this will simply mean clarifying documentation and rearranging code a bit.
- Document/clarify the callbacks used to customize the lowering. Decide which of these make sense to preserve and document.
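One way to picture the first three changes together is a rough C++ sketch under stated assumptions. ContextSketch, FunctionSketch, StrategySketch, LoweringKind and getStrategy are hypothetical stand-ins for LLVMContext, Function and GCStrategy. They are not the interfaces LLVM will actually end up with. In the sketch, the context owns one strategy object per collector name, and getGC() returns a reference to that shared object. The string accessors remain for code that only needs the name, and each strategy records whether it uses the Legacy or the Statepoint lowering.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Which lowering path a collector uses (the two choices described above).
enum class LoweringKind { Legacy, Statepoint };

// Hypothetical stand-in for a GCStrategy object.
struct StrategySketch {
  std::string Name;
  LoweringKind Lowering;
};

// Hypothetical stand-in for LLVMContext: it owns one strategy object per
// collector name, so every module in the context shares that object.
class ContextSketch {
public:
  StrategySketch &getStrategy(const std::string &Name,
                              LoweringKind K = LoweringKind::Legacy) {
    std::unique_ptr<StrategySketch> &Slot = Strategies[Name];
    if (!Slot) // the first request creates it; later ones reuse it
      Slot.reset(new StrategySketch{Name, K});
    return *Slot;
  }

private:
  std::map<std::string, std::unique_ptr<StrategySketch>> Strategies;
};

// Hypothetical stand-in for Function: the string accessors stay, and
// getGC() hands back the shared strategy object itself.
class FunctionSketch {
public:
  explicit FunctionSketch(ContextSketch &C) : Ctx(C) {}
  void setGCString(const std::string &Name) { GCName = Name; }
  const std::string &getGCString() const { return GCName; }
  bool hasGC() const { return !GCName.empty(); }
  StrategySketch &getGC() const {
    assert(hasGC() && "function has no collector");
    return Ctx.getStrategy(GCName);
  }

private:
  ContextSketch &Ctx;
  std::string GCName;
};
```

Because the context keys strategies by name, two modules using the same collector necessarily share its state. This is the capability the first bullet says is given up. In this sketch, the first request for a name decides its lowering kind.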

(Lest anyone get the wrong idea, the above changes are intended to be minor cleanup.  I'm not looking to do anything controversial yet.)

Questions:
- Is providing the ability to generate a custom binary stack map format a valuable feature?  Adapting the new statepoint infrastructure to work with the existing GCMetadataPrinter classes wouldn't be particularly hard.
- Are there any GCs out there that need gcroot's single stack slot per value implementation?   By default, statepoints may generate a different stackmap for every safepoint in a function.
- Is using gcroot and allocas to mark pointers as garbage collected references valuable?  (As opposed to using address spaces on the SSA values themselves?)  Long term, should we retain the gcroot marker intrinsics at all?


Philip

Appendix: The Current Implementation's Key Classes:

GCStrategy - Provides a configurable description of the collector. The strategy can also override parts of the default GC root lowering strategy.  The concept of such a collector description is very valuable, but the current implementation could use some cleanup.  In particular, the custom lowering hooks are a bit of a mess.

GCMetadataPrinter - Provides a means to dump a custom binary format describing each function's safepoints.  All safepoints in a function must share a single root Value to stack slot mapping.

GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved to enable GCMetadataPrinter.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev





Re: Future plans for GC in LLVM

Manuel Jacob
In reply to this post by Philip Reames-4
Hi,

On 2014-12-05 02:50, Philip Reames wrote:
> Questions:
> - Is providing the ability to generate a custom binary stack map format
> a valuable feature?  Adapting the new statepoint infrastructure to
> work with the existing GCMetadataPrinter classes wouldn't be
> particularly hard.

I think custom binary stack map formats are valuable (smaller and easier
to parse).  I'm also willing to implement this after the code around
GCStrategy is a bit more stable.

> - Are there any GCs out there that need gcroot's single stack slot per
> value implementation?   By default, statepoints may generate a
> different stackmap for every safepoint in a function.

Probably not, since the GC runtime looks only at one safepoint per stack
frame and doesn't care about the other safepoints in the function.  To
save space, it's probably a better idea to have a custom stack map
format with some form of compression (e.g. a bitmap).
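Here is a small C++ sketch of the bitmap idea. It assumes a frame's spill slots can be numbered 0..N-1, and that a safepoint record only needs to say which of them hold live references at that safepoint. encodeLiveSlots and decodeLiveSlots are hypothetical helpers. They are not part of LLVM's stack map format or of any existing GCMetadataPrinter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack "slot I holds a live GC reference" into one bit per slot,
// least significant bit first within each byte.
std::vector<std::uint8_t> encodeLiveSlots(const std::vector<bool> &Live) {
  std::vector<std::uint8_t> Bytes((Live.size() + 7) / 8, 0);
  for (std::size_t I = 0; I < Live.size(); ++I)
    if (Live[I])
      Bytes[I / 8] |= static_cast<std::uint8_t>(1u << (I % 8));
  return Bytes;
}

// Recover the indices of the live slots from a packed record.
std::vector<std::size_t> decodeLiveSlots(const std::vector<std::uint8_t> &Bytes,
                                         std::size_t NumSlots) {
  assert(Bytes.size() * 8 >= NumSlots && "record too short for this frame");
  std::vector<std::size_t> Slots;
  for (std::size_t I = 0; I < NumSlots; ++I)
    if (Bytes[I / 8] & (1u << (I % 8)))
      Slots.push_back(I);
  return Slots;
}
```

With this encoding, a nine-slot frame whose roots are in slots 0, 3 and 8 needs two bytes per safepoint (0x09, 0x01). The slot numbering itself could then be stored once per frame instead of once per safepoint.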

> - Is using gcroot and allocas to mark pointers as garbage collected
> references valuable?  (As opposed to using address spaces on the SSA
> values themselves?)  Long term, should we retain the gcroot marker
> intrinsics at all?

According to [1], the gcroot intrinsic was made for frontends which rely
on allocas and mem2reg to bring their code into SSA form.  In this case
the code emitted by the frontend has the following properties:

1) The GC root is loaded from the alloca before each use.
2) The GC root is stored into the alloca after each assignment.
3) No interior pointers are passed across safepoints.

The resulting code should always be correct albeit slow.  It shouldn't
be too hard to add a case to mem2reg which transforms gcroot'ed allocas
into something the late safepoint placement pass understands as being a
GC root.
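A toy C++ model can illustrate why properties 1 and 2 keep such code correct even under a relocating collector. ToyMovingGC, pushRoot, popRoot and frontendStyle are hypothetical names. The local RootSlot plays the role of a gcroot'ed alloca, and collect() stands in for a safepoint at which every rooted object is copied and its slot rewritten. The collector only knows the slot's address. A pointer copied out of the slot and held across the safepoint (Cached below) is therefore not updated, which is the situation the reload-before-use rule avoids.

```cpp
#include <cassert>
#include <vector>

struct Object { int Value; };

// Hypothetical toy copying collector; roots are addresses of frame slots.
class ToyMovingGC {
public:
  ToyMovingGC() = default;
  ToyMovingGC(const ToyMovingGC &) = delete;
  ToyMovingGC &operator=(const ToyMovingGC &) = delete;
  ~ToyMovingGC() { for (Object *O : Heap) delete O; }

  Object *allocate(int V) { Heap.push_back(new Object{V}); return Heap.back(); }
  void pushRoot(Object **Slot) { Roots.push_back(Slot); }
  void popRoot() { assert(!Roots.empty()); Roots.pop_back(); }

  // Safepoint: copy every rooted object and rewrite its root slot.
  // Old copies are simply kept until destruction to keep the toy small.
  void collect() {
    for (Object **Slot : Roots)
      if (*Slot)
        *Slot = allocate((*Slot)->Value);
  }

private:
  std::vector<Object **> Roots;
  std::vector<Object *> Heap;
};

struct Result { int Value; bool CachedCopyWasStale; };

// Code shaped like the frontend output described above.
Result frontendStyle(ToyMovingGC &GC) {
  Object *RootSlot = nullptr;   // the gcroot'ed "alloca"
  GC.pushRoot(&RootSlot);       // registering it with the collector
  RootSlot = GC.allocate(1);    // (2) store into the slot after assignment
  Object *Cached = RootSlot;    // a copy held across the safepoint
  GC.collect();                 // safepoint: the object moves
  RootSlot->Value += 41;        // (1) reload from the slot before use
  Result R{RootSlot->Value, Cached != RootSlot};
  GC.popRoot();
  return R;
}
```

Keeping every root in memory is what makes this approach robust, and also why it is slow: every use becomes a load and every assignment becomes a store.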

-Manuel

[1] http://llvm.org/docs/GarbageCollection.html#gcroot

Re: Future plans for GC in LLVM

Philip Reames-4

On 01/25/2015 06:56 AM, Manuel Jacob wrote:

> Hi,
>
> On 2014-12-05 02:50, Philip Reames wrote:
>> Questions:
>> - Is proving the ability to generate a custom binary stack map format
>> a valuable feature?  Adapting the new statepoint infrastructure to
>> work with the existing GCMetadataPrinter classes wouldn't be
>> particularly hard.
>
> I think custom binary stack map formats are valuable (smaller and
> easier to parse).  I'm also willing to implement this after the code
> around GCStrategy is a bit more stable.
Great to hear.
>
>> - Are there any GCs out there that need gcroot's single stack slot per
>> value implementation?   By default, statepoints may generate a
>> different stackmap for every safepoint in a function.
>
> Probably not, since the GC runtime looks only at one safepoint per
> stack frame and doesn't care about the other safepoints in the
> function.  For saving space it's probably a better idea to have a
> custom stack map format with some form of compression (e.g. a bitmap).
The default lowering for gc.root definitely supports multiple safepoints
per frame.  They're required to have the same *layout*, but multiple
safepoints are supported.
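Here is a C++ sketch contrasting the two models, with hypothetical types and made-up frame offsets and return addresses. In the gcroot-style model, one root layout is shared by every safepoint in the function, so any safepoint's return address resolves to the same offset list. In the statepoint-style model, each safepoint can carry its own list of live slots.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical gcroot-style record: many safepoints, one root layout.
struct SharedLayoutFunction {
  std::vector<std::int32_t> RootOffsets;     // frame offsets, same at every safepoint
  std::vector<std::uint64_t> SafepointAddrs; // return addresses of the safepoints
};

// Hypothetical statepoint-style record: a separate map per safepoint.
using PerSafepointMaps = std::map<std::uint64_t, std::vector<std::int32_t>>;

// Any safepoint in the function yields the same (shared) layout.
const std::vector<std::int32_t> *lookupShared(const SharedLayoutFunction &F,
                                              std::uint64_t RetAddr) {
  for (std::uint64_t A : F.SafepointAddrs)
    if (A == RetAddr)
      return &F.RootOffsets;
  return nullptr;
}

// Each safepoint yields its own list of live slots.
const std::vector<std::int32_t> *lookupPerSafepoint(const PerSafepointMaps &M,
                                                    std::uint64_t RetAddr) {
  auto It = M.find(RetAddr);
  return It == M.end() ? nullptr : &It->second;
}
```

In the shared-layout model, every slot in the layout is visited at every safepoint. Slots that are dead at a particular safepoint must therefore still hold null or a valid reference. The per-safepoint model only lists what is live.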

P.S. Your use of the phrase "gc runtime" above is mildly ambiguous.
Hopefully I inferred the right meaning.

>
>> - Is using gcroot and allocas to mark pointers as garbage collected
>> references valuable?  (As opposed to using address spaces on the SSA
>> values themselves?)  Long term, should we retain the gcroot marker
>> intrinsics at all?
>
> According to [1], the gcroot intrinsic was made for frontends which
> rely on allocas and mem2reg to bring their code into SSA form.  In
> this case the code emitted by the frontend has the following properties:
>
> 1) The GC root is loaded from the alloca before each use.
> 2) The GC root is stored into the alloca after each assignment.
> 3) No interior pointers are passed across safepoints.
>
> The resulting code should always be correct albeit slow.  It shouldn't
> be too hard to add a case to mem2reg which transforms gcroot'ed
> allocas into something the late safepoint placement pass understands
> as being a GC root.
This is harder than it might first seem, but might be doable.  I would
strongly suggest that it not be part of mem2reg though.  :)
>
> -Manuel
>
> [1] http://llvm.org/docs/GarbageCollection.html#gcroot
