Code for late safepoint placement available


Philip Reames-4
As I've mentioned on the mailing list a couple of times over the last few months, we've been working on an approach for supporting precise, fully relocating garbage collection in LLVM.  I am happy to announce that we now have a version of the code available for public view and discussion.

https://github.com/AzulSystems/llvm-late-safepoint-placement


Our goal is to eventually see this merged into the LLVM tree.  There's a fair amount of cleanup that needs to happen before that point, but we are actively working towards that eventual goal. 

Please note that there are a couple of known issues with the current version (see the README).  This is best considered a proof of concept implementation and is not yet ready for production use.  We will be addressing the remaining issues over the next few weeks and will be sharing updates as they occur. 

In the meantime, I'd like to get the discussion started on how these changes will eventually land in tree.  Part of the reason for sharing the code in an early state is to be able to build a history of working in the open, and to be able to merge minor fixes into the main LLVM repository before trying to upstream the core changes.  We are aware this is a fairly major change set and are happy to work within the community process in that regard.

I've included a list of specific questions I know we'd like to get feedback on, but general comments or questions are also very welcome. 

Open Topics:
  • How should we factor the core GC support for review?  Our current intent is to separate logically distinct pieces, and share each layer one at a time.  (e.g. first infrastructure enhancements, then intrinsics and codegen support, then verifiers, then safepoint insertion passes)  Is this the right approach?
  • How configurable does the GC support need to be for inclusion in LLVM?  Currently, we expect the frontend to mark GC pointers using address spaces.  Do we need to support alternate mechanisms?  If so, what interface should this take?
  • How should we approach removing the existing partial support for garbage collection (gcroot)?  Do we want to support both going forward?  Do we need to provide a forward migration path in bitcode?  Given that usage is generally through MCJIT, we would prefer to simply deprecate the existing gcroot support and target it for complete removal a couple of releases down the road.
  • What programmatic interface should we present at the IR level, and where should it live?  We're moving towards a CallSite-like interface for statepoint, gc_relocate, and gc_result call sites.  Is this the right approach?  If so, should it live in the IR subtree, or Support?  (Note: The current code is only about 40% migrated to the new interface.)
  • To support invokable calls with safepoints, we need to make the statepoint intrinsic invokable.  This is new for intrinsics in LLVM.  Is there any reason that IntrinsicInst must be a subclass of CallInst (rather than a view over either calls or invokes, like CallSite)?  Would changes to support invokable intrinsics be accepted upstream?  Alternate approaches are welcome.
  • Is the concept of an abstract VM state something LLVM should know about?  If so, how should it be represented?  We're actively exploring this topic, but don't have strong opinions on the topic yet.  
  • Our statepoint shares a lot in the way of implementation and semantics with patchpoint and stackmap.  Is it better to submit new intrinsics, or try to identify a single intrinsic which could represent both?  Our current feeling is to keep them separate semantically, but share implementation where possible.  
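To make the "view over either calls or invokes" idea concrete, here is a minimal C++ sketch of a non-owning wrapper that answers the same queries for both kinds of call site. `CallInstMock`, `InvokeInstMock`, and `StatepointSiteView` are hypothetical stand-ins invented for illustration; they are not LLVM's classes, and the sketch makes no claim about how CallSite is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for call and invoke instructions. These are NOT
// LLVM's CallInst/InvokeInst; they model only the fields this sketch uses.
struct CallInstMock {
  std::string Callee;
  std::vector<int> Args;
};

struct InvokeInstMock {
  std::string Callee;
  std::vector<int> Args;
  std::string NormalDest; // block reached on normal return
  std::string UnwindDest; // block reached when an exception propagates
};

// Hypothetical non-owning view over either kind of statepoint site.
// Exactly one of the two pointers is non-null.
class StatepointSiteView {
public:
  explicit StatepointSiteView(const CallInstMock &CI) : Call(&CI) {}
  explicit StatepointSiteView(const InvokeInstMock &II) : Invoke(&II) {}

  bool isInvoke() const { return Invoke != nullptr; }

  // Queries that only need the callee or arguments work for both forms.
  const std::string &callee() const {
    return isInvoke() ? Invoke->Callee : Call->Callee;
  }
  const std::vector<int> &args() const {
    return isInvoke() ? Invoke->Args : Call->Args;
  }

  // Only meaningful for invokes; plain calls have no exceptional successor.
  const std::string &unwindDest() const {
    assert(isInvoke() && "only invokes have an unwind destination");
    return Invoke->UnwindDest;
  }

private:
  const CallInstMock *Call = nullptr;
  const InvokeInstMock *Invoke = nullptr;
};
```

Because the view wraps rather than inherits, code such as a statepoint verifier could accept either form without the call and invoke classes sharing a base class; whether that trade-off suits LLVM is the open question in the list.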

Yours,
Philip (& team)

p.s. Sanjoy, one of my co-workers, will be helping to answer questions as they arise.

p.p.s. For those wondering why the current gcroot mechanism isn't sufficient, I covered that in a previous blog post:
[1] http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcroot/

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Code for late safepoint placement available

David Chisnall-5
Hi Philip,

The first thing that I notice on looking at the code is the lack of comments.  For example, about the only comment that I see in include/llvm/IR/Statepoint.h is a note telling me that a class is only intended to be used on the stack.  Doxygen comments and, for a feature like this, some high-level overview of how it's intended to be used are essential.

We currently use address spaces on our architecture to differentiate between traditional pointers and fat pointers (both of which are supported on our system in hardware).  It would be nice, rather than hard-coding an address space, to have a property on DataLayout that would tell you if a specific address space contained GC pointers.  We could then designate an address space that was both GCable and contained fat pointers, and another for non-GCable fat pointers.

The tests don't look to be in a sensible format for LLVM - in particular, LLVM tests should not depend on clang.  It would be better, I think, to commit the IR generated from your Python script than the script itself (although putting it in utils might be helpful for people to regenerate them).  

David


Re: Code for late safepoint placement available

Philip Reames-4
Thanks for the comments and for taking a look.

On 06/05/2014 02:19 AM, David Chisnall wrote:
> Hi Philip,
>
> The first thing that I notice on looking at the code is the lack of comments.  For example, about the only comment that I see in include/llvm/IR/Statepoint.h is a note telling me that a class is only intended to be used on the stack.  Doxygen comments and, for a feature like this, some high-level overview of how it's intended to be used are essential.
Agreed.  This is mandatory before upstreaming.  We've gone through a
couple of implementations and before sharing I tried to go through and
rip out the actively misleading comments.  Unfortunately, that didn't
leave much left.  :(
> We currently use address spaces on our architecture to differentiate between traditional pointers and fat pointers (both of which are supported on our system in hardware).  It would be nice, rather than hard-coding an address space, to have a property on DataLayout that would tell you if a specific address space contained GC pointers.  We could then designate an address space that was both GCable and contained fat pointers, and another for non-GCable fat pointers.
Making this configurable is definitely desirable.  I like your idea of
including a list of address spaces in data layout which contain GC
pointers and making that queryable.  That sounds quite reasonable.
Unless anyone makes another proposal, I'll run with this.
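One way to picture the DataLayout proposal is as a queryable, per-module set of address spaces known to hold GC pointers. The C++ below is a hypothetical sketch: `GCAddressSpaces` and `isGCAddressSpace` are illustrative names, not an existing DataLayout API. It keeps the list sorted and answers membership queries.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch of a DataLayout-style property listing the address
// spaces that hold GC pointers. Not an existing LLVM API.
class GCAddressSpaces {
public:
  explicit GCAddressSpaces(std::vector<unsigned> AddrSpaces)
      : Spaces(std::move(AddrSpaces)) {
    std::sort(Spaces.begin(), Spaces.end());
    assert(std::adjacent_find(Spaces.begin(), Spaces.end()) == Spaces.end() &&
           "address space listed twice");
  }

  // True if pointers in address space AS are GC pointers that a safepoint
  // must report and possibly relocate.
  bool isGCAddressSpace(unsigned AS) const {
    return std::binary_search(Spaces.begin(), Spaces.end(), AS);
  }

private:
  std::vector<unsigned> Spaces; // kept sorted for binary_search
};
```

Under David's scheme, a hypothetical numbering might put GC fat pointers in address space 1 and non-GC fat pointers in address space 2. Constructing `GCAddressSpaces(std::vector<unsigned>{1})` would then make only address space 1 report as GC.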

>
> The tests don't look to be in a sensible format for LLVM - in particular, LLVM tests should not depend on clang.  It would be better, I think, to commit the IR generated from your Python script than the script itself (although putting it in utils might be helpful for people to regenerate them).
By the time we upstream this, those particular tests should probably
just be removed.  When we were iterating on possible designs, it was
useful not having the tests in IR form.  At this point, those tests have
lost most of their value.  Creating a set of IR tests which use address
spaces properly and cover all the same corner cases is probably the
right approach.

Philip


Re: Code for late safepoint placement available

Talin-3



On Thu, Jun 5, 2014 at 9:45 AM, Philip Reames <[hidden email]> wrote:
Thanks for the comments and for taking a look.


On 06/05/2014 02:19 AM, David Chisnall wrote:
We currently use address spaces on our architecture to differentiate between traditional pointers and fat pointers (both of which are supported on our system in hardware).  It would be nice, rather than hard-coding an address space, to have a property on DataLayout that would tell you if a specific address space contained GC pointers.  We could then designate an address space that was both GCable and contained fat pointers, and another for non-GCable fat pointers.
Making this configurable is definitely desirable.  I like your idea of including a list of address spaces in data layout which contain GC pointers and making that queryable.  That sounds quite reasonable.  Unless anyone makes another proposal, I'll run with this.

Does it need to be a list, or could it be a range? If the set of GC address spaces were contiguous over some range of integers, then you'd only need to store a min / max value, which might make the implementation simpler.
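The range suggestion can be sketched in C++, assuming the GC address spaces are numbered contiguously. `GCAddressSpaceRange` and `isGCAddressSpace` are hypothetical names, not LLVM API. Membership becomes two comparisons, and the stored state shrinks to a pair of bounds.

```cpp
#include <cassert>

// Hypothetical range-based representation: if all GC address spaces are
// numbered contiguously, only the inclusive bounds need to be stored.
class GCAddressSpaceRange {
public:
  GCAddressSpaceRange(unsigned Lo, unsigned Hi) : First(Lo), Last(Hi) {
    assert(Lo <= Hi && "empty or inverted range");
  }

  // Constant-time check with no allocation, at the cost of requiring the
  // GC address spaces to be contiguous.
  bool isGCAddressSpace(unsigned AS) const { return AS >= First && AS <= Last; }

private:
  unsigned First; // lowest GC address space (inclusive)
  unsigned Last;  // highest GC address space (inclusive)
};
```

The trade-off is flexibility. A target whose GC address spaces are not adjacent (say 1 and 3, with a non-GC space 2 in between) could not be described by a single range and would need an explicit list instead.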



--
-- Talin


Re: Code for late safepoint placement available

Talin-3
In reply to this post by Philip Reames-4



On Wed, Jun 4, 2014 at 9:35 AM, Philip Reames <[hidden email]> wrote:

Open Topics:
  • How should we approach removing the existing partial support for garbage collection (gcroot)?  Do we want to support both going forward?  Do we need to provide a forward migration path in bitcode?  Given that usage is generally through MCJIT, we would prefer to simply deprecate the existing gcroot support and target it for complete removal a couple of releases down the road.
  • What programmatic interface should we present at the IR level, and where should it live?  We're moving towards a CallSite-like interface for statepoint, gc_relocate, and gc_result call sites.  Is this the right approach?  If so, should it live in the IR subtree, or Support?  (Note: The current code is only about 40% migrated to the new interface.)
Chris and I had a discussion about 3 years ago where we talked about keeping both, but it really depends on how difficult it is. Although the existing intrinsics have many different kinds of horribleness, the one advantage that they have is that roots don't have to be pointers - they can be structs containing pointers, such as tagged unions or Go-style interface values, which have fields that may contain either a pointer or some other data type depending on the value of some other field. I know we talked in email about ways to work around this limitation, but those workarounds have some complex edge cases which it would be nice to avoid - like for example passing a tagged union as a parameter.

That being said, I'm probably the only person who cares about this particular issue :) And while removing support for non-pointer roots will make my life harder in some ways, the new system will make it easier in many other ways.
--
-- Talin


Re: Code for late safepoint placement available

Philip Reames-4
In reply to this post by Talin-3
On 06/10/2014 11:15 AM, Talin wrote:

We currently use address spaces on our architecture to differentiate between traditional pointers and fat pointers (both of which are supported on our system in hardware).  It would be nice, rather than hard-coding an address space, to have a property on DataLayout that would tell you if a specific address space contained GC pointers.  We could then designate an address space that was both GCable and contained fat pointers, and another for non-GCable fat pointers.
Making this configurable is definitely desirable.  I like your idea of including a list of address spaces in data layout which contain GC pointers and making that queryable.  That sounds quite reasonable.  Unless anyone makes another proposal, I'll run with this.

Does it need to be a list, or could it be a range? If the set of GC address spaces were contiguous over some range of integers, then you'd only need to store a min / max value, which might make the implementation simpler.
A range is also fine. David, does that meet your use case as well?

Philip


Re: Code for late safepoint placement available

Philip Reames-4
In reply to this post by Talin-3
On 06/10/2014 11:23 AM, Talin wrote:
On Wed, Jun 4, 2014 at 9:35 AM, Philip Reames <[hidden email]> wrote:

I've included a list of specific questions I know we'd like to get feedback on, but general comments or questions are also very welcome. 

Open Topics:
  • How should we approach removing the existing partial support for garbage collection (gcroot)?  Do we want to support both going forward?  Do we need to provide a forward migration path in bitcode?  Given that usage is generally through MCJIT, we would prefer to simply deprecate the existing gcroot support and target it for complete removal a couple of releases down the road.
Chris and I had a discussion about 3 years ago where we talked about keeping both, but it really depends on how difficult it is.
First, did I trim the response correctly?  It sounds like you're responding to the question of how to integrate late safepoint placement with gcroot, but your actual response was inlined after another unrelated question.

Honestly, other than the maintenance headache, the two code paths basically don't touch.  Keeping both is entirely viable, though I'd have to ask: why?  What benefit does gcroot have?  (You started addressing this already, just making the question explicit.)
Although the existing intrinsics have many different kinds of horribleness, the one advantage that they have is that roots don't have to be pointers - they can be structs containing pointers, such as tagged unions or Go-style interface values, which have fields that may contain either a pointer or some other data type depending on the value of some other field. I know we talked in email about ways to work around this limitation, but those workarounds have some complex edge cases which it would be nice to avoid - like for example passing a tagged union as a parameter.

That being said, I'm probably the only person who cares about this particular issue :) And while removing support for non-pointer roots will make my life harder in some ways, the new system will make it easier in many other ways.
For those who weren't part of the previous discussion, I summarized the subset of IR which our safepoint placement scheme handles in this blog post: http://www.philipreames.com/Blog/2014/06/12/ir-restrictions-for-late-safepoint-placement/

Short version as it applies to Talin's point: We do not currently support pointers to garbage collected objects in aggregate types.  This should be fairly straightforward to add.  Patches are welcome, and we should support this at some point.  We have no plans to support tagged unions as roots.  If you're willing to bake in knowledge of the tag functions, doing so wouldn't be particularly hard, but I don't know of any way to do so in general. 
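As an illustration of what "baking in knowledge of the tag functions" means, here is a hypothetical C++ sketch (the `Tag`, `TaggedValue`, `payloadIsGCPointer`, and `collectRoots` names are invented for this example). Every value has the same static type, so type information alone cannot say which payloads are roots; only a frontend-specific tag check can.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical language-level tagged union: the payload is a GC pointer
// only when Kind == Tag::Object; otherwise it holds a raw integer.
enum class Tag : std::uint8_t { Integer, Object };

struct TaggedValue {
  Tag Kind;
  std::uintptr_t Payload; // either an integer or an object address
};

// Hypothetical "tag function": frontend knowledge that cannot be derived
// from the IR type of the aggregate.
inline bool payloadIsGCPointer(const TaggedValue &V) {
  return V.Kind == Tag::Object;
}

// Enumerate root slots among live tagged values. All entries share one type;
// only the runtime tag decides which payloads the collector must see.
inline std::vector<std::uintptr_t *>
collectRoots(std::vector<TaggedValue> &Live) {
  std::vector<std::uintptr_t *> Roots;
  for (TaggedValue &V : Live)
    if (payloadIsGCPointer(V))
      Roots.push_back(&V.Payload);
  return Roots;
}
```

Note that an integer payload may happen to have the same bit pattern as an object address; treating it as a root would be wrong for a relocating collector, which is why a general, type-driven scheme is hard here.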

One easy-to-implement scheme for supporting aggregate types would be to explode the aggregate before the safepoint and reconstruct it afterwards.  A more efficient, but also more complicated, scheme would be to encode the interior reference to the aggregate type into the arguments of the statepoint intrinsic.  Both schemes suffer from the fact that they have to rely on type information (of the aggregate itself) being accurate.  This is a stronger assumption than we currently need for non-aggregates, and we would need to verify that the optimizer actually upholds it. 
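The "explode and reconstruct" scheme can be sketched outside of LLVM as follows. In IR terms, exploding corresponds to `extractvalue` of the GC field and reconstructing to `insertvalue` of the relocated value. This is a toy model under stated assumptions: the `safepoint` function and its forwarding table are hypothetical stand-ins for a moving collector, not the statepoint intrinsic, and the code relies on knowing statically that `Ref` is the only GC field (the type-information assumption mentioned above).

```cpp
#include <unordered_map>
#include <vector>

struct Object { int Value; };

// Aggregate holding one GC reference and one non-reference field.
struct Pair {
  Object *Ref; // GC-managed pointer (assumed known from the aggregate's type)
  long Extra;  // plain data, never relocated
};

// Toy stand-in for a moving collector (hypothetical): each root slot handed
// to the safepoint is rewritten through a forwarding table.
using ForwardingTable = std::unordered_map<Object *, Object *>;

inline void safepoint(const std::vector<Object **> &RootSlots,
                      const ForwardingTable &Forward) {
  for (Object **Slot : RootSlots) {
    auto It = Forward.find(*Slot);
    if (It != Forward.end())
      *Slot = It->second; // object moved: update the root
  }
}

// Explode: pull the reference out as a scalar before the safepoint.
// Reconstruct: rebuild the aggregate from the relocated scalar afterwards.
inline Pair passAggregateOverSafepoint(Pair P, const ForwardingTable &Forward) {
  Object *Ref = P.Ref;  // explode (extractvalue of the GC field)
  long Extra = P.Extra; // non-GC field is carried across unchanged
  safepoint({&Ref}, Forward);
  return Pair{Ref, Extra}; // reconstruct (insertvalue with relocated ref)
}
```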

Talin, hopefully my restatement above helps clarify.  I'll also note that tagged unions as parameters are fine if the values are extracted immediately on entry, before any safepoint happens.  The safepoint mechanism only considers values which are live at the safepoint itself (i.e. have potential uses reachable from it).  As a result, dead tagged unions are fine.  Admittedly, that's a bit of a hack, but you might find it a useful hack.  :)
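The "only live values matter" point can be illustrated with a textbook backward liveness scan over a toy straight-line instruction list. This is a hypothetical model, not the safepoint placement pass: `Inst` and `liveAcross` are invented names, and values consumed only by the safepoint call itself are deliberately not counted as live across it. The tagged-union parameter `u` is used only before the safepoint, so only the extracted pointer `p` needs relocating.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical straight-line "IR": each instruction defines at most one value
// and uses some others. One instruction is marked as the safepoint.
struct Inst {
  std::string Def; // empty if the instruction defines nothing
  std::vector<std::string> Uses;
  bool IsSafepoint = false;
};

// Values live across the instruction at index SP: used by some later
// instruction without being redefined first. Standard backward scan:
//   Live = (Live - Def) U Uses, applied from the end back to SP + 1.
inline std::set<std::string> liveAcross(const std::vector<Inst> &Body,
                                        std::size_t SP) {
  std::set<std::string> Live;
  for (std::size_t I = Body.size(); I-- > SP + 1;) {
    if (!Body[I].Def.empty())
      Live.erase(Body[I].Def);
    Live.insert(Body[I].Uses.begin(), Body[I].Uses.end());
  }
  return Live;
}
```

Usage, modelling Talin's case: `{{"p", {"u"}}, {"", {}, true}, {"r", {"p"}}}` extracts the pointer `p` from the tagged union `u` on entry, hits a safepoint, then uses `p`. The live set at the safepoint is `{"p"}`; `u` is dead there and never has to be treated as a root.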

Philip




Re: Code for late safepoint placement available

David Chisnall-5
In reply to this post by Philip Reames-4
On 13 Jun 2014, at 11:23, Philip Reames <[hidden email]> wrote:

>> Does it need to be a list, or could it be a range? If the set of GC address spaces was contiguous over some range of integers, then you'd only need to store a min / max value, which might make the implementation simpler.
> A range is also fine. David, does that meet your use case as well?

The way that we've implemented the is-a-fat-pointer check is in DataLayout, by querying an address space, and I was expecting the is-a-GC'd-AS check to be done in the same way.  I don't see what you'd gain by forcing them to be a contiguous range.  If a particular target wants to make them a range, then that's fine.  You could also make a public field on the DataLayout so that a cheap does-this-target-have-support-for-any-GC'd-address-spaces query is a single field access, avoiding the virtual call in the no-GC case.
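A minimal sketch of the "public field as a fast path" idea, assuming a hypothetical `TargetLayoutInfo` interface (this is not LLVM's actual DataLayout class, which is not written this way). A non-virtual flag answers the common no-GC case with a single load; the per-address-space virtual query only runs on targets that declare GC support, and such a target is free to use a non-contiguous set.

```cpp
// Hypothetical target-description interface; not LLVM's DataLayout.
class TargetLayoutInfo {
public:
  virtual ~TargetLayoutInfo() = default;

  // Cheap, non-virtual flag set once by the target: does it have any
  // GC'd address space at all?
  bool HasGCAddressSpaces = false;

  // Per-address-space query; each target may implement it however it likes.
  virtual bool isGCAddressSpace(unsigned AS) const = 0;
};

// Fast path first: the flag short-circuits the virtual call on no-GC targets.
inline bool isGCPointerAddressSpace(const TargetLayoutInfo &L, unsigned AS) {
  return L.HasGCAddressSpaces && L.isGCAddressSpace(AS);
}

// Example target whose GC'd address spaces are not contiguous: {1, 5}.
class ExampleGCTarget : public TargetLayoutInfo {
public:
  ExampleGCTarget() { HasGCAddressSpaces = true; }
  bool isGCAddressSpace(unsigned AS) const override {
    return AS == 1 || AS == 5;
  }
};

// Example target with no GC support; counts how often the slow path runs.
class NoGCTarget : public TargetLayoutInfo {
public:
  mutable unsigned VirtualCalls = 0;
  bool isGCAddressSpace(unsigned) const override {
    ++VirtualCalls;
    return false;
  }
};
```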

If you want to speed up the check, then a better idea would be to reserve a few bits of the AS ID for properties.  We support, I think, 24-bit address space IDs, so losing one bit to identify GC'd address spaces, maybe losing another to identify fat pointers, and another to identify ROM (or other immutable storage) would still give a few orders of magnitude more than anyone is currently using.  
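The bit-reservation idea can be sketched as follows. This assumes 24-bit address space IDs, as David recalls, and uses a hypothetical encoding in which the top three bits mark GC'd, fat-pointer, and immutable address spaces. The constants and function names are invented for illustration. Each property check becomes a single mask test, and 21 bits (about two million IDs) remain for ordinary address space numbers.

```cpp
#include <cstdint>

// Assumption: address space IDs are 24 bits wide.
constexpr std::uint32_t kAddrSpaceBits = 24;
constexpr std::uint32_t kAddrSpaceMask = (1u << kAddrSpaceBits) - 1;

// Hypothetical encoding: the top three bits of the 24-bit ID are properties.
constexpr std::uint32_t kGCBit = 1u << 23;        // GC-managed memory
constexpr std::uint32_t kFatPtrBit = 1u << 22;    // fat pointers
constexpr std::uint32_t kImmutableBit = 1u << 21; // ROM / immutable storage
constexpr std::uint32_t kIndexMask = kImmutableBit - 1; // remaining 21 bits

static_assert(((kGCBit | kFatPtrBit | kImmutableBit) & ~kAddrSpaceMask) == 0,
              "property bits must fit inside the address space ID");

// Each property query is a single AND plus compare.
constexpr bool isGCAddressSpace(std::uint32_t AS) { return (AS & kGCBit) != 0; }
constexpr bool isFatPointerAddressSpace(std::uint32_t AS) {
  return (AS & kFatPtrBit) != 0;
}
constexpr bool isImmutableAddressSpace(std::uint32_t AS) {
  return (AS & kImmutableBit) != 0;
}

// Build an address space ID from a small index plus property flags.
constexpr std::uint32_t makeAddressSpace(std::uint32_t Index, bool GC,
                                         bool Fat, bool Immutable) {
  return (Index & kIndexMask) | (GC ? kGCBit : 0u) |
         (Fat ? kFatPtrBit : 0u) | (Immutable ? kImmutableBit : 0u);
}
```

The trade-off versus a DataLayout query is that the properties become part of the address space number itself, so two address spaces differing only in a property bit are distinct IDs.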

David

