RFC: GSoC Project


RFC: GSoC Project

Sanjoy Das
Hi All!

I will be applying to the LLVM project for this GSoC, and I wanted a
preliminary sanity check on my project idea.

I intend to implement split (segmented) stacks for LLVM (like those in
Go, and as is being implemented for GCC [1]). A lot of what follows is
lifted from [1]; I will progressively add more details as I get more
familiar with the LLVM codebase.

I intend to start with the simplest possible approach - representing the
stack as a doubly linked list of _block_s, the size of each _block_
being a power of two. This can later be modified to improve performance
and accommodate other factors. Blocks will be chained together into a
doubly linked list structure (using the first two words in the block as
the next and previous pointers).

In the prologue, a function will check whether the current block has
enough stack space. This is easily done for functions which don't have
variable sized allocas, and for ones which do, we can assume some
worst-case upper bound. The prologue can then call an intrinsic (let's
call it llvm.adjust_stack) which allocates a new block (possibly by
delegating this to a user-provided callback), copies the arguments,
saves the previous stack pointer (in the new block), and adjusts the
next and previous pointers. It will also have to adjust the stack
pointer, and the frame pointer, if it is being maintained. Cleanup can
be done by hijacking the return value, as also mentioned in [1]. It
might make sense to leave the allocated blocks around, to prevent
re-allocating the next time the program needs more stack space.

DWARF info can be generated as follows: since we know the offset of the
base of the stack frame from the stack pointer (or we are maintaining a
frame pointer), we can always say whether the concerned call frame is
the first call frame in its block or not. If it isn't, all the previous
register values can be computed as usual; if it is, we will add an
extra indirection, involving looking up the stack pointer saved in this
block's header.

One thing I'd really like some input on is whether implementing split
stacks would be useful enough to warrant the effort (especially keeping
in mind that this is pretty useless on 64 bit architectures).

[1] http://gcc.gnu.org/wiki/SplitStacks
--
Sanjoy Das
http://playingwithpointers.com

Re: RFC: GSoC Project

Joerg Sonnenberger
On Wed, Mar 23, 2011 at 03:37:02PM +0530, Sanjoy Das wrote:
> I intend to start with the simplest possible approach - representing the
> stack as a doubly linked list of _block_s, the size of each _block_
> being a power of two. This can later be modified to improve performance
> and accommodate other factors. Blocks will be chained together into a
> doubly linked list structure (using the first two words in the block as
> the next and previous pointers).

Where do you plan to store the pointers? What changes to the runtime
environment does this require?

> In the prologue, a function will check whether the current block has
> enough stack space. This is easily done for function which don't have
> variable sized allocas, and for ones which do, we can assume some
> worst-case upper bound.

If there are allocas involved, it is quite likely they are inside loops
etc., in which case there are no simple stack boundaries. This shouldn't
be a problem if the space for all static variables was allocated at the
beginning.

> The prologue can then call an intrinsic (let's
> call it llvm.adjust_stack) which allocates a new block (possibly by
> delegating this to a user-provided callback), copies the arguments,
> saves the previous stack pointer (in the new block), and adjusts the
> next and previous pointers.

Why do you need to copy the arguments? In fact, why do you think you can
actually copy the arguments? Consider printf for this.

> It will also have to adjust the stack pointer, and the frame pointer,
> if it is being maintained. Cleanup can be done by hijacking the return
> value, as also mentioned in [1]. It might make sense to leave the
> allocated blocks around, to prevent re-allocating the next time the
> program needs more stack space.

Hijacking the return value is a nice trick. I'm not sure about freeing /
not freeing the unused block; this again has implications for the
runtime environment.

Have you considered how much work call graph optimisations require?
Especially with LTO it should be possible to kill many of the checks by
computing used stack space across segments of the call graph. One of the
papers on service scalability discussed this for light weight threads.
Forgot which one, it's been a while.

> One thing I'd really like some input on is whether implementing split
> stacks would be useful enough to warrant the effort (especially keeping
> in mind that this is pretty useless on 64 bit architectures).

I don't think it is useless on 64-bit architectures. You can't always
make arbitrarily large reservations of address space (e.g. the optimised
output of Chicken), and such reservations don't come without a price.
Also consider something like a kernel environment, where you really want
to have a minimal default stack, but be able to fall back gracefully.

Joerg

Re: RFC: GSoC Project

Sanjoy Das
Hi!

Thanks for the review. What follows is a revised version of the
proposal. I'll post this on Melange after getting some more feedback.

Segmented stacks for LLVM, GSoC '11
-----------------------------------

* Overview: Goal & Rationale
  Support segmented (split) stacks in LLVM.

  The idea is to allocate stack space to a thread incrementally, instead
of allocating a (large) worst-case block of memory. This is especially
important on 32-bit systems (where the virtual address space is
limited). On 64-bit systems (as Joerg pointed out), it will be useful in
situations where the code uses a continuation-passing style (like
Chicken Scheme) or in critical software like kernels, where it may not
actually be possible to assign more than a certain amount of memory to
the stack. In a situation where stacks can be smaller than a single
page, everyone wins.

  This summer, I plan to implement this for the x86 (including x86-64)
platform, since that is the platform I'm most familiar with.

* Overall Design
  The overall design is described below. More concrete details have
been put under the 'Implementation' section.

** The idea
   The central idea is to represent the stack as a doubly-linked list of
memory blocks instead of a (large) contiguous chunk of memory. Each
stack-frame will be made to fit entirely in one block, but one block may
(and, in general, will) contain more than one stack frame.

*** Block lifetime
    Blocks once allocated are not de-allocated - since they are still a
part of the doubly-linked list (and hence, not /lost/), there will be no
memory leak (and the blocks will be re-used the next time more memory is
required).

** Issues with unwinding
   Dividing up the stack into a list of smaller blocks violates one
important constraint: stack addresses are no longer contiguous across
function invocations. This may affect any runtime magic that involves
unwinding the stack, like exception handling and scanning the stack for
GC roots.
   However, assuming the unwinding routine uses the .eh_frame section,
unwinding should work fine, since we'll also emit a correct .eh_frame.
Edge cases (if they arise) will have to be eliminated on a case-by-case
basis.

* Implementation
  This section describes the proposed implementation in more detail.

** The block
   The block is a chunk of memory whose size is a power of two (in
bytes). It has four words reserved at the beginning of the block (a
rough C sketch follows the list):

 1. previous - Pointing to the previous block.
 2. next - Pointing to the next block.
 3. saved_ip - The original return address back to the caller.
 4. saved_sp - The value of the SP when this block was created.
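
   For concreteness, here is a rough C view of this header. The struct
and field names are only illustrative; the actual layout is whatever the
backend ends up emitting:

    /* Illustrative rendering of the block header; not a fixed ABI. */
    struct stack_block {
        struct stack_block *previous;  /* 1. previous block in the chain    */
        struct stack_block *next;      /* 2. next block in the chain        */
        void               *saved_ip;  /* 3. original return address        */
        void               *saved_sp;  /* 4. SP when this block was created */
    };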

** Changes to the function prologue
   In its prologue, a function will check whether the current block has
enough stack space. This is easily done for functions which use a fixed
amount of stack space; for the ones which don't, we can assume some
worst-case upper bound. Since the block size is a constant power of two,
this can be done using a bitwise AND and an integer comparison. If there
isn't enough space in the current block, control jmps (does not call) to
a naked function (called setup_new_block).
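
   As a sketch of that check in C (rather than the actual emitted
assembly), assuming blocks are BLOCK_SIZE-aligned, the header sits at
the block base, and the stack grows downwards within a block - all of
which are assumptions of this sketch, not decisions:

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE   (1u << 16)            /* assumed; power of two    */
    #define HEADER_BYTES (4 * sizeof(void *))  /* prev/next/saved_ip/sp    */

    /* Room left below SP in its block: the AND extracts SP's offset from
     * the (aligned) block base.  The frame plus the space setup_new_block
     * itself needs must fit without touching the header. */
    static int have_room(uintptr_t sp, size_t frame_size, size_t setup_cost)
    {
        uintptr_t offset = sp & (BLOCK_SIZE - 1);
        return offset >= frame_size + setup_cost + HEADER_BYTES;
    }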

*** setup_new_block:
setup_new_block will be a small naked function emitted into all programs
being compiled with segmented stacks. It does the following (a rough C
sketch of steps 1 and 2 follows the list):

 1. Checks if the current block's next pointer is populated or not.
 2. In case it isn't (i.e. is NULL), it allocates a new block, sets the
next pointer to point to the new block. It then sets the 'previous'
pointer of the new block to point to the old block. Otherwise, it
directly goes to step 3.
 3. Everything addressed relative to the stack pointer (in the function
body) is copied onto the new block. This is so that the SP relative
loads and stores work correctly.
    For x86 systems, this includes the function arguments passed on the
stack, a return address (not the original; see below), and the previous
stack pointer.
    I don't think we will need to copy the actual argument values in
case of a vararg function (we can't, in fact), since we'll be accessing
the arguments using the fields in the va_list structure (and they point
back to the previous block, where the arguments actually are).
    The return address placed on the new block is not the original
return address placed on the old stack by the caller, but the address of
another naked function (called cleanup_new_block) described below.
 4. The stack pointer is set to (the address of the new block + 3 * word
size).
 5. Control is returned to the prologue, and execution continues
normally. We cannot use a usual ret here, so we'll have to use one of
the caller-saved registers to communicate the address of the next
instruction to setup_new_block.
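
   A very rough C sketch of steps 1 and 2 (the copying in step 3 and the
register handoff in steps 4 and 5 are target-specific and left out;
aligned_alloc stands in for whatever allocator or user-provided callback
is actually used, and error handling is elided):

    #include <stdlib.h>
    #include <string.h>

    #define BLOCK_SIZE (1u << 16)              /* assumed; power of two */

    struct stack_block {                       /* header as sketched above */
        struct stack_block *previous, *next;
        void *saved_ip, *saved_sp;
    };

    /* Steps 1-2: reuse the next block if one is already chained in,
     * otherwise allocate an aligned block and link it into the list. */
    static struct stack_block *get_next_block(struct stack_block *cur)
    {
        if (cur->next == NULL) {
            struct stack_block *b = aligned_alloc(BLOCK_SIZE, BLOCK_SIZE);
            memset(b, 0, sizeof *b);
            b->previous = cur;
            cur->next   = b;
        }
        return cur->next;
    }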

**** Frame pointers
     In case the frame pointer is being maintained for the function, the
value at %rbp will also have to be copied to the new block, and %rbp
will have to be changed to point to this new location (so that the FP
relative indexing works). %rbp will then be set to its "correct" value
by the epilogue itself. We'll probably need a new routine
(setup_new_block_fp) for this situation.

**** Stack space for setup_new_block
     setup_new_block and setup_new_block_fp will be written to use the
minimum amount of stack space possible, and that stack space will have
to be factored into the check in the function prologue (so every
function prologue checks for the space it needs _plus_ the space
required to run setup_new_block once, by a callee).

*** cleanup_new_block
    cleanup_new_block will be a simple operation, which restores the old
SP (saved_sp from the block header) and jumps to saved_ip (also from the block
header). Later, if we decide to free unused blocks then that cleanup can
be done here.

*** Eliminating allocations and checks
    It is evident that setup_new_block is too expensive to run in every
prologue. In fact, it is probably worth the extra effort to eliminate
the check in as many situations as possible. I was pointed to a paper
earlier - `Whole-Program Optimization for Time and Space Efficient
Threads', which talks about link-time determination of upper bounds on
stack space required by threads. It does this by creating a call-graph
and using the same techniques used for analyzing a program's control
flow graph (dominators, back edges, et al.).
    So, if the function A is dominated by the function B, and A needs 32
bytes of stack space, and B needs 64, we can allocate 96 bytes in B's
prologue, and the check can be eliminated from A altogether. Even
otherwise, if leaf functions F, G and H are reachable only via I and J,
and F, G, H, I, J take f, g, h, i, j bytes of stack space respectively,
we should be able to check for (MAX (f, g, h) + i) bytes in I and (MAX
(f, g, h) + j) bytes in J; and eliminate the checks in F, G and H
altogether. In my opinion, this should lead to substantial savings,
especially keeping in mind the typical pattern of a module having a few
publicly visible functions which delegate the core functionality to a
battery of smaller functions with internal linkage and private visibility.
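
    As a toy illustration of the bound being computed here (assuming an
acyclic call graph with no indirect calls - recursion or unknown callees
would force the check to stay):

    #include <stddef.h>

    struct func {
        size_t        frame_size;  /* stack this function itself needs       */
        struct func **callees;     /* NULL-terminated list of direct callees */
    };

    /* Worst-case stack needed by f and everything it can reach.  A caller
     * that checks for this many bytes up front lets the checks in its
     * callees be dropped - e.g. I checks for MAX(f, g, h) + i bytes and
     * F, G and H check nothing. */
    static size_t max_stack(const struct func *f)
    {
        size_t deepest = 0;
        for (struct func **c = f->callees; c != NULL && *c != NULL; ++c) {
            size_t need = max_stack(*c);
            if (need > deepest)
                deepest = need;
        }
        return f->frame_size + deepest;
    }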

*** The red zone
    The red zone can be used only if there actually _are_ 128 bytes of
stack space left in the current block. That will need to be factored
into the check (in X86FrameLowering::emitPrologue).

*** DWARF .eh_frame
    (This is largely unchanged from the previous version I sent on the
list. I've repeated it here for completeness).
    DWARF info can be generated as follows: since we know the offset of
the base of the stack frame from the stack pointer (or we are
maintaining a frame pointer), we can always say whether the concerned
frame is the first frame in its block. If it isn't, all the previous
register values can be computed as usual; if it is, we can add an extra
indirection, involving looking up the stack pointer saved in this
block's header.

*** Interoperability
    Clearly, we can't safely make calls from code using segmented stacks
to code expecting a vanilla contiguous stack. This can be fixed by the
linker, by padding calls from segmented-stack code into
non-segmented-stack code with the allocation and de-allocation of a 64 KB
block. For the linker to be able to do this, the compiler needs to emit a
dummy section into object files compiled with segmented stacks, which the
linker will then use as a flag. This is exactly the approach mentioned in
[1]. However, this approach is suboptimal, and to get the full benefits
of segmented stacks, all code needs to be compiled with segmented stacks.

[1] http://gcc.gnu.org/wiki/SplitStacks


--
Sanjoy Das
http://playingwithpointers.com

Re: RFC: GSoC Project

Sanjoy Das
Just a quick clarification: is anyone interested in mentoring me for
this project?

--
Sanjoy Das
http://playingwithpointers.com

Re: RFC: GSoC Project

Duncan Sands
Hi Sanjoy,

> Just a quick clarification: is anyone interested in mentoring me for
> this project?

it seems like a great project (but I am not competent to mentor it).  I guess
it is needed if dragonegg is ever going to support the Go language.  It would
also be pleasant for Ada if task stacks grew automagically when needed (having
to set an adequate but not too ridiculous stack size when creating tasks is a
pain).

Ciao,

Duncan.

Re: RFC: GSoC Project

Talin-3
In reply to this post by Sanjoy Das
I wonder - would something like this allow for multiple stacks for a single thread? I'm thinking of something like continuations / fibers / green threads, which would be very handy.




--
-- Talin


Re: RFC: GSoC Project

Chris Lattner-2

On Apr 10, 2011, at 2:45 PM, Talin wrote:

I wonder - would something like this allow for multiple stacks for a single thread? I'm thinking of something like continuations / fibers / green threads, which would be very handy.

I haven't looked at the proposal, but yes, this would be very useful functionality for LLVM to provide.

-Chris





Re: RFC: GSoC Project

Talin-3
On Sun, Apr 10, 2011 at 4:16 PM, Chris Lattner <[hidden email]> wrote:

On Apr 10, 2011, at 2:45 PM, Talin wrote:

I wonder - would something like this allow for multiple stacks for a single thread? I'm thinking of something like continuations / fibers / green threads, which would be very handy.

I haven't looked at the proposal, but yes, this would be very useful functionality for LLVM to provide.

Another thing I'd like is for any proposed segmented stack mechanism to be garbage-collector friendly - which essentially means an API that can reliably iterate through all of the stack frames, starting from the current function, and calculate the correct return address for each stack frame. I'm envisioning something like this:

    frame = llvm.currentframe();
    while frame != NULL {
      retaddr = llvm.returnaddr(frame);
      // do stack tracing using frame and retaddr to identify stack roots.
      frame = llvm.nextframe(frame);
    }

This is essentially what I do now with the EBP register - but it would be better if it were encapsulated behind an API, so that frontends wouldn't have to know about specific registers.







--
-- Talin


Re: RFC: GSoC Project

Joachim Durchholz
In reply to this post by Talin-3
(Sorry for replying to the wrong post, but the original one had gone out
of scope when this occurred to me.)

> On Wed, Mar 23, 2011 at 3:07 AM, Sanjoy Das
> <[hidden email]>wrote:
>
>> In the prologue, a function will check whether the current block has
>> enough stack space. This is easily done for function which don't have
>> variable sized allocas, and for ones which do, we can assume some
>> worst-case upper bound.

What would you do if a function allocas in a loop? The programmer would
know that the loop won't execute, say, more than twice, but your code
doesn't unless you're lucky.
Or if the size to be allocated is passed in as a parameter, from code
that the compiler does not even see in this run (think dynamically
linked code, or alloca sizes that depend on user input).

So you'll need a fallback strategy for such situations.

Either fall back to the standard stack (but that would more-or-less
defeat the purpose of this all).

Or make the code generators deal with stacks that have segmented
per-function stack frames (whether that's even feasible, I have no idea,
but I guess it wouldn't be without a gruesome amount of work, and this
project would stay restricted to those backends that happen to get
maintained for this).

Or deal with such cases by redirecting all allocas to the heap, with the
possible option of moving them back to the stack where the compiler can
indeed infer the stack requirement beforehand.

- - -

In general, I think it's a good idea to avoid too large stack frames
anyway. Operating systems and similar infrastructure tend to assume that
the per-function stack requirement is limited to a small constant, and
breaking that assumption can drive them into less well-tested code,
triggering security holes or inefficiencies.

Just my 2c, and back to lurker mode.

Regards,
Jo

Re: RFC: GSoC Project

Joerg Sonnenberger
On Mon, Apr 11, 2011 at 07:57:08AM +0200, Joachim Durchholz wrote:

> (Sorry for replying to the wrong post, but the original one had gone out
> of scope when this occurred to me.)
>
> > On Wed, Mar 23, 2011 at 3:07 AM, Sanjoy Das
> > <[hidden email]>wrote:
> >
> >> In the prologue, a function will check whether the current block has
> >> enough stack space. This is easily done for function which don't have
> >> variable sized allocas, and for ones which do, we can assume some
> >> worst-case upper bound.
>
> What would you do if a function allocas in a loop? The programmer would
> know that the loop won't execute, say, more than twice, but your code
> doesn't unless you're lucky.

It doesn't have to. It is perfectly fine to allocate another stack
segment after the handling of fixed slots is done. All other entries on
the stack are addressed by pointer anyway. They don't care about being
in one stack segment relative to the frame pointer.

Joerg

Re: RFC: GSoC Project

Sanjoy Das
In reply to this post by Talin-3
Hi!

Thanks for the feedback. For context, my implementation plan is here:
http://pastebin.com/e9JMZNCE

First, about unwinding:

On architectures like x86-64, where unwinding is based on DWARF info,
there shouldn't be any problems, since the DWARF info will be emitted
correctly. Otherwise, if the unwinding is done by following BP, it
should still be possible to have BP dereference correctly (ref. the
"Frame Pointers" section in the implementation plan). SP will not always
have a correct value - I don't know if this is a problem.

About co-routines:

Here is a sketch of how I think co-routines can be implemented (I'll
merge this with the main implementation plan after I get some feedback):

Have a new instruction, called "yield", to return a value from a
co-routine while preserving its state. Thus, we immediately know which
functions are co-routines. Each co-routine will have a new stack.
Associate each co-routine with a thread-local global variable (called
saved_stack here; the name will have to be mangled with the name of the
co-routine) which points to the start stack block for that co-routine.
This will be the first block in the chain of blocks to follow.

The structure of the block will be similar to the structure of a regular
stack block, except that it will also have space to store two registers
- this_ip and this_sp.

The prologue of a co-routine will jump to a function similar to
setup_new_block (setup_new_block_coroutine) which will work like
setup_new_block, except:

1. It will first check if saved_stack is NULL. If it is NULL, it will
allocate a new block and save it to saved_stack. If it isn't, it'll
simply restore saved_sp, saved_ip.

2. In case a new block was allocated, it will pretty much do what
setup_new_block does, after which it will adjust the SP to make space for
the saved registers.

The destroy_block procedure will also have to be a little different
(mentioned below).

There are four things (relevant to this discussion) a co-routine can do:

Yield

This returns control to the calling function, without forgetting the
current state of the function. To do this, we save saved_ip and
saved_sp. Every yield will be padded with instructions pushing and popping
the registers live at that point. Then we set the return value (register
or memory), and restore saved_sp and saved_ip from the current block. We
can't simply return because the actual return value has been hijacked to
provide for block cleanup.

Call to regular function

Just a simple call - the callee's prologue will handle setting up its
own stack space etc.

Call to Co-routine

This too should "just work", since all the heavy-lifting is done in the
co-routine's prologue. However, the above approach will not work for
nested co-routines (i.e. calling the same co-routine body while one call
is still active, recursively). I'm not sure if having support for nested
co-routines will add any value.

Return

This will be a regular return. Since the return value has been hijacked
to point to another block of code (destroy_block_coroutine), control
will jump there instead.

destroy_block_coroutine will free the linked-list of stack blocks (we
have to free this, since we won't have a reference to this list
anymore), set saved_stack for this co-routine to NULL, and restore
saved_sp and saved_ip.
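
A rough C sketch of the co-routine block header and of the list walk
destroy_block_coroutine would have to do (all names are placeholders,
and restoring saved_sp/saved_ip itself is target-specific, so it isn't
shown):

    #include <stdlib.h>

    struct coroutine_block {
        struct coroutine_block *previous, *next;  /* as in a regular block */
        void *saved_ip, *saved_sp;                /* caller's IP and SP    */
        void *this_ip, *this_sp;                  /* co-routine state at
                                                     the last yield        */
    };

    /* Free the whole chain and clear the co-routine's thread-local
     * saved_stack, so the next invocation starts from scratch. */
    static void destroy_coroutine_stack(struct coroutine_block **saved_stack)
    {
        struct coroutine_block *b = *saved_stack;
        while (b != NULL) {
            struct coroutine_block *next = b->next;
            free(b);
            b = next;
        }
        *saved_stack = NULL;
    }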

--
Sanjoy Das
http://playingwithpointers.com

Re: RFC: GSoC Project

Justin Holewinski-2

I'm wondering how much of this should be implemented as new LLVM functionality, and how much should be left to the front-end compiler.  With some additional LLVM intrinsics (e.g. llvm.stack.new.block, llvm.stack.delete.block, etc.), a front-end could take care of the details of how co-routines are actually implemented.  This would also give the front-end freedom to implement whatever semantics/conventions are necessary/required for the source language.  I'm just not sure having LLVM dictate how co-routines are implemented is the best way to approach this.
 




--

Thanks,

Justin Holewinski



Re: RFC: GSoC Project

Talin-3

I'm in agreement with Justin.

1) It's much easier to add new intrinsics to LLVM than it is to add new instructions. Library functions are even easier - in general you want to use intrinsics in cases where the presence of the intrinsic will affect the way the code is generated in the calling function.

2) The API that I was envisioning for creating stacks was something fairly simple (maybe too simple):

   // Create a new stack, where 'stack_handle' is some opaque object, and 'func'
   // is an LLVM function.
   stack_handle = llvm.createstack(func, data)

   // Switch to a different stack (i.e. yield)
   llvm.switchstack(stack_handle)

   // deallocate a stack
   llvm.destroystack(stack_handle)

The frontend would be responsible for managing the lifetime of the stack_handle object. For something like Python generators, the stack_handle would be stored in the iterator object that is returned from the generator, and deallocated when the generator gets garbage-collected. For cooperative threads, the handle would be stored as a private member of a Thread or Coroutine class. Different languages would support different ways of using coroutines.

Similarly, the frontend would be responsible for passing data between the different coroutines, using either thread-local variables or some other method.

Also I'm thinking the frontend would be responsible for propagating exceptions between stacks - so for example, if my coroutine throws an exception that is not caught, but instead propagates all the way to the top of its stack, then that coroutine gets terminated, and depending on the language semantics, the same exception or another one gets thrown upon return from 'yield' in the other context. I would have no objection to implementing all of that logic in my frontend.

Because the internals of the stack_handle may be different on different platforms, I'm thinking that the frontend should only get a void* pointer - it doesn't need to know what's inside it. I think the frontend only needs a small number of operations: create, yield, destroy, and iterate through call frames (for garbage collection).

The 'yield' instruction seems somewhat inspired by Python's 'yield' statement, which unfortunately has some substantial drawbacks - such as the fact that the yield must be in the top-most call frame in order for the compiler to know that the function is a generator / coroutine. In other words, you can't in Python have a generator which calls a subroutine that does the yield. I'd like to avoid that limitation. Scheme and Lua, to name just two examples, have much more powerful models for doing continuations and coroutines, and I'd like to be able to support those. I think that can be done if we supply only the most minimal support on the LLVM side, and leave most of the details to the frontend.

 





--
-- Talin


Re: RFC: GSoC Project

Reid Kleckner-3
This reminds me of Kenneth's "context" proposal from some time back:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-April/030787.html

I haven't compared the two too closely, but I thought I should throw
it out there as food for thought.

Reid



Re: RFC: GSoC Project

Talin-3

That's very interesting, I had not seen that before. It would pretty much solve all of my needs, although it only supports fixed-sized stacks. If this were combined with what Sanjoy is proposing that would be terrific.

There's one point there that I don't quite understand - the doc says that the context buffer that you allocate would be large enough to store a snapshot of all the machine's registers. I would think that the machine registers would be more conveniently stored on the stack itself, and the context buffer would merely contain a pointer to the end of the stack - that means that a call to setcontext() would simply push all the current registers, change the stack pointer, and then pop each register value off of the stack before returning. However, that's a minor implementation detail and not something I really care about one way or the other.

There's one other feature that I can think of, not strictly necessary, but might help make context switching more efficient. Let's assume that I have a yield() function in my language which wraps around llvm.setcontext() and performs some additional language-specific housekeeping. So for example:

    void yield() {
       llvm.swap_context(next_context, etc);
       // If the other context returned with an exception,
       // propagate the exception to the current context.
       if (pending_throwable) {
         throw pending_throwable;
       }
    }

So the problem here is that I have to perform a check for a throwable, when 99.9% of the time it will be NULL. One could get around this by playing games with the return address:

    void yield() {
       llvm.swap_context(next_context, etc);
    normal_exit:
       return;
    error_exit:
       throw pending_throwable;
    }

The idea is that when you want to pass an exception to the other thread, you would modify the return address on the stack to point to error_exit instead of normal_exit. Unfortunately to get this to work you'd also have to convince LLVM's optimizer not to complain about unreachable code, so maybe this is more trouble than it's worth.
 




--
-- Talin
