
[Proposal] Speculative execution of function calls


[Proposal] Speculative execution of function calls

Kuperstein, Michael M

Hello,

 

Chris requested I start a fresh discussion on this, so, here goes. The previous iterations can be found here (and in follow-ups):

 

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130722/182590.html

http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064047.html

 

Cutting to the chase, the goal is to enhance llvm::isSafeToSpeculativelyExecute() to support call instructions.

isSafeToSpeculativelyExecute() is a query that, basically, determines whether it is safe to move an instruction that is executed conditionally into an unconditional context. One common use-case is hoisting loop-invariant instructions out of loops, during loop-invariant code motion.

For example:

 

int foo(int a, int b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
  {
    int temp = a + b;
    sum += temp;
  }
  return sum;
}

 

Can be transformed into:

 

int foo(int a, int b, int n)
{
  int sum = 0;
  int temp = a + b;  /* hoisted out of the loop */
  for (int i = 0; i < n; ++i)
  {
    sum += temp;
  }
  return sum;
}

 

Because hoisting the addition is safe.

However, code that looks like this is more problematic:

 

int bar(int a);

 

int foo(int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
  {
    int temp = bar(n);
    sum += temp;
  }
  return sum;
}

 

May not, in general, be transformed into:

int foo_bad(int n)
{
  int sum = 0;
  int temp = bar(n);  /* call speculatively hoisted out of the loop */
  for (int i = 0; i < n; ++i)
  {
    sum += temp;
  }
  return sum;
}

 

The first issue is that bar() may have side effects, in which case this transformation is clearly unsafe.

Unfortunately, even if bar() is marked “readnone, nounwind”, this is still not a safe transformation. The problem is that the loop is not guaranteed to have even a single iteration, and even readnone functions may not always be safe to call.

So, if bar() is defined like this:

 

int bar(int a)
{
  while (a == 0) {}  /* loops forever when a == 0 */
  return a;
}

 

Then foo(0) is safe (with n == 0 the loop body, and therefore the call to bar(), never executes), but foo_bad(0) calls bar(0) and hangs in an infinite loop.

 

Similarly, if bar() is defined as:

int bar(int a)
{
  return 1000 / a;  /* undefined behaviour when a == 0 */
}

 

Then foo(0) is safe, but foo_bad(0) has undefined behavior.

 

Unfortunately, right now, there is no way to specify that a function call IS safe to execute under any circumstances. Because of this, llvm::isSafeToSpeculativelyExecute() simply returns false for all call instructions, except calls to a few intrinsics that are special-cased and hard-coded into the function.

What I would like to see instead is a function attribute – or a set of function attributes – that would allow isSafeToSpeculativelyExecute() to conclude that it may return “true” for a given function call.

This has two main uses:

1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute – hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by “readnone” (e.g. rounding mode) will also not have any of the other issues.

2) DSL frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe (see the sketch after this list).

(The optimizer marking user functions as safe seems, to me, like a less likely case.)
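To illustrate use (2), here is a purely hypothetical sketch of frontend code attaching the proposed attribute to a declaration it knows is safe. The function name "native_exp_f", the attribute name, and the string-attribute spelling are all assumptions for illustration, not something the patch defines:

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

// Hypothetical frontend helper: declare a library builtin the frontend
// knows is always safe to execute, and mark it accordingly.
static llvm::Function *declareSafeBuiltin(llvm::Module &M) {
  llvm::LLVMContext &Ctx = M.getContext();
  llvm::Type *FloatTy = llvm::Type::getFloatTy(Ctx);
  llvm::FunctionType *FTy =
      llvm::FunctionType::get(FloatTy, {FloatTy}, /*isVarArg=*/false);
  llvm::Function *F = llvm::Function::Create(
      FTy, llvm::Function::ExternalLinkage, "native_exp_f", &M);
  F->setDoesNotThrow();          // nounwind
  F->setDoesNotAccessMemory();   // readnone
  F->addFnAttr("speculatable");  // the proposed "safe to speculate" marker
  return F;
}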

 

I see two ways to approach this:

a) Define a new attribute that says, specifically, that a function is safe to execute speculatively (“speculatable”).

b) Try to define a set of orthogonal attributes that, when all of them are specified, ensure speculative execution is safe, and then add the missing ones.

 

Option (b) sounds better in theory, but I find it problematic for two reasons: it’s not clear what the precise requirements for safety are (right now it’s “I know it when I see it”, and I’m not sure I want to set that in stone), and it’s not clear what the granularity of these orthogonal attributes should be. For example, {readnone, nounwind, halting, welldefined} sounds like a good start, but I’m not sure whether “welldefined” is not too much of a catch-all, or whether this set is, in fact, exhaustive. So I’m more inclined towards (a).
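For concreteness, the consumer side of option (a) might look roughly like the sketch below. This is only an illustration of the idea, not necessarily how the patch implements it; the attribute name and the string-attribute query are assumptions:

#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

// Hypothetical: the extra check isSafeToSpeculativelyExecute() could perform
// when it sees a call instruction.
static bool isCallSafeToSpeculate(const llvm::CallInst *CI) {
  // Indirect calls: the callee is unknown, so stay conservative.
  const llvm::Function *Callee = CI->getCalledFunction();
  if (!Callee)
    return false;
  // The callee promises: no side effects, no unwinding, guaranteed
  // termination, and no undefined behaviour for any inputs.
  return Callee->hasFnAttribute("speculatable");
}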

 

I’m attaching a patch that implements option (a) (the same patch from llvm-commits), but feel free to tell me it’s rubbish. :-)

 

Thanks,

   Michael




Attachment: speculatable.diff (30K)

Re: [Proposal] Speculative execution of function calls

David Chisnall
On 31 Jul 2013, at 10:50, "Kuperstein, Michael M" <[hidden email]> wrote:

> This has two main uses:
> 1) Intrinsics, including target-dependent intrinsics, can be marked with this attribute – hopefully a lot of intrinsics that do not have explicit side effects and do not rely on global state that is not currently modeled by “readnone” (e.g. rounding mode) will also not have any of the other issues.
> 2) DSL frontends (e.g. OpenCL, my specific domain) will be able to mark library functions they know are safe.

The slightly orthogonal question to safety is the cost of execution.  For most intrinsics that represent CPU instructions, executing them speculatively is cheaper than a conditional jump, but this is not the case for all (for example, some forms of divide instructions on in-order RISC processors).  For other functions, it's even worse because the cost may be dependent on the input.  Consider as a trivial example the well-loved recursive Fibonacci function.  This is always safe to call speculatively, because it only touches local variables.  It is, however, probably never a good idea to do so.  It's also likely that the cost of a real function call is far more expensive than the elided jump, although this may not be the case on GPUs where divergent flow control is more expensive than redundant execution.  Making this decision requires knowledge of both the target architecture and the complexity of the function, which may be dependent on its inputs.  Even in your examples, some of the functions are only safe to speculatively execute for some subset of their inputs, and you haven't proposed a way of determining this.

I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations.  This means that optimisations have to be aware that some calls are not really calls and so don't cause any flow control effects.  I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent instructions.  This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions.

A final issue that may be relevant is parallel safety.  On architectures that have very cheap userspace coroutine creation, it may be interesting to speculatively execute some functions in parallel.  On others, I can imagine transforming certain longer-running calls into libdispatch invocations followed by joins.  This, however, requires that you can detect that the call is safe to execute speculatively, doesn't have read dependencies on any shared state that might be modified, and is sufficiently expensive for the overhead of parallel execution to be worth it.  This is probably a lot beyond the scope of the current discussion.

David



Re: [Proposal] Speculative execution of function calls

David Tweed
Hi,

> I suspect that much of the problem here comes from modelling intrinsics as calls in the IR, when most of them are closer to arithmetic operations. This means that optimisations have to be aware that some calls are not really calls and so don't cause any flow control effects. I wonder if it's worth revisiting some of the design of intrinsics and having some notion of target-dependent instructions. This would also help if anyone wants to try the route discussed at the San Jose DevMeeting last year of progressively lowering machine-independent IR to machine instructions.

The only thing I'd say is that I think it's a mistake to try to separate out real "intrinsics" and "function calls" that should be speculatable while we're at the level of platform-independent optimizations (afterwards it may make more sense). Depending on how exotic your hardware is, there may be a lot of things that are implemented in a speculation-safe way that you'd like to represent (at the mid-level) as calls rather than as explicit LLVM IR intrinsics. For example, I expect which OpenCL "built-in functions" are functions and which are specialised in hardware varies significantly from device to device. Having different paths for speculating "intrinsics" and "functions which may or may not (depending on the back-end) be an intrinsic" seems to have a lot of potential for algorithm duplication that's prone to drifting out of sync.

Cheers,
Dave





Re: [Proposal] Speculative execution of function calls

Kuperstein, Michael M
In reply to this post by David Chisnall-5
Whether cost is an issue depends on the specific use of speculative execution.

In the context of LICM, I believe it is almost always a good idea to hoist, as loop counts of 0 are relatively rare. This applies especially to expensive functions.
As to the use of speculative execution purely to elide jumps - right now, cost is not a factor in the isSafeTo...() decision in any case. A memory load may also be much more expensive than a jump, but loads, when possible, are still considered safe. So, I think this is indeed orthogonal - cost should be a separate query, perhaps. Some passes may want to perform it (in fact, SimplifyCFG already has an internal ComputeSpeculationCost() method), while others will want to speculate whenever possible (LICM).
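To make that separation concrete, a pass-side check could look something like the hypothetical helper below. shouldSpeculate() and its cost parameters are invented for illustration; only isSafeToSpeculativelyExecute() is an existing query:

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"

// Hypothetical: safety is a hard requirement, cost is per-pass policy.
static bool shouldSpeculate(const llvm::Instruction &I,
                            unsigned EstimatedCost,  // estimated by the pass
                            unsigned CostBudget) {   // pass-specific budget
  // Never speculate anything the safety query rejects.
  if (!llvm::isSafeToSpeculativelyExecute(&I))
    return false;
  // A SimplifyCFG-style user would compare its estimate against a budget;
  // an LICM-style user could pass a huge budget and hoist whenever safe.
  return EstimatedCost <= CostBudget;
}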

As to being safe for only a subset of inputs - if a function is safe only for a subset of inputs, it's not safe, just like a function that is readonly for a subset of inputs is not readonly. ;-)




Re: [Proposal] Speculative execution of function calls

Renato Golin
In reply to this post by David Chisnall-5
On 31 July 2013 11:56, David Chisnall <[hidden email]> wrote:
> The slightly orthogonal question to safety is the cost of execution. [...]

David,

If I got it right, this is a proposal for a framework to annotate speculation-safe functions, not a pass that will identify all cases. So, yes, different back-ends can annotate their safe intrinsics, front-ends can annotate their safe calls, and it'll always be a small subset, as with most other optimizations.

As for letting optimization passes infer that info themselves, it could in theory be possible to count the number of instructions in the callee and make sure it has no other calls, side effects or undefined behaviour, but again, that would have to be very conservative.
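Purely as an illustration of how conservative that would have to be, a scan along these lines might work; looksSpeculatable() is hypothetical and untested, and note that it still proves nothing about termination:

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"

// Hypothetical, conservative callee scan.
static bool looksSpeculatable(const llvm::Function &F, unsigned MaxInsts) {
  if (F.isDeclaration())
    return false;                              // no body to inspect
  unsigned Count = 0;
  for (const llvm::BasicBlock &BB : F)
    for (const llvm::Instruction &I : BB) {
      if (++Count > MaxInsts)
        return false;                          // too big to bother with
      if (llvm::isa<llvm::CallInst>(I))
        return false;                          // no other calls, to keep it simple
      if (!llvm::isSafeToSpeculativelyExecute(&I))
        return false;                          // anything not provably safe
    }
  // Even this does not prove the function halts (it may still contain an
  // infinite loop), so a real implementation would need more than this.
  return true;
}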

cheers,
--renato


Re: [Proposal] Speculative execution of function calls

Nick Lewycky
On 31 July 2013 05:32, Renato Golin <[hidden email]> wrote:
> On 31 July 2013 11:56, David Chisnall <[hidden email]> wrote:
> > The slightly orthogonal question to safety is the cost of execution. [...]

> David,
>
> If I got it right, this is a proposal for a framework to annotate speculation-safe functions, not a pass that will identify all cases. So, yes, different back-ends can annotate their safe intrinsics, front-ends can annotate their safe calls, and it'll always be a small subset, as with most other optimizations.
>
> As for letting optimization passes infer that info themselves, it could in theory be possible to count the number of instructions in the callee and make sure it has no other calls, side effects or undefined behaviour, but again, that would have to be very conservative.

Correct. Safety and cost are two different things, and it's important to keep them separate.

The issue Michael raised is a great one: can we safely assume no undefined behaviour? Do we have intrinsics today which will exhibit undefined behaviour if called with certain arguments? Absolutely, @llvm.memcpy for one. But is there one which is "nounwind readnone" yet can't be safely speculated? It may be that all nounwind+readnone intrinsics are also speculatable, but it would help justify this patch if we had a counterexample to point to.

That aside, we have a situation where nounwind+readnone intrinsics may be speculated, but nounwind+readnone calls may not. This doesn't quite solve the problem for all users. LLVM may be used in an environment which provides intrinsic-like functions as part of the language, but for whatever reason don't belong as intrinsics (either because they aren't core llvm and can't be upstreamed -- a mythical get_gpu_context -- or because they aren't part of a single target). I don't have this problem myself though. Michael, could you give concrete examples from OpenCL?

Nick

