|
Hi,
Over the past few years we've been trying to kill poison somehow. There have been a few proposals, but they've all failed to pass the bar and/or to gather significant support (my own proposals included). We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and myself) have a new proposal to kill undef instead and replace it with poison + a new 'freeze' instruction. We believe this proposal simplifies things at the IR level, allows us to fix long-standing bugs in LLVM, and is roughly performance-neutral for now (and can enable further optimizations in the future). Sorry for the longish email. We will give a talk about this proposal at the LLVM dev meeting in November, so hopefully we'll be able to discuss this further in person. We would appreciate any comments, feedback, suggestions, etc. If you have questions let me know as well (not everything below is super detailed, so please ask where things are not explicit enough). (I've CC'ed a few people that have been involved in discussions re semantics of LLVM IR in the past year or so. Apologies in advance if I forgot someone -- which probably I did. I've also CC'ed some CodeGen folks, from which we would appreciate input on how this proposal fits within the current and the future pipeline) Thanks, Nuno --------------------------------------------------------- Motivation for undef & poison ============================= There were a few motivations behind the creation of undef and poison: 1) There was a desire to exploit certain undefined behaviors without hurting optimization opportunities. This led to the creation of weaker forms of undefined behavior (UB), while leaving "full" UB to operations that can trap the CPU, like division by zero. Weaker UB operations include, for example, signed integer overflow, say 'a + b'. If signed overflow was full UB, then we couldn't speculatively execute these operations without proving that they could not overflow, otherwise the compiler would be introducing UB in a perfectly fine program. E.g., the compiler couldn't hoist signed additions out of loops. In weaker forms of UB, the triggering of full UB is delayed so that the compiler can speculatively execute these operations and there's only full UB if the result is somehow "consumed". 2) Undef was created to represent uninitialized memory reads. For example, it's very useful in conjunction with PHI nodes: int a; if (foo) a_0 = 42; a_1 = phi(a_0, undef) By using undef, we don't have to materialize some arbitrary constant on a branch where a variable is not initialized. 3) And then people realized that undef wasn't enough. For example, we'd also like to be able to optimize expressions like "((a + 1) >s a)" to "true", which isn't possible with "undef" as defined in (1), since there's no value for `%x` that makes `%x > INT_MAX` true. This was important for loop analyses and certain InstCombine patterns, and so we needed a stronger version of UB which was still not full UB. And poison was born. For example, on 64-bit platforms, poison is very handy for widening induction variables to 64 bits. Problems with undef & poison ============================ The interactions between undef and poison are particularly complicated and they inhibit innocent-looking optimizations. For example, the following is wrong: %v = select %c, %x, undef => %v = %x If %x is poison and %c = false, then we've just replaced undef with poison which is bad, since poison is stronger than undef. So, this is an example of something we want to do but we can't at the moment (even if we still do it anyway, risking miscompilations!) Goal ==== Our goal was to propose minimal changes to the IR semantics such that 1) we could keep most of the optimizations we do today, 2) would require minimal code changes, 3) compilation time, run time, and memory consumption would stay roughly the same, and 4) we could fix all the longstanding bugs due to undef/poison semantics and interactions. We believe this proposal fulfills the goals, and can even enable future optimizations. Proposal ======== 1) Kill undef 2) Create a new poison value (representable in IR, inheriting from llvm::Constant) 3) Create a new instruction, '%y = freeze %x', that stops propagation of poison. If the input is poison, then it returns an arbitrary, but fixed, value. (like old undef, but each use gets the *same* value), otherwise it just returns its input value. 4) All operations over poison return poison. For example: and %x, poison -> poison ; just like before and 0, poison -> poison ; just like before %y = freeze poison %z = and %y, 1 --- 000..0x ; just like old undef %w = xor %y, %y ; 0 -- not undef: both uses of %y have the same value Instruction-specific semantics: - br poison -> UB - select poison, %a, %b -> poison (some InstCombine patterns will need freeze, but they are wrong right now already!) - if %c is not poison, select %c, %a, %b is poison if %c = 1 and %a is poison, or %c = 0 and %b is poison (see discussion below) - bitcast between types of different bitwidth can return poison if when concatenating adjacent values one of these values is poison. For example: %x = bitcast <3 x i2> <2, poison, 2> to <2 x i3> -> %x = <poison, poison> %x = bitcast <6 x i2> <2, poison, 2, 2, 2, 2> to <4 x i3> -> %x = <poison, poison, 5, 2> Purpose of Freeze ================= Poison is propagated aggressively throughout. However, there are cases where this breaks certain optimizations, and therefore freeze is used to selectively stop poison from being propagated. A use of freeze is to enable speculative execution. For example, loop switching does the following transformation: while (C) { if (C2) { A } else { B } } => if (C2) { while (C) A } else { while (C) B } Here we are evaluating C2 before C. If the original loop never executed then we had never evaluated C2, while now we do. So we need to make sure there's no UB for branching on C2. Freeze ensures that so we would actually have 'if (freeze(C2))' instead. Note that having branch on poison not trigger UB has its own problems. We believe this is a good tradeoff. Another use is, for example, to implement bit-fields: a.x = 2 becomes: %v = load %a %v2 = freeze %v ; %v could be uninitialized data (poison) %v3 = ... bitmasking... store %a, %v3 Performance =========== We measured compile time, running time, memory consumption, and object/IR size of a few benchmarks (-O3 vs -O3 w/ freeze). They should give a rough picture of the overhead involved. Benchmarks were run on 2 machines (server1 and server2), both x86_64 running Ubuntu 16.04. Summary: - Memory consumption: there's generally a 1% increase, with the max of 8% in oggenc - Running time on SPEC: mostly in the noise (about 1% up or down) - Compile time: mostly in the noise (about 1% up or down, but usually no difference) - LNT shows 0.5% average slowdown, but with wider swings both ways. We are aware of a few things that would need tweaking to recover perf like loop unrolling (likely because of SCEV not knowing what freeze is). You can see the raw data we have collected in the links below: Memory consumption: https://docs.google.com/spreadsheets/d/1ycJaMPLzh_4YV7RQmVaLR-vHzd7ZlYiV2iES G-lBRbk/edit#gid=0 SPEC running time: https://docs.google.com/spreadsheets/d/1tAwj-Q5raI4rYD7EEg-tJd-Ex533fy9DezIV rbfeIns/edit#gid=0 Compilation time: https://docs.google.com/spreadsheets/d/1_xj6o_ANGcR8xD5Y9rN5VjWbsJaS-GeYKzeW WAR9O2c/edit#gid=0 Statistics on number of Freeze Instructions (up to 5% in total in gcc): https://docs.google.com/spreadsheets/d/1mbOpduooEetIR5i9Db07GHbu72a6mLUmkRIF 1SmZ-6Q/edit#gid=0 LNT raw data: https://github.com/aqjune/freezescript/tree/master/resultcsv-mailinglist/LNT Server1: Intel Core i7 CPU 870 @ 2.93GHz, 8 GBs RAM Server2: Intel Core i5-6600 CPU @ 3.30GHz, 8 GBs RAM Implementation ============== A prototype is available at: - https://github.com/snu-sf/llvm-freeze/tree/x86jmp - https://github.com/snu-sf/clang-freeze This implementation includes: - Make clang emit freeze for bit-field codegen - Add freeze instruction to IR, plus a few instcombine patterns to simply freeze(freeze(x)), for example - Fix a few instcombine select patterns to introduce freeze when needed (these were wrong with current semantics already) - Add a freeze node to selection DAG and add codegen support. For codegen, it uses CopyFromReg+CopyToReg to lower the freeze node, and so it assumes that from that point on LLVM will not propagate undef anymore (to ensure that 'freeze undef' always returns the same value for all uses, which is not what happens with 'undef' on its own). This needs further discussion. - Fix loop unswitch to freeze condition - Change GVN's load widening to load as vector of bits+bitcast (gvload branch only; this was not included in the tests above) (this implementation still uses undef value, but that should be replaced with poison) Deployment ========== A proposal for incremental deployment of the proposed changed: 1) Add freeze instruction + CodeGen support 2) Change clang to start emitting freeze for bit-field reads 3) Add auto-upgrade to convert undef to 'freeze poison' (undef and 'freeze poison' with one use are equivalent) 4) Fix InstCombine, Loop unswitching, etc to use freeze 5) Replace references to undef in the code with either poison or 'freeze poison' 7) Kill undef 8) Investigate any perf regressions 9) Run John's LLVM IR fuzzer with Alive to find leftover bugs Regarding perf, at least PR30615 needs fixing to enable efficient load widening at IR level (which seems it's still undecided whether GVN will continue doing it or not). Discussion on select ==================== Select is a tricky instruction to define semantics for. There are multiple views of select's purpose that are contradictory: 1) Select should be equivalent to arithmetic (e.g., allow 'select %c, true, %x' -> 'or %c, %x' and arithmetic -> select) 2) br + phi -> select should be allowed (simplifyCFG) 3) Select -> br + phi should be allowed To have 1), we need to make select poison if either of its arguments is poison. This disallows 2), since there we need to have select being poison only if the dynamically-chosen value is poison. 2) and 3) are orthogonal and do not conflict, though. 3) is hard because it requires select to be stronger than branching (UB-wise), meaning that we would need select to be UB if the condition was poison. This blocks a bunch of instcombine patterns, like: Pre: C < 0 %r = udiv %a, C => %c = icmp ult %a, C %r = select %c, 0, 1 If %a is poison, then we would be introducing UB. Adding freeze(%c) would fix the problem, though. Since we really want to have 2) since that's simplifyCFG, we propose to make 'select %c, %a, %b' poison if any of the following holds: - %c is poison - %c = true and %a is poison - %c = false and %b is poison This disallows 3) and some transformations of 1). Since 3) is only performed in CodeGenPrepare, we can safely ignore it (no aggressive optimizations are run afterwards). For 1), we will have to restrict InstCombine patterns for the cases when we are sure a given variable is non-poisonous (which is only when it came from a 'freeze' instruction or it's the result of a non-poison-producing instruction). This semantics also allows arithmetic -> select, but not select -> arithmetic (without use of freeze). Acknowledgements ================ I would personally like to thank David, John, and Sanjoy for embarking on this trip over a year ago and spending a lot of time on this project; Gil and his students (Juneyoung, Yoonseung, and Youngju) for prototyping and testing several ideas (and fixing them!); and everybody else that has contributed in the form of off-line and on-line discussions. Future work =========== This proposal only handles integers. It's a good step forward, but we are still missing pointers and floats; we are aware of problems (err, possibilities for improvement) there. We will work on these next. They are orthogonal problems, so what we are proposing today won't require further changes (hopefully). _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
----- Original Message -----
> From: "Nuno Lopes" <[hidden email]> > To: [hidden email] > Cc: "Sanjoy Das" <[hidden email]>, "David Majnemer" <[hidden email]>, "John Regehr" > <[hidden email]>, "Chung-Kil Hur" <[hidden email]>, "Juneyoung Lee" <[hidden email]>, > "Yoonseung Kim" <[hidden email]>, "Youngju Song" <[hidden email]>, [hidden email], "Dan > Gohman" <[hidden email]>, [hidden email], [hidden email], [hidden email], "t p northover" > <[hidden email]> > Sent: Tuesday, October 18, 2016 7:06:31 AM > Subject: RFC: Killing undef and spreading poison > > Hi, > > Over the past few years we've been trying to kill poison somehow. > There have > been a few proposals, but they've all failed to pass the bar and/or > to > gather significant support (my own proposals included). > We (David, Gil, John, Juneyoung, Sanjoy, Youngju, Yoonseung, and > myself) > have a new proposal to kill undef instead and replace it with poison > + a new > 'freeze' instruction. We believe this proposal simplifies things at > the IR > level, allows us to fix long-standing bugs in LLVM, and is roughly > performance-neutral for now (and can enable further optimizations in > the > future). > > Sorry for the longish email. We will give a talk about this proposal > at the > LLVM dev meeting in November, so hopefully we'll be able to discuss > this > further in person. > > We would appreciate any comments, feedback, suggestions, etc. If you > have > questions let me know as well (not everything below is super > detailed, so > please ask where things are not explicit enough). > (I've CC'ed a few people that have been involved in discussions re > semantics > of LLVM IR in the past year or so. Apologies in advance if I forgot > someone > -- which probably I did. I've also CC'ed some CodeGen folks, from > which we > would appreciate input on how this proposal fits within the current > and the > future pipeline) > Thanks for working on this! A couple of questions/comments below: ... > > Purpose of Freeze > ================= > Poison is propagated aggressively throughout. However, there are > cases where > this breaks certain optimizations, and therefore freeze is used to > selectively stop poison from being propagated. > > A use of freeze is to enable speculative execution. For example, > loop > switching does the following transformation: > while (C) { > if (C2) { > A > } else { > B > } > } > => > if (C2) { > while (C) > A > } else { > while (C) > B > } > > Here we are evaluating C2 before C. If the original loop never > executed > then we had never evaluated C2, while now we do. So we need to make > sure > there's no UB for branching on C2. Freeze ensures that so we would > actually > have 'if (freeze(C2))' instead. > Note that having branch on poison not trigger UB has its own > problems. Can you please elaborate on these problems? > We > believe this is a good tradeoff. > > ... > > Discussion on select > ==================== > Select is a tricky instruction to define semantics for. There are > multiple > views of select's purpose that are contradictory: > 1) Select should be equivalent to arithmetic (e.g., allow 'select > %c, > true, %x' -> 'or %c, %x' and arithmetic -> select) > 2) br + phi -> select should be allowed (simplifyCFG) > 3) Select -> br + phi should be allowed > > To have 1), we need to make select poison if either of its arguments > is > poison. This disallows 2), since there we need to have select being > poison > only if the dynamically-chosen value is poison. > 2) and 3) are orthogonal and do not conflict, though. > > 3) is hard because it requires select to be stronger than branching > (UB-wise), meaning that we would need select to be UB if the > condition was > poison. This blocks a bunch of instcombine patterns, like: > Pre: C < 0 > %r = udiv %a, C > => > %c = icmp ult %a, C > %r = select %c, 0, 1 > > If %a is poison, then we would be introducing UB. Adding freeze(%c) > would > fix the problem, though. > > > Since we really want to have 2) since that's simplifyCFG, we propose > to make > 'select %c, %a, %b' poison if any of the following holds: > - %c is poison > - %c = true and %a is poison > - %c = false and %b is poison > > This disallows 3) and some transformations of 1). Since 3) is only > performed > in CodeGenPrepare, we can safely ignore it (no aggressive > optimizations are > run afterwards). This strikes me as a dangerous game to play. CGP's transformations need to be semantically sound (even if they are anti-canonical). In part, this is because our IR-level analyses are used by CGP. Moreover, targets after CGP run IR-level transformations, and having different semantic rules there is going to make code reuse hard in subtle ways. Which brings up another issue: We currently have undef at the MI level; do you propose changing that, and if so, how? -Hal > For 1), we will have to restrict InstCombine > patterns for > the cases when we are sure a given variable is non-poisonous (which > is only > when it came from a 'freeze' instruction or it's the result of a > non-poison-producing instruction). > This semantics also allows arithmetic -> select, but not select -> > arithmetic (without use of freeze). > ... > > -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
>> A use of freeze is to enable speculative execution. For example, loop
>> switching does the following transformation: >> while (C) { >> if (C2) { >> A >> } else { >> B >> } >> } >> => >> if (C2) { >> while (C) >> A >> } else { >> while (C) >> B >> } >> >> Here we are evaluating C2 before C. If the original loop never >> executed then we had never evaluated C2, while now we do. So we need >> to make sure there's no UB for branching on C2. Freeze ensures that >> so we would actually have 'if (freeze(C2))' instead. >> Note that having branch on poison not trigger UB has its own problems. > > Can you please elaborate on these problems? Sure! For example, the following transformation would be wrong if branch on poison was a non-deterministic branch instead of UB: %a = add i32 %x, 1 %c = icmp eq i32 %a, %poison br i1 %c, label %bb1, label %bb2 bb1: %b = add i32 %x, 1 %d = call i32 @bar(i32 %b) -> bb1: %d = call i32 @bar(i32 % poison) GVN will perform this kind of transformation: it concludes that %a, %b, and %poison are all equal and picks one representative value. However, these values are equal only when they are not poison. If %poison is indeed poison, then the transformation is wrong because before it was passing an ok value to bar() and after the transformation it is passing poison. On the other hand, if branching on poison is UB, the original program was executing UB already because %poison (and therefore %c) were poison. So the transformation is ok. >> Discussion on select >> ==================== >> Select is a tricky instruction to define semantics for. There are >> multiple views of select's purpose that are contradictory: >> 1) Select should be equivalent to arithmetic (e.g., allow 'select >> %c, true, %x' -> 'or %c, %x' and arithmetic -> select) >> 2) br + phi -> select should be allowed (simplifyCFG) >> 3) Select -> br + phi should be allowed >> >> Since we really want to have 2) since that's simplifyCFG, we propose >> to make 'select %c, %a, %b' poison if any of the following holds: >> - %c is poison >> - %c = true and %a is poison >> - %c = false and %b is poison >> >> This disallows 3) and some transformations of 1). Since 3) is only >> performed in CodeGenPrepare, we can safely ignore it (no aggressive >> optimizations are run afterwards). > > This strikes me as a dangerous game to play. CGP's transformations need to be semantically sound (even if they are anti-canonical). In part, this is because our IR-level analyses are used by CGP. Moreover, targets after CGP run IR-level transformations, and having different semantic rules there is going to make code reuse hard in subtle ways. Which brings up another issue: We currently have undef at the MI level; do you propose changing that, and if so, how? It is sound to do the transformation at IR level if you freeze the condition. My suggestion was to avoid introducing overhead in CGP. This is the current state of affairs and it seems to work right now. I'm not shocked to see the IR changing the semantics throughout the pipeline, since while the compiler proceeds, the type of transformations done also change. But I have to say that my understanding of LLVM after CGP is very limited. I was not aware of the code reuse from IR-level analyses. I agree this needs to be scrutinized carefully. Alternatively, we can just introduce freeze and pay the price (likely low anyway). Regarding undef at MI level, we are not proposing any change. I don't see any reason right now to change it since it seems that optimizations at MI level are fine with just having undef. There's no nsw/nuw stuff there and undef seems very useful. Poison is lowered into undef. And freeze is lowered into this pair of copy nodes that seems to stop propagation of undef (to ensure all uses see the same value). I don't know enough about MI to know if there's a better way to lower freeze, though. Thanks, Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
On Tue, Oct 18, 2016 at 3:06 PM, Nuno Lopes via llvm-dev <[hidden email]> wrote: Hi, Ok, freeze() produces a fixed but undefined value, so that xor'ing the same frozen value produces 0, ANDing a frozen value with 1 produces known zero bits everywhere except the LSB, which is undefined. Does every instance of freeze() produce a *different* fixed but undefined value? i.e. the value produced by freeze() is labelled with a sequence number or something like that? _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
> Ok, freeze() produces a fixed but undefined value, so that xor'ing the same frozen value produces 0, ANDing a frozen value with 1 produces known zero bits everywhere except the LSB, which is undefined.
> > Does every instance of freeze() produce a *different* fixed but undefined value? i.e. the value produced by freeze() is labelled with a sequence number or something like that? Freeze produces an arbitrary value. The compiler is free to pick a "good" one (one that enables some optimization) if it wishes so. There's no determination of these values. It's similar to current undef, except that all uses see the same value. Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
----- Original Message -----
> From: "Nuno Lopes" <[hidden email]> > To: "Hal Finkel" <[hidden email]> > Cc: "Sanjoy Das" <[hidden email]>, "David Majnemer" <[hidden email]>, "John Regehr" > <[hidden email]>, "Chung-Kil Hur" <[hidden email]>, "Juneyoung Lee" <[hidden email]>, > "Yoonseung Kim" <[hidden email]>, "Youngju Song" <[hidden email]>, "Dan Gohman" > <[hidden email]>, [hidden email], [hidden email], [hidden email], "t p northover" > <[hidden email]>, [hidden email] > Sent: Tuesday, October 18, 2016 10:10:41 AM > Subject: RE: RFC: Killing undef and spreading poison > > >> A use of freeze is to enable speculative execution. For example, > >> loop > >> switching does the following transformation: > >> while (C) { > >> if (C2) { > >> A > >> } else { > >> B > >> } > >> } > >> => > >> if (C2) { > >> while (C) > >> A > >> } else { > >> while (C) > >> B > >> } > >> > >> Here we are evaluating C2 before C. If the original loop never > >> executed then we had never evaluated C2, while now we do. So we > >> need > >> to make sure there's no UB for branching on C2. Freeze ensures > >> that > >> so we would actually have 'if (freeze(C2))' instead. > >> Note that having branch on poison not trigger UB has its own > >> problems. > > > > Can you please elaborate on these problems? > > Sure! For example, the following transformation would be wrong if > branch on poison was a non-deterministic branch instead of UB: > > %a = add i32 %x, 1 > %c = icmp eq i32 %a, %poison > br i1 %c, label %bb1, label %bb2 > bb1: > %b = add i32 %x, 1 > %d = call i32 @bar(i32 %b) > -> > bb1: > %d = call i32 @bar(i32 % poison) > > GVN will perform this kind of transformation: it concludes that %a, > %b, and %poison are all equal and picks one representative value. > However, these values are equal only when they are not poison. Okay; so the problem is that an instruction that is value-equivalent to a poison value is not itself necessarily poison? > If > %poison is indeed poison, then the transformation is wrong because > before it was passing an ok value to bar() and after the > transformation it is passing poison. > On the other hand, if branching on poison is UB, the original program > was executing UB already because %poison (and therefore %c) were > poison. So the transformation is ok. > > > >> Discussion on select > >> ==================== > >> Select is a tricky instruction to define semantics for. There are > >> multiple views of select's purpose that are contradictory: > >> 1) Select should be equivalent to arithmetic (e.g., allow > >> 'select > >> %c, true, %x' -> 'or %c, %x' and arithmetic -> select) > >> 2) br + phi -> select should be allowed (simplifyCFG) > >> 3) Select -> br + phi should be allowed > >> > >> Since we really want to have 2) since that's simplifyCFG, we > >> propose > >> to make 'select %c, %a, %b' poison if any of the following holds: > >> - %c is poison > >> - %c = true and %a is poison > >> - %c = false and %b is poison > >> > >> This disallows 3) and some transformations of 1). Since 3) is only > >> performed in CodeGenPrepare, we can safely ignore it (no > >> aggressive > >> optimizations are run afterwards). > > > > This strikes me as a dangerous game to play. CGP's transformations > > need to be semantically sound (even if they are anti-canonical). > > In part, this is because our IR-level analyses are used by CGP. > > Moreover, targets after CGP run IR-level transformations, and > > having different semantic rules there is going to make code reuse > > hard in subtle ways. Which brings up another issue: We currently > > have undef at the MI level; do you propose changing that, and if > > so, how? > > It is sound to do the transformation at IR level if you freeze the > condition. > My suggestion was to avoid introducing overhead in CGP. This is the > current state of affairs and it seems to work right now. I'm not > shocked to see the IR changing the semantics throughout the > pipeline, since while the compiler proceeds, the type of > transformations done also change. > But I have to say that my understanding of LLVM after CGP is very > limited. I was not aware of the code reuse from IR-level analyses. > I agree this needs to be scrutinized carefully. Alternatively, we > can just introduce freeze and pay the price (likely low anyway). I think we'd need to do this to ensure consistency; I don't think that the price would be high, especially because we soon drop into different representations and the IR is used only for analysis (for pointer aliasing, etc.). > > Regarding undef at MI level, we are not proposing any change. I > don't see any reason right now to change it since it seems that > optimizations at MI level are fine with just having undef. There's > no nsw/nuw stuff there and undef seems very useful. We've already started propagating nsw/nuw into the SDAG, and as we move toward GlobalISel, we'll have them at the MI layer as well. I think we'll need to consider how to propagate this information to the MI layer directly. -Hal > Poison is lowered into undef. And freeze is lowered into this pair of > copy nodes that seems to stop propagation of undef (to ensure all > uses see the same value). I don't know enough about MI to know if > there's a better way to lower freeze, though. > > Thanks, > Nuno > > -- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
On 10/18/2016 5:06 AM, Nuno Lopes via llvm-dev wrote:
> Another use is, for example, to implement bit-fields: > a.x = 2 > becomes: > %v = load %a > %v2 = freeze %v ; %v could be uninitialized data (poison) > %v3 = ... bitmasking... > store %a, %v3 It seems like you're saying that an integer load which touches any uninitialized byte of memory results in poison. Therefore, load %a simplifies to "poison", and your proposed lowering of a bitfield write throws away the value of every other adjacent bitfield. Am I missing something? -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
> On 10/18/2016 5:06 AM, Nuno Lopes via llvm-dev wrote:
>> Another use is, for example, to implement bit-fields: >> a.x = 2 >> becomes: >> %v = load %a >> %v2 = freeze %v ; %v could be uninitialized data (poison) >> %v3 = ... bitmasking... >> store %a, %v3 > > It seems like you're saying that an integer load which touches any > uninitialized byte of memory results in poison. Therefore, load %a > simplifies to "poison", and your proposed lowering of a bitfield write > throws away the value of every other adjacent bitfield. Am I missing > something? Right, a load touching a single uninitialized bit results in poison. The trick is that on the first bitfield write, all the remaining untouched fields become initialized (with an arbitrary value, though). Essentially we are making the adjacent bitfields undef. So if you have: struct foo a; a.x = 2; a.y = 3; IR becomes: %a = alloca foo %x = load %a %x2 = freeze %v ; %x2 not poison %x3 = bitmasking ; %x3 not poison store %a, %x3 %y = load %a %y2 = freeze %y ; not needed; %y is not poison etc.. Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
>> >> Here we are evaluating C2 before C. If the original loop never
>> >> executed then we had never evaluated C2, while now we do. So we >> >> need >> >> to make sure there's no UB for branching on C2. Freeze ensures >> >> that >> >> so we would actually have 'if (freeze(C2))' instead. >> >> Note that having branch on poison not trigger UB has its own >> >> problems. >> > >> > Can you please elaborate on these problems? >> >> Sure! For example, the following transformation would be wrong if >> branch on poison was a non-deterministic branch instead of UB: >> >> %a = add i32 %x, 1 >> %c = icmp eq i32 %a, %poison >> br i1 %c, label %bb1, label %bb2 >> bb1: >> %b = add i32 %x, 1 >> %d = call i32 @bar(i32 %b) >> -> >> bb1: >> %d = call i32 @bar(i32 % poison) >> >> GVN will perform this kind of transformation: it concludes that %a, >> %b, and %poison are all equal and picks one representative value. >> However, these values are equal only when they are not poison. > > Okay; so the problem is that an instruction that is value-equivalent to a > poison value is not itself necessarily poison? Right. I think there were other examples, but I don't have them here handy. But at least this one is very problematic for GVN. >> >> Discussion on select >> >> ==================== >> >> Select is a tricky instruction to define semantics for. There are >> >> multiple views of select's purpose that are contradictory: >> >> 1) Select should be equivalent to arithmetic (e.g., allow >> >> 'select >> >> %c, true, %x' -> 'or %c, %x' and arithmetic -> select) >> >> 2) br + phi -> select should be allowed (simplifyCFG) >> >> 3) Select -> br + phi should be allowed >> >> >> >> Since we really want to have 2) since that's simplifyCFG, we >> >> propose >> >> to make 'select %c, %a, %b' poison if any of the following holds: >> >> - %c is poison >> >> - %c = true and %a is poison >> >> - %c = false and %b is poison >> >> >> >> This disallows 3) and some transformations of 1). Since 3) is only >> >> performed in CodeGenPrepare, we can safely ignore it (no >> >> aggressive >> >> optimizations are run afterwards). >> > >> > This strikes me as a dangerous game to play. CGP's transformations >> > need to be semantically sound (even if they are anti-canonical). >> > In part, this is because our IR-level analyses are used by CGP. >> > Moreover, targets after CGP run IR-level transformations, and >> > having different semantic rules there is going to make code reuse >> > hard in subtle ways. Which brings up another issue: We currently >> > have undef at the MI level; do you propose changing that, and if >> > so, how? >> >> It is sound to do the transformation at IR level if you freeze the >> condition. >> My suggestion was to avoid introducing overhead in CGP. This is the >> current state of affairs and it seems to work right now. I'm not >> shocked to see the IR changing the semantics throughout the >> pipeline, since while the compiler proceeds, the type of >> transformations done also change. >> But I have to say that my understanding of LLVM after CGP is very >> limited. I was not aware of the code reuse from IR-level analyses. >> I agree this needs to be scrutinized carefully. Alternatively, we >> can just introduce freeze and pay the price (likely low anyway). > > I think we'd need to do this to ensure consistency; I don't think that the > price would be high, especially because we soon drop into different > representations and the IR is used only for analysis (for pointer > aliasing, etc.). Ok, makes sense; thanks! >> Regarding undef at MI level, we are not proposing any change. I >> don't see any reason right now to change it since it seems that >> optimizations at MI level are fine with just having undef. There's >> no nsw/nuw stuff there and undef seems very useful. > > We've already started propagating nsw/nuw into the SDAG, and as we move > toward GlobalISel, we'll have them at the MI layer as well. I think we'll > need to consider how to propagate this information to the MI layer > directly. I see. Then it might make sense to introduce poison in MI then. But only if there's a plan to start doing optimizations that leverage nsw/nuw at MI level as well. Otherwise undef is just fine. I guess we need to do this in phases, though, to avoid breaking everything at the same time :) Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
Hi,
Nuno Lopes wrote: >> Okay; so the problem is that an instruction that is value-equivalent >> to a poison value is not itself necessarily poison? > > Right. > I think there were other examples, but I don't have them here handy. But > at least this one is very problematic for GVN. Another example is this: void f(int k) { if (k != 0) { for (< finite loop >) { if (always_false_at_runtime) { print(1 / k); } } } } We'd like to hoist the `1 / k` computation to the preheader. However, we can't do that today if `k` is undef, and we've defined branching on undef to be a non-deterministic choice. I have some more details at: http://www.playingwithpointers.com/problem-with-undef.html -- Sanjoy _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
On 10/18/2016 2:25 PM, Nuno Lopes via llvm-dev wrote:
> Right, a load touching a single uninitialized bit results in poison. > The trick is that on the first bitfield write, all the remaining > untouched fields become initialized (with an arbitrary value, though). > Essentially we are making the adjacent bitfields undef. So "undef" still exists, except now it's obtainable via "freeze(poison)"? -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
Hi Krzysztof,
freeze(poison) is different from undef today, in the sense that it is an instruction that produces some random, but fixed bit pattern. E.g. today in %x = undef %y = xor %x, %x we can fold %y to undef since each use of %x can independently see some arbitrary (up to the compiler / environment) bit pattern. But in the new proposal, in: %x = freeze(poison) %y = xor %x, %x that is no longer allowed (%y _has_ to be 0) -- all uses of %x will see some garbage, but fixed bit pattern. -- Sanjoy Krzysztof Parzyszek via llvm-dev wrote: > On 10/18/2016 2:25 PM, Nuno Lopes via llvm-dev wrote: >> Right, a load touching a single uninitialized bit results in poison. >> The trick is that on the first bitfield write, all the remaining >> untouched fields become initialized (with an arbitrary value, though). >> Essentially we are making the adjacent bitfields undef. > > So "undef" still exists, except now it's obtainable via "freeze(poison)"? > > -Krzysztof > LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
On 10/18/2016 3:12 PM, Sanjoy Das wrote:
> But in the new proposal, in: > > %x = freeze(poison) > %y = xor %x, %x > > that is no longer allowed (%y _has_ to be 0) -- all uses of %x will see > some garbage, but fixed bit pattern. What about this: %x = phi poison, poison (I'm simplifying the syntax here) Can this be simplified to "%x = poison", i.e. can we rauw(%x, poison)? Or %x = load %uninitialized_var %y = load %uninitialized_var // are %x and %y equal (i.e. is "cmp eq %x, %y" == true)? // is freeze(%x) equal to freeze(%y)? I'm wary about such rules. I have a feeling that this is going to create its own set of problems. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
On 10/18/2016 12:25 PM, Nuno Lopes wrote:
>> On 10/18/2016 5:06 AM, Nuno Lopes via llvm-dev wrote: >>> Another use is, for example, to implement bit-fields: >>> a.x = 2 >>> becomes: >>> %v = load %a >>> %v2 = freeze %v ; %v could be uninitialized data (poison) >>> %v3 = ... bitmasking... >>> store %a, %v3 >> >> It seems like you're saying that an integer load which touches any >> uninitialized byte of memory results in poison. Therefore, load %a >> simplifies to "poison", and your proposed lowering of a bitfield >> write throws away the value of every other adjacent bitfield. Am I >> missing something? > > Right, a load touching a single uninitialized bit results in poison. > The trick is that on the first bitfield write, all the remaining > untouched fields become initialized (with an arbitrary value, though). > Essentially we are making the adjacent bitfields undef. > > So if you have: > struct foo a; > a.x = 2; > a.y = 3; > > IR becomes: > %a = alloca foo > %x = load %a > %x2 = freeze %v ; %x2 not poison > %x3 = bitmasking ; %x3 not poison > store %a, %x3 > > %y = load %a > %y2 = freeze %y ; not needed; %y is not poison > etc.. > > Nuno carefully. Sort of related testcase: struct S { int x : 24; }; S s = { 2 }; Gives: @s = local_unnamed_addr global { i8, i8, i8, i8 } { i8 2, i8 0, i8 0, i8 undef }, align 4 When undef goes away, clang will need to be patched to generate zero here? More generally, will struct padding in globals be poison, or some arbitrary value? A couple of related topics: Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; store i32 dst". I guess that's illegal under this model? How about the related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 x i8> dst"? (SROA also performs similar transforms.) Have you given any thought to how auto-upgrade for bitcode files would work? I guess in general, you can upgrade an old "load i32" to "bitcast (freeze (load <4 x i8>)) to i32", but that's really awkward. -Eli -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
I guess, it may help if we don't have a literal for poison (as we do for
undef). Then freeze(%x) would always be equal to freeze(%x) and there would be no question about freeze(poison). -Krzysztof On 10/18/2016 3:22 PM, Krzysztof Parzyszek via llvm-dev wrote: > On 10/18/2016 3:12 PM, Sanjoy Das wrote: >> But in the new proposal, in: >> >> %x = freeze(poison) >> %y = xor %x, %x >> >> that is no longer allowed (%y _has_ to be 0) -- all uses of %x will see >> some garbage, but fixed bit pattern. > > What about this: > %x = phi poison, poison (I'm simplifying the syntax here) > Can this be simplified to "%x = poison", i.e. can we rauw(%x, poison)? > > Or > %x = load %uninitialized_var > %y = load %uninitialized_var > // are %x and %y equal (i.e. is "cmp eq %x, %y" == true)? > // is freeze(%x) equal to freeze(%y)? > > I'm wary about such rules. I have a feeling that this is going to create > its own set of problems. > > -Krzysztof > > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
>> Right, a load touching a single uninitialized bit results in poison.
>> The trick is that on the first bitfield write, all the remaining >> untouched fields become initialized (with an arbitrary value, though). >> Essentially we are making the adjacent bitfields undef. >> >> So if you have: >> struct foo a; >> a.x = 2; >> a.y = 3; >> >> IR becomes: >> %a = alloca foo >> %x = load %a >> %x2 = freeze %v ; %x2 not poison >> %x3 = bitmasking ; %x3 not poison >> store %a, %x3 >> >> %y = load %a >> %y2 = freeze %y ; not needed; %y is not poison >> etc.. > > Oh, okay, that works. I should have thought that through a bit more > carefully. I must say I got scared about bitfields a few weeks ago as well :) > Sort of related testcase: > > struct S { int x : 24; }; > S s = { 2 }; > > Gives: > > @s = local_unnamed_addr global { i8, i8, i8, i8 } { i8 2, i8 0, i8 0, i8 > undef }, align 4 > > When undef goes away, clang will need to be patched to generate zero here? > More generally, will struct padding in globals be poison, or some > arbitrary value? Padding can safely become poison. > Instcombine currently transforms "memcpy(dst, src, 4)" to "load i32 src; > store i32 dst". I guess that's illegal under this model?How about the > related transform "memcpy(dst, src, 4)" to "load <4 x i8> src; store <4 x > i8> dst"? (SROA also performs similar transforms.) The first is illegal, you're right (unless you know that they aren't poison, of course). The second version with vectors is correct. The problem is that codegen for vector loads/stores doesn't seem optimal ATM. We've seen really bad cases (e.g., http://llvm.org/PR30615) > Have you given any thought to how auto-upgrade for bitcode files would > work? I guess in general, you can upgrade an old "load i32" to "bitcast > (freeze (load <4 x i8>)) to i32", but that's really awkward. Ah, good point, I forgot about that. I was assuming auto-upgrade only needed to change undef into "freeze poison", but you're right that load semantics are also changing and would need patching as well. It actually needs to be "load <32 x i1>" because undef is per bit and one bit being poison cannot taint the whole byte. Of course we could have a few optimizations for common patterns such as (load (alloca)) -> (freeze (load (alloca)), but ATM I don't see any way around the pattern you suggest. Thanks, Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
> On 10/18/2016 3:12 PM, Sanjoy Das wrote:
>> But in the new proposal, in: >> >> %x = freeze(poison) >> %y = xor %x, %x >> >> that is no longer allowed (%y _has_ to be 0) -- all uses of %x will see >> some garbage, but fixed bit pattern. > > What about this: > %x = phi poison, poison (I'm simplifying the syntax here) > Can this be simplified to "%x = poison", i.e. can we rauw(%x, poison)? Yes, that's ok. > %x = load %uninitialized_var > %y = load %uninitialized_var > // are %x and %y equal (i.e. is "cmp eq %x, %y" == true)? > // is freeze(%x) equal to freeze(%y)? "icmp %x, %y" would be poison (so you can chose true or false if you wish). There's no change here with respect to current poison semantics. freeze(%x) is not necessarily the same as freeze(%y). Even %a and %b might not be the same in "%a = freeze(%x), %b = freeze(%x)" (each freeze returns an arbitrary, but fixed, value). Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
> On Oct 18, 2016, at 12:44 PM, Sanjoy Das via llvm-dev <[hidden email]> wrote: > > Hi, > > Nuno Lopes wrote: > >>> Okay; so the problem is that an instruction that is value-equivalent >>> to a poison value is not itself necessarily poison? >> >> Right. >> I think there were other examples, but I don't have them here handy. But >> at least this one is very problematic for GVN. > > Another example is this: > > void f(int k) { > if (k != 0) { > for (< finite loop >) { > if (always_false_at_runtime) { > print(1 / k); > } > } > } > } > > We'd like to hoist the `1 / k` computation to the preheader. However, we can't do that today if `k` is undef, and we've defined branching on undef to be a non-deterministic choice. This one isn’t clear to me: you can fold 1/k -> 1 when k is undef. And then it is a constant which makes no point to hoist? — Mehdi > > I have some more details at: http://www.playingwithpointers.com/problem-with-undef.html > > -- Sanjoy > _______________________________________________ > LLVM Developers mailing list > [hidden email] > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
We need a literal like undef to e.g. initialize padding, use in SROA for phi
node entries from branches where a variable is not initialized, constant folding, etc. We are just proposing to get rid of undef altogether and call it poison instead. It makes the compiler a bit more aggressive re undefined behavior, though. If freeze(%x) is duplicated it can return a different value. Not having a poison literal doesn't change that. Nuno -----Original Message----- From: Krzysztof Parzyszek via llvm-dev Sent: Tuesday, October 18, 2016 10:09 PM Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison I guess, it may help if we don't have a literal for poison (as we do for undef). Then freeze(%x) would always be equal to freeze(%x) and there would be no question about freeze(poison). -Krzysztof On 10/18/2016 3:22 PM, Krzysztof Parzyszek via llvm-dev wrote: > On 10/18/2016 3:12 PM, Sanjoy Das wrote: >> But in the new proposal, in: >> >> %x = freeze(poison) >> %y = xor %x, %x >> >> that is no longer allowed (%y _has_ to be 0) -- all uses of %x will see >> some garbage, but fixed bit pattern. > > What about this: > %x = phi poison, poison (I'm simplifying the syntax here) > Can this be simplified to "%x = poison", i.e. can we rauw(%x, poison)? > > Or > %x = load %uninitialized_var > %y = load %uninitialized_var > // are %x and %y equal (i.e. is "cmp eq %x, %y" == true)? > // is freeze(%x) equal to freeze(%y)? > > I'm wary about such rules. I have a feeling that this is going to create > its own set of problems. > > -Krzysztof > > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
|
In reply to this post by via llvm-dev
>>>> Okay; so the problem is that an instruction that is value-equivalent
>>>> to a poison value is not itself necessarily poison? >>> >>> Right. >>> I think there were other examples, but I don't have them here handy. But >>> at least this one is very problematic for GVN. >> >> Another example is this: >> >> void f(int k) { >> if (k != 0) { >> for (< finite loop >) { >> if (always_false_at_runtime) { >> print(1 / k); >> } >> } >> } >> } >> >> We'd like to hoist the `1 / k` computation to the preheader. However, we >> can't do that today if `k` is undef, and we've defined branching on undef >> to be a non-deterministic choice. > > This one isn’t clear to me: you can fold 1/k -> 1 when k is undef. And > then it is a constant which makes no point to hoist? Imagine that k is not a constant. Then you would like to hoist it to outside of the loop. It seems safe to hoist it since we know k != 0 by the enclosing 'if'. And so we transform the function into: if (k != 0) { int t = 1 / k; for (< finite loop >) { if (always_false_at_runtime) { print(t); } } } Now you realize that k is undef and you get: if (<non-deterministic branch>) { int t = 1 / undef; for (< finite loop >) { if (always_false_at_runtime) { print(t); } } } We can know chose to change the if condition to true and the undef in the division to zero (remember each use of undef can yield a different result). Now we have undefined behavior (UB) because we've introduced a division by zero (which would not happen in the original program). If branching on poison was UB instead then the original program was already executing UB so the transformation was fine. This example can be fixed by freezing k instead. That way we ensure that k is actually non-zero within the 'if' statement. Nuno _______________________________________________ LLVM Developers mailing list [hidden email] http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev |
| Free forum by Nabble | Edit this page |
