[llvm-dev] RFC: Killing undef and spreading poison

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
19 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Sanjoy,
            My answer is that step 3, commuting the multiply with the sign-extends, is invalid,
As this is what causes the `udiv` to fault.  I think it is invalid with or without the “freeze”,
why do you think step 3, the commute, without the “freeze" is valid ?


Also, do you think you can come up with an example that does not depend on signed
overflow being “undefined” ?


Peter Lawrence.





  



Hi John,

On 11/11/16, 1:58 PM, Sanjoy Das wrote:
 >> I'm saying that you still get reasonable behavior from reasoning about
 >> counterfactuals for these things, and I haven't been convinced that you
 >> lose optimization power except in code that actually exhibits undefined
 >> behavior.
 >
 > I'm not sure I understood your concern fully, but you could have cases like:
 >
 > for (...) {
 > if (condition){
 > i32 prod = x *nsw y
 > i64 prod.sext = sext prod to i64
 > i64 t = K `udiv` (-1 + (sum.prod >> 32))
 > use(t)
 > }
 > }

I'm sure it is obvious to you, but there's a typo here -- it should
have been "i64 t = K `udiv` (-1 + (prod.sext >> 32))"

 > The C program this came from is well defined in two cases (that are relevant here):
 >
 > 1. condition is false
 > 2. condition is true and x * y does not sign overflow
 >
 > We'd like to LICM the entire expression tree out of control flow in a way that it helps the well defined case (2)
 > without introducing a fault in the well defined case (1).

To be a 100% explicit, by (1) I mean a situation where:

   a. condition is false
   b. (say) x is 1 << 20, and y is 1 << 12 so "x *nsw y" does sign
      overflow, and ((sext(x) * sext(y)) >> 32) is 1

I'll also spell out what I was trying to say more explicitly, since we
can easily sink too much time talking about subtly different things
otherwise (has happened in the past :) ):

Firstly consider this sequence of transforms:

   for (...) {
     if (condition){
       i32 prod = x *nsw y
       i64 prod.sext = sext prod to i64
       i64 t = K `udiv` (-1 + (sum.prod >> 32))
       use(t)
     }
   }

==> (step 1) multiplication and sext can be speculated

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   for (...) {
     if (condition){
       i64 t = K `udiv` (-1 + (prod.sext >> 32))
       use(t)
     }
   }

==> (step 2) known bits analysis on sext to prove divisor is non-zero

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }

==> (step 3) commute sext through nsw multiplicationa

   ;; i32 prod = x *nsw y
   i64 prod.sext = (sext x) *nsw (sext y)
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }


Now we've managed to introduce a fault if we were in case 1 -- the
source program was well defined but the target program divides by
zero.

This means whatever scheme / semantics / "rule book" LLVM implements
has to outlaw at least one of the steps above.

For the freeze/poison scheme that step is "step 2".  Under the
freeze/poison scheme step 2 will be

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   for (...) {
     if (condition){
       i64 t = K `udiv` (-1 + (prod.sext >> 32))
       use(t)
     }
   }

==> (step 2) known bits analysis on sext to prove divisor is non-zero

   i32 prod = x *nsw y
   i32 prod.frozen = freeze(prod)
   i64 prod.sext = sext prod.frozen to i64
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }

Which would further outlaw step 3.

If we go with counterfactual reasoning, then what is that step above
that is illegal?

-- Sanjoy


 



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Hi Peter,

On Thu, Jun 1, 2017 at 6:18 PM, Peter Lawrence via llvm-dev
<[hidden email]> wrote:
>             My answer is that step 3, commuting the multiply with the
> sign-extends, is invalid,
> As this is what causes the `udiv` to fault.  I think it is invalid with or
> without the “freeze”,
> why do you think step 3, the commute, without the “freeze" is valid ?

I think it is valid by definition -- since "A *nsw B" does not sign
wrap, "sext(A *nsw B)" == (sext A) * (sext B).  In other words, these
two expressions are inequal only if "A *nsw B" sign wraps.

> Also, do you think you can come up with an example that does not depend on
> signed
> overflow being “undefined” ?

Can you be more specific?  Example that does what?

-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Sanjay,
            Your original example showed end-to-end-miscompilation,
After a sequence of transformations,
Where “undefined behavior” is a part of the problem.


My question is can you show some additional examples that do not rely on “nsw” or “nuw” ?

Thanks,
Peter Lawrence.



On Jun 4, 2017, at 9:13 PM, Sanjoy Das <[hidden email]> wrote:

Hi Peter,

On Thu, Jun 1, 2017 at 6:18 PM, Peter Lawrence via llvm-dev
<[hidden email]> wrote:

Also, do you think you can come up with an example that does not depend on
signed
overflow being “undefined” ?

Can you be more specific?  Example that does what?

-- Sanjoy


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Hi Peter,

On Mon, Jun 5, 2017 at 3:56 PM, Peter Lawrence
<[hidden email]> wrote:
> Sanjay,
>             Your original example showed end-to-end-miscompilation,
> After a sequence of transformations,
> Where “undefined behavior” is a part of the problem.
>
>
> My question is can you show some additional examples that do not rely on
> “nsw” or “nuw” ?

I still don't understand you -- I don't see how an example of a
miscompile that does not involve nsw/nuw is relevant here -- while I'm
sure LLVM has generic bugs, such a miscompile will not illustrate
anything about the nature of poison as it is implemented in LLVM
today.

If you're asking for an example where the original program does not
have UB, that is precisely what I was trying to illustrate in the
original example.  In

 > for (...) {
 > if (condition){
 > i32 prod = x *nsw y
 > i64 prod.sext = sext prod to i64
 > i64 t = K `udiv` (-1 + (sum.prod >> 32))
 > use(t)
 > }
 > }

the program does *not* have UB if "condition" implies that "x * y does
not sign overflow".  For instance "condition" could be
"__builtin_smul_overflow(x, y, &res)", in which case

 a. The program does not have UB
 b. Even if prod was not marked nsw, it would have been legal for LLVM
to infer it


-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev
Sanjoy,
            In that case I change my answer and say that the illegal
optimization was (step 1) hoisting the “nsw” out of the if-statement,

When inside the protection of the if-clause the “nsw” was conditional on
the if-clause, but when you hoisted it out of the if-statement there is no 
reason to continue assuming “nsw” is true. (This is true regardless of how
the original “nsw” got there, either by language definition or by analysis).

“Nsw” isn’t an opcode, it is analysis information, and as we all know
Analysis information can and does get invalidated by transformations.
This is a case in point.

I think the “nsw” got hoisted out of the if-statement by accident, assuming
it was part of the opcode, but we have just seen that this is incorrect.

   - - - - - - - - - - - - - - - - - - - -

If the above analysis is correct (I am pretty sure it is, but have been known
to make mistakes!), then my concern is that since this example was
incorrectly root-caused I am not sure we are ready to commit to “poison”
and “freeze” changes, and need to see more original source causes of
end-to-end-miscompilation to be sure we haven’t also mis-diagnosed those
as well.


Peter Lawrence.



On Jun 5, 2017, at 12:02 PM, Sanjoy Das <[hidden email]> wrote:

Hi Peter,

On Mon, Jun 5, 2017 at 9:43 AM, Peter Lawrence
<[hidden email]> wrote:
I think it is valid by definition -- since "A *nsw B" does not sign
wrap, "sext(A *nsw B)" == (sext A) * (sext B).  In other words, these
two expressions are inequal only if "A *nsw B" sign wraps.

The way you use “nsw” it seems like an assertion or an assumption.

Yes, that is correct.

Where did the “nsw” come from, is it from the source language definition,
Or compiler derived analysis information ?

Both are possible.  In C/C++, as you know, most (all?) signed
arithmetic can be marked nsw.  In many cases the optimizer will also
infer nsw if it can prove the arithmetic does not overflow (e.g. if (a
!= INT_SMAX) b = a + 1;  ==> if (a != INT_SMAX) b = a+nsw 1;).

-- Sanjoy


Here’s the original source example for context
———————————————————————————————

[llvm-dev] RFC: Killing undef and spreading poison

Sanjoy Das via llvm-dev [hidden email]

Fri Nov 11 23:55:32 PST 2016

Firstly consider this sequence of transforms:

   for (...) {
     if (condition){
       i32 prod = x *nsw y
       i64 prod.sext = sext prod to i64
       i64 t = K `udiv` (-1 + (sum.prod >> 32))
       use(t)
     }
   }

==> (step 1) multiplication and sext can be speculated

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   for (...) {
     if (condition){
       i64 t = K `udiv` (-1 + (prod.sext >> 32))
       use(t)
     }
   }

==> (step 2) known bits analysis on sext to prove divisor is non-zero

   i32 prod = x *nsw y
   i64 prod.sext = sext prod to i64
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }

==> (step 3) commute sext through nsw multiplicationa

   ;; i32 prod = x *nsw y
   i64 prod.sext = (sext x) *nsw (sext y)
   i64 t = K `udiv` (-1 + (prod.sext >> 32))
   for (...) {
     if (condition){
       use(t)
     }
   }


Now we've managed to introduce a fault if we were in case 1 -- the
source program was well defined but the target program divides by
zero.









_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Since most add/sub operations compiled from C have the nsw attribute, we
cannot simply restrict movement of these instructions.
Sure, we could drop nsw when moving these instructions, but there are still
many other problems left.  Please read more about the topic here:
https://blog.regehr.org/archives/1496
For example, doing loop unswitching, followed by inline (just to illustrate
that the reasoning cannot be local), plus GVN gives you end-to-end
miscompilations easily.

Nuno

-----Original Message-----
From: Peter Lawrence via llvm-dev
Sent: Wednesday, June 7, 2017 8:48 PM
Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison


Sanjoy,
            In that case I change my answer and say that the illegal
optimization was (step 1) hoisting the “nsw” out of the if-statement,

When inside the protection of the if-clause the “nsw” was conditional on
the if-clause, but when you hoisted it out of the if-statement there is no
reason to continue assuming “nsw” is true. (This is true regardless of how
the original “nsw” got there, either by language definition or by analysis).

“Nsw” isn’t an opcode, it is analysis information, and as we all know
Analysis information can and does get invalidated by transformations.
This is a case in point.

I think the “nsw” got hoisted out of the if-statement by accident, assuming
it was part of the opcode, but we have just seen that this is incorrect.

   - - - - - - - - - - - - - - - - - - - -

If the above analysis is correct (I am pretty sure it is, but have been
known
to make mistakes!), then my concern is that since this example was
incorrectly root-caused I am not sure we are ready to commit to “poison”
and “freeze” changes, and need to see more original source causes of
end-to-end-miscompilation to be sure we haven’t also mis-diagnosed those
as well.


Peter Lawrence.





On Jun 5, 2017, at 12:02 PM, Sanjoy Das <[hidden email]>
wrote:


Hi Peter,

On Mon, Jun 5, 2017 at 9:43 AM, Peter Lawrence
<[hidden email]> wrote:
I think it is valid by definition -- since "A *nsw B" does not sign
wrap, "sext(A *nsw B)" == (sext A) * (sext B).  In other words, these
two expressions are inequal only if "A *nsw B" sign wraps.

The way you use “nsw” it seems like an assertion or an assumption.

Yes, that is correct.

Where did the “nsw” come from, is it from the source language definition,
Or compiler derived analysis information ?

Both are possible.  In C/C++, as you know, most (all?) signed
arithmetic can be marked nsw.  In many cases the optimizer will also
infer nsw if it can prove the arithmetic does not overflow (e.g. if (a
!= INT_SMAX) b = a + 1;  ==> if (a != INT_SMAX) b = a+nsw 1;).

-- Sanjoy



Here’s the original source example for context
———————————————————————————————

[llvm-dev] RFC: Killing undef and spreading poison

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org

Fri Nov 11 23:55:32 PST 2016

Firstly consider this sequence of transforms: for (...) { if (condition){
i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 +
(sum.prod >> 32)) use(t) } } ==> (step 1) multiplication and sext can be
speculated i32 prod = x *nsw y i64 prod.sext = sext prod to i64 for (...)
{ if (condition){ i64 t = K `udiv` (-1 + (prod.sext >> 32)) use(t) } } ==>
(step 2) known bits analysis on sext to prove divisor is non-zero i32 prod =
x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (prod.sext
 >> 32)) for (...) { if (condition){ use(t) } } ==> (step 3) commute sext
through nsw multiplicationa ;; i32 prod = x *nsw y i64 prod.sext = (sext x)
*nsw (sext y) i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if
(condition){ use(t) } } Now we've managed to introduce a fault if we were in
case 1 -- the source program was well defined but the target program divides
by zero.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev
Peter, it looks to me like you've misunderstood how nsw works.  You are
saying that the transformation below is incorrect.  If it is incorrect,
there needs to be a valuation of x and y (and also a choice of "..." and
"condition") that causes the transformed code to call use(t) with a
different t than the original code would have.  Please tell us those values.

John


>    for (...) {
>      if (condition){
>        i32 prod = x *nsw y
>        i64 prod.sext = sext prod to i64
>        i64 t = K `udiv` (-1 + (sum.prod >> 32))
>        use(t)
>      }
>    }
>
> ==> (step 1) multiplication and sext can be speculated
>
>    i32 prod = x *nsw y
>    i64 prod.sext = sext prod to i64
>    for (...) {
>      if (condition){
>        i64 t = K `udiv` (-1 + (prod.sext >> 32))
>        use(t)
>      }
>    }
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
John,
         I’m not saying the transformation is illegal with respect to ’t’,
I am saying the transformation is illegal with respect to ‘prod’

I wil use Sanjoy’s example
        ‘x' = (1 << 16),   'y' = (1 << 16),   and ‘condition' precludes those values

Before the transformation ‘nsw’ says the the multiply will not overflow,

But after the transformation the multiply *will* overflow, and ‘nsw’ is no longer valid.

The point is not that you can’t hoist the multiply, but rather that the ‘nsw’ attribute no longer applies after that hoisting.


I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now ?


Peter Lawrence.



> On Jun 7, 2017, at 2:27 PM, John Regehr <[hidden email]> wrote:
>
> Peter, it looks to me like you've misunderstood how nsw works.  You are saying that the transformation below is incorrect.  If it is incorrect, there needs to be a valuation of x and y (and also a choice of "..." and "condition") that causes the transformed code to call use(t) with a different t than the original code would have.  Please tell us those values.
>
> John
>
>
>>   for (...) {
>>     if (condition){
>>       i32 prod = x *nsw y
>>       i64 prod.sext = sext prod to i64
>>       i64 t = K `udiv` (-1 + (sum.prod >> 32))
>>       use(t)
>>     }
>>   }
>>
>> ==> (step 1) multiplication and sext can be speculated
>>
>>   i32 prod = x *nsw y
>>   i64 prod.sext = sext prod to i64
>>   for (...) {
>>     if (condition){
>>       i64 t = K `udiv` (-1 + (prod.sext >> 32))
>>       use(t)
>>     }
>>   }

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev

> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <[hidden email]> wrote:
>
> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.

Nuno,
          I’m not saying the operations can’t be moved,
I’m saying that once you move one the ‘nsw’ attribute no longer applies,
unless you can mathematically prove that it still does,
otherwise an “add nsw” has to be converted to plain “add”.

It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement
that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.

Don’t you agree ?

-Peter.







> Sure, we could drop nsw when moving these instructions, but there are still many other problems left.  Please read more about the topic here: https://blog.regehr.org/archives/1496
> For example, doing loop unswitching, followed by inline (just to illustrate that the reasoning cannot be local), plus GVN gives you end-to-end miscompilations easily.
>
> Nuno
>
> -----Original Message----- From: Peter Lawrence via llvm-dev
> Sent: Wednesday, June 7, 2017 8:48 PM
> Subject: Re: [llvm-dev] RFC: Killing undef and spreading poison
>
>
> Sanjoy,
>           In that case I change my answer and say that the illegal
> optimization was (step 1) hoisting the “nsw” out of the if-statement,
>
> When inside the protection of the if-clause the “nsw” was conditional on
> the if-clause, but when you hoisted it out of the if-statement there is no
> reason to continue assuming “nsw” is true. (This is true regardless of how
> the original “nsw” got there, either by language definition or by analysis).
>
> “Nsw” isn’t an opcode, it is analysis information, and as we all know
> Analysis information can and does get invalidated by transformations.
> This is a case in point.
>
> I think the “nsw” got hoisted out of the if-statement by accident, assuming
> it was part of the opcode, but we have just seen that this is incorrect.
>
>  - - - - - - - - - - - - - - - - - - - -
>
> If the above analysis is correct (I am pretty sure it is, but have been known
> to make mistakes!), then my concern is that since this example was
> incorrectly root-caused I am not sure we are ready to commit to “poison”
> and “freeze” changes, and need to see more original source causes of
> end-to-end-miscompilation to be sure we haven’t also mis-diagnosed those
> as well.
>
>
> Peter Lawrence.
>
>
>
>
>
> On Jun 5, 2017, at 12:02 PM, Sanjoy Das <[hidden email]> wrote:
>
>
> Hi Peter,
>
> On Mon, Jun 5, 2017 at 9:43 AM, Peter Lawrence
> <[hidden email]> wrote:
> I think it is valid by definition -- since "A *nsw B" does not sign
> wrap, "sext(A *nsw B)" == (sext A) * (sext B).  In other words, these
> two expressions are inequal only if "A *nsw B" sign wraps.
>
> The way you use “nsw” it seems like an assertion or an assumption.
>
> Yes, that is correct.
>
> Where did the “nsw” come from, is it from the source language definition,
> Or compiler derived analysis information ?
>
> Both are possible.  In C/C++, as you know, most (all?) signed
> arithmetic can be marked nsw.  In many cases the optimizer will also
> infer nsw if it can prove the arithmetic does not overflow (e.g. if (a
> != INT_SMAX) b = a + 1;  ==> if (a != INT_SMAX) b = a+nsw 1;).
>
> -- Sanjoy
>
>
>
> Here’s the original source example for context
> ———————————————————————————————
>
> [llvm-dev] RFC: Killing undef and spreading poison
>
> Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
>
> Fri Nov 11 23:55:32 PST 2016
>
> Firstly consider this sequence of transforms: for (...) { if (condition){ i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (sum.prod >> 32)) use(t) } } ==> (step 1) multiplication and sext can be speculated i32 prod = x *nsw y i64 prod.sext = sext prod to i64 for (...) { if (condition){ i64 t = K `udiv` (-1 + (prod.sext >> 32)) use(t) } } ==> (step 2) known bits analysis on sext to prove divisor is non-zero i32 prod = x *nsw y i64 prod.sext = sext prod to i64 i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } ==> (step 3) commute sext through nsw multiplicationa ;; i32 prod = x *nsw y i64 prod.sext = (sext x) *nsw (sext y) i64 t = K `udiv` (-1 + (prod.sext >> 32)) for (...) { if (condition){ use(t) } } Now we've managed to introduce a fault if we were in case 1 -- the source program was well defined but the target program divides by zero.
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Hi Peter,

On Thu, Jun 8, 2017 at 9:41 AM, Peter Lawrence
<[hidden email]> wrote:

>
>> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <[hidden email]> wrote:
>>
>> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.
>
> Nuno,
>           I’m not saying the operations can’t be moved,
> I’m saying that once you move one the ‘nsw’ attribute no longer applies,
> unless you can mathematically prove that it still does,
> otherwise an “add nsw” has to be converted to plain “add”.
>
> It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement
> that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.

That would be correct (and we do this for some constructs: for
instance when we have !dereferenceable attached to a load instruction
we will strip the !dereferenceable when hoisting it out of control
flow).  However, with poison we want to have our cake and eat it too
(perhaps eating is not the best analogy with poison :) ) -- we want to
(whenever correct) exploit the fact that a certain operation does not
overflow when possible even when hoisting it above control flow.  For
instance, if we have:

if (cond) {
  t = a +nsw b;
  print(t);
}

Now if once we hoist t out of the control block:

t = a +nsw b;
if (cond) {
  print(t);
}

in the transformed program, t itself may sign overflow.  In LLVM IR
(or at least in the semantics we'd like), this has no correctness
implications -- t becomes "poison" (which is basically deferred
undefined behavior), and the program is undefined only if the
generated poison value is used in a "side effecting" manner.  Assuming
that print is a "side effect", this means at print, we can assume t
isn't poison (and thus a + b did not sign overflow).  This is a weaker
model than C/C++; and the difficult bits are getting the poison
propagation rules correct, and to have a sound definition of a "side
effect" (i.e. the points at which poison == deferred UB actually
becomes UB).

> I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now ?

FYI, I think it is poor form to insinuate such things when you clearly
haven't made an effort to dig back and understand the all of prior
discussions we've had in this area (hint: we've discussed and
explicitly decided to not implement the semantics you're suggesting).
Of course, fresh ideas are always welcome but I suggest you start by
first reading http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
and some of the mailing list discussions we've had in the past on this
topic.

Thanks!
-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev

> On Jun 8, 2017, at 10:33 AM, Sanjoy Das <[hidden email]> wrote:
>
> Hi Peter,
>
> On Thu, Jun 8, 2017 at 9:41 AM, Peter Lawrence
> <[hidden email]> wrote:
>>
>>> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <[hidden email]> wrote:
>>>
>>> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.
>>
>> Nuno,
>>          I’m not saying the operations can’t be moved,
>> I’m saying that once you move one the ‘nsw’ attribute no longer applies,
>> unless you can mathematically prove that it still does,
>> otherwise an “add nsw” has to be converted to plain “add”.
>>
>> It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement
>> that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.
>
> That would be correct (and we do this for some constructs: for
> instance when we have !dereferenceable attached to a load instruction
> we will strip the !dereferenceable when hoisting it out of control
> flow).  

In other words you are agreeing with me.

And once we’ve agreed on that, why do you insist on illegally hoisting the ‘nsw’
Out of the if-statement, since adding “poison” is clearly not necessary (once
The ‘nsw’ is stripped off the multiply the ‘sext’ can no longer be commuted).

In other words “poison” isn’t cake, it is a bandaid over an illegal transformation,
It has no benefit. Don’t make the illegal transformation and ‘poison’ isn’t necessary.


Peter Lawrence.





> However, with poison we want to have our cake and eat it too
> (perhaps eating is not the best analogy with poison :) ) -- we want to
> (whenever correct) exploit the fact that a certain operation does not
> overflow when possible even when hoisting it above control flow.  For
> instance, if we have:
>
> if (cond) {
>  t = a +nsw b;
>  print(t);
> }
>
> Now if once we hoist t out of the control block:
>
> t = a +nsw b;
> if (cond) {
>  print(t);
> }
>
> in the transformed program, t itself may sign overflow.  In LLVM IR
> (or at least in the semantics we'd like), this has no correctness
> implications -- t becomes "poison" (which is basically deferred
> undefined behavior), and the program is undefined only if the
> generated poison value is used in a "side effecting" manner.  Assuming
> that print is a "side effect", this means at print, we can assume t
> isn't poison (and thus a + b did not sign overflow).  This is a weaker
> model than C/C++; and the difficult bits are getting the poison
> propagation rules correct, and to have a sound definition of a "side
> effect" (i.e. the points at which poison == deferred UB actually
> becomes UB).
>
>> I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now ?
>
> FYI, I think it is poor form to insinuate such things when you clearly
> haven't made an effort to dig back and understand the all of prior
> discussions we've had in this area (hint: we've discussed and
> explicitly decided to not implement the semantics you're suggesting).
> Of course, fresh ideas are always welcome but I suggest you start by
> first reading http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
> and some of the mailing list discussions we've had in the past on this
> topic.
>
> Thanks!
> -- Sanjoy

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev

> On Jun 8, 2017, at 10:33 AM, Sanjoy Das <[hidden email]> wrote:
>
> Hi Peter,
>
> On Thu, Jun 8, 2017 at 9:41 AM, Peter Lawrence
> <[hidden email]> wrote:
>>
>>> On Jun 7, 2017, at 2:23 PM, Nuno Lopes <[hidden email]> wrote:
>>>
>>> Since most add/sub operations compiled from C have the nsw attribute, we cannot simply restrict movement of these instructions.
>>
>> Nuno,
>>          I’m not saying the operations can’t be moved,
>> I’m saying that once you move one the ‘nsw’ attribute no longer applies,
>> unless you can mathematically prove that it still does,
>> otherwise an “add nsw” has to be converted to plain “add”.
>>
>> It is only by illegally retaining the ‘nsw’ after hoisting the multiply out of the ‘if’ statement
>> that subsequent transformations yield end-to-end-miscompilation in Sanjoy’s example.
>
>
>> I think the LLVM community in general has misunderstood and misused ‘nsw’, don’t you agree now ?
>
> FYI, I think it is poor form to insinuate such things when you clearly
> haven't made an effort to dig back and understand the all of prior
> discussions we've had in this area (hint: we've discussed and
> explicitly decided to not implement the semantics you're suggesting).
> Of course, fresh ideas are always welcome but I suggest you start by
> first reading http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
> and some of the mailing list discussions we've had in the past on this
> topic.

Sanjoy,
            I have read the paper, and I dug the archives all the way back to Nov to find
Your example with the nsw-multiply-and-commuting-sext problem.

The reason I keep asking for additional examples is that this one where
You have illegally transformed the ‘nsw’ isn’t convincing. So please,
Provide some additional examples, or some pointers going farther back
Than last Nov in the archives, especially ones that show why you decided
To allow the illegal transformation of ‘nsw’.

Peter Lawrence.



>
> Thanks!
> -- Sanjoy

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev

Sanjoy,
you describe a problem with LLVM “undef”,
you do not suggest fixing this problem, even though in chpt. 9 you identify other
compilers that do attempt to avoid it.

It seems to me that since transforming a program with multiple uses of an “undef” variable
into one with multiple distinct “undef”s results in behavior that is inconsistent (and that
violates users expectations) we should not allow it.

This seems to me to be one of those situations where the phrase “extraordinary claims
require extraordinary evidence” applies, and I’m not seeing any, are you ?


IIUC, a “freeze” operation can always be inserted over an “undef” (now “poison”)
that forces a single consistent value, but why the extra overhead of “freeze”, as
there doesn’t seem to be a convincing argument for ever allowing the same variable
to have inconsistent values, or am I missing something ?


Peter Lawrence.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Hi,

On Thu, Jun 8, 2017 at 12:29 PM, Peter Lawrence
<[hidden email]> wrote:
>
> Sanjoy,
>             in your blog post
> https://www.playingwithpointers.com/problem-with-undef.html
> you describe a problem with LLVM “undef”,
> yet in your paper http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
> you do not suggest fixing this problem, even though in chpt. 9 you identify
> other
> compilers that do attempt to avoid it.

That problem is mentioned in section 3.2 (which quotes the same
example as in the blog post), and section 4 (the paragraph starting
with "Defining branching on poison to be UB further enables analyses
to assume that...") addresses how the new semantics plugs this gap.

> The reason I keep asking for additional examples is that this one where
> You have illegally transformed the ‘nsw’ isn’t convincing. So please,

Maybe we're talking past each other.  Let me first discuss the framing
of the problem as I see it:

 1. Whether a transform is illegal or not is decided by the semantics
    we've assigned to the IR.  In this case, whether hoisting the
    multiplication with the nsw intact is illegal or not is decided by
    the semantics we've assigned to nsw.

 2. We want certain transforms to be legal, which limits the kinds of
    semantics we can possibly have for the IR.  For instance, we
    certainly don't want a semantics for IR that disallows constant
    folding `add i32 1, 2`.

The example I gave was to demonstrate why the current (albeit not very
clearly specified) semantics is incorrect.

So when you ask for an example, are you asking why the set of
transforms we want to be correct in the new semantics (i.e. (2))
includes "hoist nsw-arithmetic out of control flow while still keeping
the flags"?  IOW, in the language used in the paper I linked to, are
you asking "why can't we have a overflowing nsw addition be immediate
UB"?  If so, that is covered in section 2.2 (but we can get into more
detail if you want).

-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
Sanjoy,
            Sorry, I guess I wasn’t specific enough, I’ll try again.  Excerpt  from your blog post:

"Another way to look at this is that undef isn’t a normal SSA value, and uses of an undefvalue are also its defs.

IMHO this was a bad definition from the get-go.

It seems to me that transforming a program with multiple uses of a single undefined variable
into one with multiple distinct IR “undef”s results in behavior that is inconsistent, and that
violates users expectations, and therefore we should not do it.

In other words the aspect of the current definition of “undef” in LLVM-IR that seems to force
programs with undefined variables to be mis-represented is broken, and the definition of “undef"
needs to be fixed.

Your proposal with “freeze” makes it possible to force all uses of an undefined variable to be
consistent, but why not just do the simpler fix to “undef” / “poison” and allow them to have
multiple uses, as opposed to “use once only, by definition of the IR”, which does not seem
to have any benefit.

So again, “freeze” seems to be a workaround, rather than fixing the underlying problem.


Thoughts ?


Peter Lawrence.



On Jun 8, 2017, at 1:29 PM, Sanjoy Das <[hidden email]> wrote:

Hi,

On Thu, Jun 8, 2017 at 12:29 PM, Peter Lawrence
<[hidden email]> wrote:

Sanjoy,
           in your blog post
https://www.playingwithpointers.com/problem-with-undef.html
you describe a problem with LLVM “undef”,
yet in your paper http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf
you do not suggest fixing this problem, even though in chpt. 9 you identify
other
compilers that do attempt to avoid it.

That problem is mentioned in section 3.2 (which quotes the same
example as in the blog post), and section 4 (the paragraph starting
with "Defining branching on poison to be UB further enables analyses
to assume that...") addresses how the new semantics plugs this gap.

The reason I keep asking for additional examples is that this one where
You have illegally transformed the ‘nsw’ isn’t convincing. So please,

Maybe we're talking past each other.  Let me first discuss the framing
of the problem as I see it:

1. Whether a transform is illegal or not is decided by the semantics
   we've assigned to the IR.  In this case, whether hoisting the
   multiplication with the nsw intact is illegal or not is decided by
   the semantics we've assigned to nsw.

2. We want certain transforms to be legal, which limits the kinds of
   semantics we can possibly have for the IR.  For instance, we
   certainly don't want a semantics for IR that disallows constant
   folding `add i32 1, 2`.

The example I gave was to demonstrate why the current (albeit not very
clearly specified) semantics is incorrect.

So when you ask for an example, are you asking why the set of
transforms we want to be correct in the new semantics (i.e. (2))
includes "hoist nsw-arithmetic out of control flow while still keeping
the flags"?  IOW, in the language used in the paper I linked to, are
you asking "why can't we have a overflowing nsw addition be immediate
UB"?  If so, that is covered in section 2.2 (but we can get into more
detail if you want).

-- Sanjoy


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev

On Jun 8, 2017, at 10:33 AM, Sanjoy Das <[hidden email]> wrote:


(hint: we've discussed and
explicitly decided to not implement the semantics you're suggesting).

Sanjoy,
            I’ve search the entire archives of 2017 and 2016 and have not found anything remotely
resembling a discussion about hoisting ‘nsw’ / ‘nuw’, can you be more specific ?

Peter Lawrence.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev
Sanjoy,
            From your paper   http://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf  we have this proposal

1.  remove “undef” and use “poison” instead

2. Introduce a new instruction:  “freeze” …

3. All operations over “poison” unconditionally return “poison” except “phi", “select", and "freeze"

4. Branching on “poison” is immediate UB.



3:::   then  (X & 0)   does not translate to  (0)   when X is “poison” ?,  that seems strange, can you explain / elaborate ?


4:::   hmm, I know of no machine where this is what actually happens[*],  can you explain / elaborate ?


Peter Lawrence.


[*. I am assuming you are using consistent terminology, where in the beginning of the paper you make a
 distinction between immediate and deferred undefined, with immediate being a trap like divide-by-zero.]


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
> 3:::   then  (X & 0)   does not translate to  (0)   when X is “poison”
> ?,  that seems strange, can you explain / elaborate ?

This is how poison already works, please see the LangRef.

John

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] RFC: Killing undef and spreading poison

Hal Finkel via llvm-dev
In reply to this post by Hal Finkel via llvm-dev
> 4. Branching on “poison” is immediate UB.

> 4:::   hmm, I know of no machine where this is what actually happens[*],
>   can you explain / elaborate ?

Well, branching on poison has to mean something.  We propose making it
immediate UB.  Another choice would be nondeterministic branching.
Either way, some optimizations become legal and others become illegal.
It's just a design tradeoff.  The semantics of actual machines aren't as
important here as you might be tempted to think.

Peter, I think you are kind of wearing us out here.  Could you please
tone it down a bit, read everything carefully, play with some examples
(Alive is particularly helpful, and it implements both the old and
proposed new semantics), and then get back to us with questions that
don't ask us to re-explain everything that we've already spent months or
years discussing?

I'm not exactly sure which discussions Sanjoy was referring to (there
have been many) but if you look for "the nsw story" in the archives (ca.
2011/2012) you'll turn up a lot of material.

Thanks for your consideration,

John

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Loading...