[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
Is that why we do not deduce +<nsw> from "add nsw" either?

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).
So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
Hum, actually that gets eliminated to nothing of course. What we need is really the NUW flag...

On Thu, Aug 16, 2018 at 4:09 PM Alexandre Isoard <[hidden email]> wrote:
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
In reply to this post by Daniel Sanders via llvm-dev
In general, the backedge-taken count is an unsigned value; it's possible to write a loop with a trip count of 2^64 using a 64-bit induction variable.  To prove your loop has a "small" trip count, you have to use either the guard or the nsw/nuw markings on the induction variable.

-Eli

On 8/16/2018 4:09 PM, Alexandre Isoard wrote:
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
The loop exits iff  {2,+,1}<nuw><%for.body> ==  (zext i32 %n to i64)

The nuw marking on the "induction variable" should be sufficient to deduce a max loop trip count of 2^32.
But I do not know how we compute it (we build a database and it is contrived to follow, at least to me).

I saw that we annotate it with <nsw> (which is correct and can be (and probably has been) deduced from the ranges) and following our discussion, we can't annotate it with <nuw> as per limitation of our unification.

So, I am kind of in a bind here...

On Thu, Aug 16, 2018 at 4:34 PM Friedman, Eli <[hidden email]> wrote:
In general, the backedge-taken count is an unsigned value; it's possible to write a loop with a trip count of 2^64 using a 64-bit induction variable.  To prove your loop has a "small" trip count, you have to use either the guard or the nsw/nuw markings on the induction variable.

-Eli

On 8/16/2018 4:09 PM, Alexandre Isoard wrote:
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
Is the issue that nothing knows that n isn't 0? In which case the loop would run close to 2^64-1 iterations. Is the -2 just a failure to print 2^64 - 2 as a positive number?

~Craig


On Thu, Aug 16, 2018 at 4:54 PM Alexandre Isoard via llvm-dev <[hidden email]> wrote:
The loop exits iff  {2,+,1}<nuw><%for.body> ==  (zext i32 %n to i64)

The nuw marking on the "induction variable" should be sufficient to deduce a max loop trip count of 2^32.
But I do not know how we compute it (we build a database and it is contrived to follow, at least to me).

I saw that we annotate it with <nsw> (which is correct and can be (and probably has been) deduced from the ranges) and following our discussion, we can't annotate it with <nuw> as per limitation of our unification.

So, I am kind of in a bind here...

On Thu, Aug 16, 2018 at 4:34 PM Friedman, Eli <[hidden email]> wrote:
In general, the backedge-taken count is an unsigned value; it's possible to write a loop with a trip count of 2^64 using a 64-bit induction variable.  To prove your loop has a "small" trip count, you have to use either the guard or the nsw/nuw markings on the induction variable.

-Eli

On 8/16/2018 4:09 PM, Alexandre Isoard wrote:
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [SCEV] Why is backedge-taken count <nsw> instead of <nuw>?

Daniel Sanders via llvm-dev
In reply to this post by Daniel Sanders via llvm-dev
Well, like I said earlier, it's usually viable to check for a guard, along the lines of https://reviews.llvm.org/D28536 (or maybe implement some context-sensitive range analysis algorithm to use everywhere).

The alternative is to try to do something to preserve the nowrap flags we have on the input to the icmp, in ScalarEvolution::computeExitLimitFromICmp.

-Eli

On 8/16/2018 4:54 PM, Alexandre Isoard wrote:
The loop exits iff  {2,+,1}<nuw><%for.body> ==  (zext i32 %n to i64)

The nuw marking on the "induction variable" should be sufficient to deduce a max loop trip count of 2^32.
But I do not know how we compute it (we build a database and it is contrived to follow, at least to me).

I saw that we annotate it with <nsw> (which is correct and can be (and probably has been) deduced from the ranges) and following our discussion, we can't annotate it with <nuw> as per limitation of our unification.

So, I am kind of in a bind here...

On Thu, Aug 16, 2018 at 4:34 PM Friedman, Eli <[hidden email]> wrote:
In general, the backedge-taken count is an unsigned value; it's possible to write a loop with a trip count of 2^64 using a 64-bit induction variable.  To prove your loop has a "small" trip count, you have to use either the guard or the nsw/nuw markings on the induction variable.

-Eli

On 8/16/2018 4:09 PM, Alexandre Isoard wrote:
Ok.

To go back to the original issue, would it be meaningful to add a SCEVUMax(0, BTC) on the final BTC computed by SCEV?

So that it does not use "negative values"?

On Wed, Aug 15, 2018 at 2:40 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 2:27 PM, Alexandre Isoard wrote:
I'm not sure I understand the poison/undef/UB distinctions.

But on this example:

define i32 @func(i1 zeroext %b, i32 %x, i32 %y) {
entry:
  %adds = add nsw i32 %x, %y
  %addu = add nuw i32 %x, %y
  %cond = select i1 %b, i32 %adds, i32 %addu
  ret i32 %cond
}

It is important to not propagate the nsw/nuw between the two SCEV expressions (which unification would do today, can I consider that a bug or is it a feature?).

It's an intentional design choice.

So we work-around it by not informing SCEV of the flags:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %adds = add nsw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %addu = add nuw i32 %x, %y
  -->  (%x + %y) U: full-set S: full-set
  %cond = select i1 %b, i32 %adds, i32 %addu
  -->  %cond U: full-set S: full-set
Determining loop execution counts for: @func

Would there be problems if we properly considered nuw/nsw flags when unifying SCEVs?

There would be other consequences.  For example, `(%x + %y)<nsw>` and `(%x + %y)<nuw>` wouldn't compare equal for other simplifications, and all the places that call setNoWrapFlags would have to be rewritten.  It's probably possible to come up with some workable design, but nobody has actually tried it, so it's not clear how much work it would be to implement, or whether it would improve the generated code overall.

-Eli


On Wed, Aug 15, 2018 at 1:59 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 1:31 PM, Alexandre Isoard wrote:
Is that why we do not deduce +<nsw> from "add nsw" either?

Essentially, yes.

Is that an intrinsic limitation of creating a context-invariant expressions from a Value* or is that a limitation of our implementation (our unification not considering the nsw flags)?

It's a consequence of unification not considering nsw.  (nsw on an instruction is naturally invariant because violating nsw produces poison, not UB.)

-Eli


On Wed, Aug 15, 2018 at 12:39 PM Friedman, Eli <[hidden email]> wrote:
On 8/15/2018 12:21 PM, Alexandre Isoard via llvm-dev wrote:
Hello,

If I run clang on the following code:

void func(unsigned n) {
  for (unsigned long x = 1; x < n; ++x)
    dummy(x);
}

I get the following llvm ir:

define void @func(i32 %n) {
entry:
  %conv = zext i32 %n to i64
  %cmp5 = icmp ugt i32 %n, 1
  br i1 %cmp5, label %for.body, label %for.cond.cleanup
for.cond.cleanup:                                 ; preds = %for.body, %entry
  ret void
for.body:                                         ; preds = %entry, %for.body
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  tail call void @dummy(i64 %x.06) #2
  %inc = add nuw nsw i64 %x.06, 1
  %exitcond = icmp eq i64 %inc, %conv
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}

Over which, SCEV will provide the following analysis:

Printing analysis 'Scalar Evolution Analysis' for function 'func':
Classifying expressions for: @func
  %conv = zext i32 %n to i64
  -->  (zext i32 %n to i64) U: [0,4294967296) S: [0,4294967296)
  %x.06 = phi i64 [ %inc, %for.body ], [ 1, %entry ]
  -->  {1,+,1}<nuw><nsw><%for.body> U: [1,-9223372036854775808) S: [1,-9223372036854775808) Exits: (-1 + (zext i32 %n to i64)) LoopDispositions: { %for.body: Computable }
  %inc = add nuw nsw i64 %x.06, 1
  -->  {2,+,1}<nuw><%for.body> U: [2,0) S: [2,0) Exits: (zext i32 %n to i64) LoopDispositions: { %for.body: Computable }
Determining loop execution counts for: @func
Loop %for.body: backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
Loop %for.body: max backedge-taken count is -2
Loop %for.body: Predicated backedge-taken count is (-2 + (zext i32 %n to i64))<nsw>
 Predicates:
Loop %for.body: Trip multiple is 1

Now, I was surprised by the max backedge-taken count being -2, and I suspect it is due to the backedge-taken count being marked as <nsw> instead of <nuw>.

Is that on purpose, is that a bug, or is my analysis incorrect? I am not sure where to fix that issue.

The backedge-taken count isn't nuw because nsw/nuw markings aren't flow-sensitive: there isn't any way to mark the trip count as nuw without marking every computation of `(long)n-2` as nuw.

There's some code in ScalarEvolution::howFarToZero to try to refine the max backedge-taken count in some cases, but it isn't very general.  See https://reviews.llvm.org/D28536 .

-Eli
-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project


--
Alexandre Isoard


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev