[llvm-dev] Inlining + CSE + restrict pointers == funtimes

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev
So I've been narrowing down a very fun issue in our Burst compiler stack with respect to noalias support, and I've managed to basically boil this down to the following failure (see https://godbolt.org/z/-mdjPV):

int called(int* __restrict__ a, int* b, int* c) {
return *a + *b + *c;
}

int foo(int * x, int * y) {
return *x + *y + called(x, x, y);
}

int bar(int * x, int * y) {
return called(x, x, y) + *x + *y;
}

Which becomes:

define dso_local i32 @called(i32* noalias nocapture readonly %0, i32* nocapture readonly %1, i32* nocapture readonly %2) local_unnamed_addr #0 !dbg !7 {
%4 = load i32, i32* %0, align 4, !dbg !19, !tbaa !20
%5 = load i32, i32* %1, align 4, !dbg !24, !tbaa !20
%6 = add nsw i32 %5, %4, !dbg !25
%7 = load i32, i32* %2, align 4, !dbg !26, !tbaa !20
%8 = add nsw i32 %6, %7, !dbg !27
ret i32 %8, !dbg !28
}

define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !29 {
%3 = load i32, i32* %0, align 4, !dbg !36, !tbaa !20
%4 = load i32, i32* %1, align 4, !dbg !37, !tbaa !20
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !38
ret i32 %7, !dbg !39
}

define dso_local i32 @bar(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !40 {
%3 = load i32, i32* %0, align 4, !dbg !47, !tbaa !20, !alias.scope !48
%4 = load i32, i32* %1, align 4, !dbg !51, !tbaa !20, !noalias !48
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !52
ret i32 %7, !dbg !53
}

The issue is that CSE just looks at two loads from the same location and goes 'hey I can combine them!' but it doesn't take into account whether either load has extra aliasing information or not. So in foo it has turned a noalias pointer into an aliasing one, but in bar it has turned an aliasing pointer into a non-aliasing one.

I'm not sure what the C spec says (if anything) about this, but for us we'd like the behaviour to be defined.

Does anyone have any opinions on solving this before I drop a patch?
Should we perhaps make the behaviour to choose (alias over non-alias, or vice versa) controllable via a hidden CSE option?
What should we do in the presence of two conflicting sets of noalias information?

Sidenote: I'm aware of the 'full restrict' patch that has been circulated, but irrespective of whether that lands or not we'd still like to have some defined behaviour for the above case.

Cheers,
-Neil.
--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev

Hi, Neil,

When you say CSE, do you mean EarlyCSE or GVN?

Is the metadata being combined using MDNode::intersect or AAMDNodes::intersect?

 -Hal

On 1/22/20 7:02 AM, Neil Henning via llvm-dev wrote:
So I've been narrowing down a very fun issue in our Burst compiler stack with respect to noalias support, and I've managed to basically boil this down to the following failure (see https://godbolt.org/z/-mdjPV):

int called(int* __restrict__ a, int* b, int* c) {
return *a + *b + *c;
}

int foo(int * x, int * y) {
return *x + *y + called(x, x, y);
}

int bar(int * x, int * y) {
return called(x, x, y) + *x + *y;
}

Which becomes:

define dso_local i32 @called(i32* noalias nocapture readonly %0, i32* nocapture readonly %1, i32* nocapture readonly %2) local_unnamed_addr #0 !dbg !7 {
%4 = load i32, i32* %0, align 4, !dbg !19, !tbaa !20
%5 = load i32, i32* %1, align 4, !dbg !24, !tbaa !20
%6 = add nsw i32 %5, %4, !dbg !25
%7 = load i32, i32* %2, align 4, !dbg !26, !tbaa !20
%8 = add nsw i32 %6, %7, !dbg !27
ret i32 %8, !dbg !28
}

define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !29 {
%3 = load i32, i32* %0, align 4, !dbg !36, !tbaa !20
%4 = load i32, i32* %1, align 4, !dbg !37, !tbaa !20
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !38
ret i32 %7, !dbg !39
}

define dso_local i32 @bar(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !40 {
%3 = load i32, i32* %0, align 4, !dbg !47, !tbaa !20, !alias.scope !48
%4 = load i32, i32* %1, align 4, !dbg !51, !tbaa !20, !noalias !48
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !52
ret i32 %7, !dbg !53
}

The issue is that CSE just looks at two loads from the same location and goes 'hey I can combine them!' but it doesn't take into account whether either load has extra aliasing information or not. So in foo it has turned a noalias pointer into an aliasing one, but in bar it has turned an aliasing pointer into a non-aliasing one.

I'm not sure what the C spec says (if anything) about this, but for us we'd like the behaviour to be defined.

Does anyone have any opinions on solving this before I drop a patch?
Should we perhaps make the behaviour to choose (alias over non-alias, or vice versa) controllable via a hidden CSE option?
What should we do in the presence of two conflicting sets of noalias information?

Sidenote: I'm aware of the 'full restrict' patch that has been circulated, but irrespective of whether that lands or not we'd still like to have some defined behaviour for the above case.

Cheers,
-Neil.
--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev
It's EarlyCSE that is doing the damage in this case:

; Function Attrs: norecurse nounwind readonly uwtable
define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 {
  %3 = load i32, i32* %0, align 4, !tbaa !3
  %4 = load i32, i32* %1, align 4, !tbaa !3
  %5 = add nsw i32 %4, %3
  %6 = load i32, i32* %0, align 4, !tbaa !3, !alias.scope !7
  %7 = load i32, i32* %0, align 4, !tbaa !3, !noalias !7
  %8 = add nsw i32 %7, %6
  %9 = load i32, i32* %1, align 4, !tbaa !3, !noalias !7
  %10 = add nsw i32 %8, %9
  %11 = add nsw i32 %5, %10
  ret i32 %11
}
*** IR Dump After Early CSE w/ MemorySSA ***
; Function Attrs: norecurse nounwind readonly uwtable
define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 {
  %3 = load i32, i32* %0, align 4, !tbaa !3
  %4 = load i32, i32* %1, align 4, !tbaa !3
  %5 = add nsw i32 %4, %3
  %6 = add nsw i32 %3, %3
  %7 = add nsw i32 %6, %4
  %8 = add nsw i32 %5, %7
  ret i32 %8
}

The problem is that EarlyCSE is not aware of the aliasing metadata at all - it just sees two loads that look the same and chooses one to die.

Cheers,
-Neil.

On Wed, Jan 22, 2020 at 2:14 PM Finkel, Hal J. <[hidden email]> wrote:

Hi, Neil,

When you say CSE, do you mean EarlyCSE or GVN?

Is the metadata being combined using MDNode::intersect or AAMDNodes::intersect?

 -Hal

On 1/22/20 7:02 AM, Neil Henning via llvm-dev wrote:
So I've been narrowing down a very fun issue in our Burst compiler stack with respect to noalias support, and I've managed to basically boil this down to the following failure (see https://godbolt.org/z/-mdjPV):

int called(int* __restrict__ a, int* b, int* c) {
return *a + *b + *c;
}

int foo(int * x, int * y) {
return *x + *y + called(x, x, y);
}

int bar(int * x, int * y) {
return called(x, x, y) + *x + *y;
}

Which becomes:

define dso_local i32 @called(i32* noalias nocapture readonly %0, i32* nocapture readonly %1, i32* nocapture readonly %2) local_unnamed_addr #0 !dbg !7 {
%4 = load i32, i32* %0, align 4, !dbg !19, !tbaa !20
%5 = load i32, i32* %1, align 4, !dbg !24, !tbaa !20
%6 = add nsw i32 %5, %4, !dbg !25
%7 = load i32, i32* %2, align 4, !dbg !26, !tbaa !20
%8 = add nsw i32 %6, %7, !dbg !27
ret i32 %8, !dbg !28
}

define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !29 {
%3 = load i32, i32* %0, align 4, !dbg !36, !tbaa !20
%4 = load i32, i32* %1, align 4, !dbg !37, !tbaa !20
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !38
ret i32 %7, !dbg !39
}

define dso_local i32 @bar(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !40 {
%3 = load i32, i32* %0, align 4, !dbg !47, !tbaa !20, !alias.scope !48
%4 = load i32, i32* %1, align 4, !dbg !51, !tbaa !20, !noalias !48
%5 = add i32 %4, %3
%6 = shl i32 %5, 1
%7 = add i32 %6, %3, !dbg !52
ret i32 %7, !dbg !53
}

The issue is that CSE just looks at two loads from the same location and goes 'hey I can combine them!' but it doesn't take into account whether either load has extra aliasing information or not. So in foo it has turned a noalias pointer into an aliasing one, but in bar it has turned an aliasing pointer into a non-aliasing one.

I'm not sure what the C spec says (if anything) about this, but for us we'd like the behaviour to be defined.

Does anyone have any opinions on solving this before I drop a patch?
Should we perhaps make the behaviour to choose (alias over non-alias, or vice versa) controllable via a hidden CSE option?
What should we do in the presence of two conflicting sets of noalias information?

Sidenote: I'm aware of the 'full restrict' patch that has been circulated, but irrespective of whether that lands or not we'd still like to have some defined behaviour for the above case.

Cheers,
-Neil.
--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev
In reply to this post by Jeremy Morse via llvm-dev

Hi Neil, Hall,

 

- as far as 'C' is concerned, this is input code is valid, as the pointers are not used to modify objects.

- as far as 'llvm LangRef' is concerned, this is invalid code:

   noalias

This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

 

- clang can proof that after inlining, *a and *b are identical, so it can omit one of the loads.

 

- I think that both versions are acceptable (for C):

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.

 

Notes:

- clang with 'full restrict' has similar behavior as the standard clang.

- the collapsing seems to be happening in 'EarlyCSE'.

 

Greetings,

 

Jeroen Dobbelaere

 

 

 

From: llvm-dev <[hidden email]> On Behalf Of Neil Henning via llvm-dev
Sent: Wednesday, January 22, 2020 14:02
To: llvm-dev <[hidden email]>
Subject: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

So I've been narrowing down a very fun issue in our Burst compiler stack with respect to noalias support, and I've managed to basically boil this down to the following failure (see https://godbolt.org/z/-mdjPV):

 

int called(int* __restrict__ a, int* b, int* c) {

return *a + *b + *c;

}

 

int foo(int * x, int * y) {

return *x + *y + called(x, x, y);

}

 

int bar(int * x, int * y) {

return called(x, x, y) + *x + *y;

}

 

Which becomes:

 

define dso_local i32 @called(i32* noalias nocapture readonly %0, i32* nocapture readonly %1, i32* nocapture readonly %2) local_unnamed_addr #0 !dbg !7 {

%4 = load i32, i32* %0, align 4, !dbg !19, !tbaa !20

%5 = load i32, i32* %1, align 4, !dbg !24, !tbaa !20

%6 = add nsw i32 %5, %4, !dbg !25

%7 = load i32, i32* %2, align 4, !dbg !26, !tbaa !20

%8 = add nsw i32 %6, %7, !dbg !27

ret i32 %8, !dbg !28

}

 

define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !29 {

%3 = load i32, i32* %0, align 4, !dbg !36, !tbaa !20

%4 = load i32, i32* %1, align 4, !dbg !37, !tbaa !20

%5 = add i32 %4, %3

%6 = shl i32 %5, 1

%7 = add i32 %6, %3, !dbg !38

ret i32 %7, !dbg !39

}

 

define dso_local i32 @bar(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !40 {

%3 = load i32, i32* %0, align 4, !dbg !47, !tbaa !20, !alias.scope !48

%4 = load i32, i32* %1, align 4, !dbg !51, !tbaa !20, !noalias !48

%5 = add i32 %4, %3

%6 = shl i32 %5, 1

%7 = add i32 %6, %3, !dbg !52

ret i32 %7, !dbg !53

}

 

The issue is that CSE just looks at two loads from the same location and goes 'hey I can combine them!' but it doesn't take into account whether either load has extra aliasing information or not. So in foo it has turned a noalias pointer into an aliasing one, but in bar it has turned an aliasing pointer into a non-aliasing one.

 

I'm not sure what the C spec says (if anything) about this, but for us we'd like the behaviour to be defined.

 

Does anyone have any opinions on solving this before I drop a patch?

Should we perhaps make the behaviour to choose (alias over non-alias, or vice versa) controllable via a hidden CSE option?

What should we do in the presence of two conflicting sets of noalias information?

 

Sidenote: I'm aware of the 'full restrict' patch that has been circulated, but irrespective of whether that lands or not we'd still like to have some defined behaviour for the above case.

 

Cheers,

-Neil.

--

Image removed by sender.

Neil Henning

Senior Software Engineer Compiler

 


_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev

At a high level, EarlyCSE should be intersecting the metadata of instructions that it combines. If it doesn't, and also doesn't drop the metadata, that seems like a bug, regardless of anything else.

On 1/22/20 9:30 AM, Jeroen Dobbelaere wrote:

Hi Neil, Hall,

 

- as far as 'C' is concerned, this is input code is valid, as the pointers are not used to modify objects.

- as far as 'llvm LangRef' is concerned, this is invalid code:

   noalias

This indicates that objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met. For further details, please see the discussion of the NoAlias response in alias analysis.

 

- clang can proof that after inlining, *a and *b are identical, so it can omit one of the loads.

 

- I think that both versions are acceptable (for C):

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.


I'm concerned that this isn't sound. So, imagine if I had something like this:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

then we need the aliasing load to retain its dependencies relative to other things in the block. If we combine them into one load, it needs to be the aliasing one.

Thanks again,

Hal

 

Notes:

- clang with 'full restrict' has similar behavior as the standard clang.

- the collapsing seems to be happening in 'EarlyCSE'.

 

Greetings,

 

Jeroen Dobbelaere

 

 

 

From: llvm-dev [hidden email] On Behalf Of Neil Henning via llvm-dev
Sent: Wednesday, January 22, 2020 14:02
To: llvm-dev [hidden email]
Subject: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

So I've been narrowing down a very fun issue in our Burst compiler stack with respect to noalias support, and I've managed to basically boil this down to the following failure (see https://godbolt.org/z/-mdjPV):

 

int called(int* __restrict__ a, int* b, int* c) {

return *a + *b + *c;

}

 

int foo(int * x, int * y) {

return *x + *y + called(x, x, y);

}

 

int bar(int * x, int * y) {

return called(x, x, y) + *x + *y;

}

 

Which becomes:

 

define dso_local i32 @called(i32* noalias nocapture readonly %0, i32* nocapture readonly %1, i32* nocapture readonly %2) local_unnamed_addr #0 !dbg !7 {

%4 = load i32, i32* %0, align 4, !dbg !19, !tbaa !20

%5 = load i32, i32* %1, align 4, !dbg !24, !tbaa !20

%6 = add nsw i32 %5, %4, !dbg !25

%7 = load i32, i32* %2, align 4, !dbg !26, !tbaa !20

%8 = add nsw i32 %6, %7, !dbg !27

ret i32 %8, !dbg !28

}

 

define dso_local i32 @foo(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !29 {

%3 = load i32, i32* %0, align 4, !dbg !36, !tbaa !20

%4 = load i32, i32* %1, align 4, !dbg !37, !tbaa !20

%5 = add i32 %4, %3

%6 = shl i32 %5, 1

%7 = add i32 %6, %3, !dbg !38

ret i32 %7, !dbg !39

}

 

define dso_local i32 @bar(i32* nocapture readonly %0, i32* nocapture readonly %1) local_unnamed_addr #0 !dbg !40 {

%3 = load i32, i32* %0, align 4, !dbg !47, !tbaa !20, !alias.scope !48

%4 = load i32, i32* %1, align 4, !dbg !51, !tbaa !20, !noalias !48

%5 = add i32 %4, %3

%6 = shl i32 %5, 1

%7 = add i32 %6, %3, !dbg !52

ret i32 %7, !dbg !53

}

 

The issue is that CSE just looks at two loads from the same location and goes 'hey I can combine them!' but it doesn't take into account whether either load has extra aliasing information or not. So in foo it has turned a noalias pointer into an aliasing one, but in bar it has turned an aliasing pointer into a non-aliasing one.

 

I'm not sure what the C spec says (if anything) about this, but for us we'd like the behaviour to be defined.

 

Does anyone have any opinions on solving this before I drop a patch?

Should we perhaps make the behaviour to choose (alias over non-alias, or vice versa) controllable via a hidden CSE option?

What should we do in the presence of two conflicting sets of noalias information?

 

Sidenote: I'm aware of the 'full restrict' patch that has been circulated, but irrespective of whether that lands or not we'd still like to have some defined behaviour for the above case.

 

Cheers,

-Neil.

--

Image removed by sender.

Neil Henning

Senior Software Engineer Compiler

 

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev

That's indeed a good example that shows that dropping the metadata (A) is the only correct solution.

 

Greetings,

 

Jeroen Dobbelaere

 

 

From: Finkel, Hal J. <[hidden email]>
Sent: Wednesday, January 22, 2020 17:33
To: Jeroen Dobbelaere <[hidden email]>; Neil Henning <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

At a high level, EarlyCSE should be intersecting the metadata of instructions that it combines. If it doesn't, and also doesn't drop the metadata, that seems like a bug, regardless of anything else.

On 1/22/20 9:30 AM, Jeroen Dobbelaere wrote:

[..]

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.

 

I'm concerned that this isn't sound. So, imagine if I had something like this:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

then we need the aliasing load to retain its dependencies relative to other things in the block. If we combine them into one load, it needs to be the aliasing one.

Thanks again,

Hal 

[...]

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev
Ok I think we have some common ground - CSE should choose the aliased pointer over the non-aliased one because we don't want the no-aliasing information to creep outwards from the inlined callsite.

I'll put together a patch in the coming days and add y'all as reviewers so you get visibility.

Cheers,
-Neil.

On Wed, Jan 22, 2020 at 4:47 PM Jeroen Dobbelaere <[hidden email]> wrote:

That's indeed a good example that shows that dropping the metadata (A) is the only correct solution.

 

Greetings,

 

Jeroen Dobbelaere

 

 

From: Finkel, Hal J. <[hidden email]>
Sent: Wednesday, January 22, 2020 17:33
To: Jeroen Dobbelaere <[hidden email]>; Neil Henning <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

At a high level, EarlyCSE should be intersecting the metadata of instructions that it combines. If it doesn't, and also doesn't drop the metadata, that seems like a bug, regardless of anything else.

On 1/22/20 9:30 AM, Jeroen Dobbelaere wrote:

[..]

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.

 

I'm concerned that this isn't sound. So, imagine if I had something like this:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

then we need the aliasing load to retain its dependencies relative to other things in the block. If we combine them into one load, it needs to be the aliasing one.

Thanks again,

Hal 

[...]


--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev

Thanks, Neil. To be clear, the patch should cause the intersect function to be called on the metadata.

 -Hal

On 1/22/20 11:09 AM, Neil Henning wrote:
Ok I think we have some common ground - CSE should choose the aliased pointer over the non-aliased one because we don't want the no-aliasing information to creep outwards from the inlined callsite.

I'll put together a patch in the coming days and add y'all as reviewers so you get visibility.

Cheers,
-Neil.

On Wed, Jan 22, 2020 at 4:47 PM Jeroen Dobbelaere <[hidden email]> wrote:

That's indeed a good example that shows that dropping the metadata (A) is the only correct solution.

 

Greetings,

 

Jeroen Dobbelaere

 

 

From: Finkel, Hal J. <[hidden email]>
Sent: Wednesday, January 22, 2020 17:33
To: Jeroen Dobbelaere <[hidden email]>; Neil Henning <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

At a high level, EarlyCSE should be intersecting the metadata of instructions that it combines. If it doesn't, and also doesn't drop the metadata, that seems like a bug, regardless of anything else.

On 1/22/20 9:30 AM, Jeroen Dobbelaere wrote:

[..]

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.

 

I'm concerned that this isn't sound. So, imagine if I had something like this:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

then we need the aliasing load to retain its dependencies relative to other things in the block. If we combine them into one load, it needs to be the aliasing one.

Thanks again,

Hal 

[...]


--
Neil Henning
Senior Software Engineer Compiler
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

Jeremy Morse via llvm-dev
I've put the change up here https://reviews.llvm.org/D73342 for those interested.

Cheers,
-Neil.

On Wed, Jan 22, 2020 at 5:13 PM Finkel, Hal J. <[hidden email]> wrote:

Thanks, Neil. To be clear, the patch should cause the intersect function to be called on the metadata.

 -Hal

On 1/22/20 11:09 AM, Neil Henning wrote:
Ok I think we have some common ground - CSE should choose the aliased pointer over the non-aliased one because we don't want the no-aliasing information to creep outwards from the inlined callsite.

I'll put together a patch in the coming days and add y'all as reviewers so you get visibility.

Cheers,
-Neil.

On Wed, Jan 22, 2020 at 4:47 PM Jeroen Dobbelaere <[hidden email]> wrote:

That's indeed a good example that shows that dropping the metadata (A) is the only correct solution.

 

Greetings,

 

Jeroen Dobbelaere

 

 

From: Finkel, Hal J. <[hidden email]>
Sent: Wednesday, January 22, 2020 17:33
To: Jeroen Dobbelaere <[hidden email]>; Neil Henning <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] Inlining + CSE + restrict pointers == funtimes

 

At a high level, EarlyCSE should be intersecting the metadata of instructions that it combines. If it doesn't, and also doesn't drop the metadata, that seems like a bug, regardless of anything else.

On 1/22/20 9:30 AM, Jeroen Dobbelaere wrote:

[..]

-- (A) dropping the noalias information will result in less optimization opportunities

-- (B) preferring the load with the noalias annotation can allow more reorderings and a better schedule.

  (especially when there would be a 'store' in the context where the restrict is valid; aka in 'called').

 

My preference would go to (B) as that opens up more optimization opportunities.

 

I'm concerned that this isn't sound. So, imagine if I had something like this:

int * restrict r = a;
...
int x = noaliasing ? *r : *a;

then we need the aliasing load to retain its dependencies relative to other things in the block. If we combine them into one load, it needs to be the aliasing one.

Thanks again,

Hal 

[...]


--
Neil Henning
Senior Software Engineer Compiler
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


--
Neil Henning
Senior Software Engineer Compiler

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev