[llvm-dev] more reassociation in IR


[llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
There are at least 3 active proposals to add reassociative optimizations in IR:
[1] D41574 - a new pass for reassociation/factoring
[2] D46336 - enhance -instcombine to do more reassociation/factoring
[3] D45842 - add to the existing -reassociate pass to enable factoring

Here's a basic motivating example that we fail to optimize:
https://godbolt.org/g/64NmJM
https://rise4fun.com/Alive/wec
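(The links aren't reproduced here, so as a rough, hand-written stand-in for the flavor of fold under discussion, not the exact IR behind them: factoring a common operand so that (x & m) | (y & m) becomes (x | y) & m. The function name is made up.)

  define i32 @factor_or_of_ands(i32 %x, i32 %y, i32 %m) {
    %a = and i32 %x, %m
    %b = and i32 %y, %m
    %r = or i32 %a, %b        ; ideally rewritten as (x | y) & m, saving an instruction
    ret i32 %r
  }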

Currently, responsibility for reassociative transforms is split between -reassociate and -instcombine. I don't know if that split is intentional or just how the code evolved. Debug stats for test-suite compiled with -O3 show:
78K "reassociations" by -instcombine.
58K "reassociated" by -reassociate

A debug stat with D45842 showed that its transform fired 1.3K times.

Keep in mind that instcombine runs 1st, runs to fixed-point, and runs 8 times in the -O2 pipeline. Reassociate runs 1 time and doesn't run to fixed-point. Some transforms are unique to 1 pass or the other, but there is some overlapping functionality already there.

So the questions are:
1. How do we implement these remaining reassociation optimizations?
2. Do we want to consolidate existing reassociation transforms in 1 pass or is there value in splitting the folds across multiple passes?




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
1.  The reassociate pass that exists right now was *originally* (AFAIK) written to enable CSE/GVN to do better. That particular issue is solvable in other ways, because there are good ways to integrate reassociation into CSE/GVN (and at this point, it would probably be cheaper than -reassociate, since it would modify code less, only changing it when there are actual redundancies).
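(A minimal, made-up sketch of that original motivation: once both chains are rewritten into one canonical association, %s1 and %s2 become textually identical and CSE/GVN can delete one of them.)

  define i32 @redundant_sums(i32 %a, i32 %b, i32 %c) {
    %t1 = add i32 %a, %b
    %s1 = add i32 %t1, %c     ; (a + b) + c
    %t2 = add i32 %c, %a
    %s2 = add i32 %t2, %b     ; (c + a) + b -- same value, different association
    %r  = mul i32 %s1, %s2
    ret i32 %r
  }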

I don't know what other things people are trying to target with reassociate these days; it would be good to understand what others see as the use cases.

My only other fear with removing it is that we have a tendency to try to make every optimization do everything, so I fear removing reassociate might just have people try to improve every pass to do reassociation instead of declaring what is good enough.

(For example, GVN already does some reassociation, as do some analyses. Other analyses rely more on a canonical form existing and don't, for example, try to perform commutativity on their own. IMHO, we should just declare one or the other to be the case, and hold that line, rather than generate a mishmash.)


2. "Keep in mind that instcombine runs 1st, runs to fixed-point, and runs 8 times in the -O2 pipeline."

Y'all know how I feel about this one, so I'll leave it alone :)



--Dan


Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev
(I came across this issue in the context of D46336. Thanks, Sanjay, for starting this discussion.)

If we move reassociation out of instcombine, or keep additional reassociation folds out of it, the open questions for me would be:


1. Since -reassociate isn't a fixed-point pass, we might need to repeat "-instcombine -reassociate" multiple times to fold down to what we want (relating to my comment here). I assumed this isn't what we want to do? My impression is that we don't run passes to a fixed point?


2. Since -reassociate needs to come up with one operand order (at least currently, as the only reassociate pass), would there exist a single, unique operand order that would enable all the reassociative/commutative foldings that we want? If not, could there be conflicts among different reassociation orders depending on which folding we want?
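(A made-up illustration of the ordering question: the sum a + b + c can be grouped as (a + b) + c to reuse %ab, or as (a + c) + b to reuse %ac, but a single fixed operand order can't serve both reuses at once.)

  define i32 @one_order_is_not_enough(i32 %a, i32 %b, i32 %c) {
    %ab = add i32 %a, %b
    %ac = add i32 %a, %c
    %t  = add i32 %ab, %c     ; grouped to reuse %ab; regrouping would reuse %ac instead
    %u  = mul i32 %t, %ab
    %r  = mul i32 %u, %ac
    ret i32 %r
  }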




Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev


On Tue, May 8, 2018 at 10:38 AM, Hiroshi Yamauchi via llvm-dev <[hidden email]> wrote:
(I came across this issue in the context of D46336. Thanks, Sanjay, for starting this discussion.)

If we move reassociation out of instcombine, or keep additional reassociation folds out of it, the open questions for me would be:


1. Since -reassociate isn't a fixed point pass,

This is fixable, fwiw, without fixpointing it.
 
we might need to repeat "-instcombine -reassociate" multiple times to fold down to what we want (relating to my comment here). I assumed this isn't what we want to do? My impression is that we don't run passes to a fixed point?

Well, I mean there is no practical difference between passes that we fixpoint externally and fixpoint internally.
 
 
2. Since -reassociate needs to come up with one operand order (at least currently, as the only reassociate pass), would there exist a single, unique operand order that would enable all the reassociative/commutative foldings that we want?

In what way?
Are you asking whether there is a single reassociation order that makes all foldings occur in the same operation or something?
I don't feel like i understand what you are asking.
 
If not, could there be conflicts among different reassociation orders depending on what folding we want?

We actually do try to have a single canonical form at various points in the compiler. 

We may want to canonicalize differently in different parts of the compiler, but it would be nice to at least be self consistent. 



Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev


On May 8, 2018, at 9:50 AM, Daniel Berlin via llvm-dev <[hidden email]> wrote:

1.  The reassociate pass that exists right now was *originally* (AFAIK) written to enable CSE/GVN to do better.

Agreed.  The original mindset included a (naive) belief that going with a canonical form was better than teaching redundancy elimination to handle abstractions (as a matter of separation of concerns).  I think that that was a failed experiment at this point :-).  We should still canonicalize x*x*x into powi, but the more aggressive canonicalization (like distribution) shouldn’t be used IMO.
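(For concreteness, a made-up sketch of that canonical form; whether and when a pass actually forms the intrinsic depends on fast-math flags and the LLVM version, and the intrinsic's type mangling has changed over time.)

  define float @cube(float %x) {
    %sq = fmul fast float %x, %x
    %r  = fmul fast float %sq, %x
    ret float %r
    ; canonical form under discussion: roughly
    ;   %r = call fast float @llvm.powi.f32(float %x, i32 3)
  }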

-Chris




Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
My 5 cents, since I've recently been working on instcombine a bit.

First of all, I do acknowledge and recognize the problem at hand -
the IR is not getting fully optimized. I think that should be solved.

As is evident, LLVM is quite modular; there are separate passes.
This is good: it allows testing things separately. Unfortunately, that
is also bad, because there are always these things that fall through
the cracks between passes...

instcombine is rather huge as is. Almost "one size fits all".
It tries to do everything. Sometimes it does wonders, sometimes not.
It is not modular. The order of folds is completely hardcoded,
not documented, with no obvious logic to it. And only the full and
final IR produced by the whole pass is testable.

Personally, I think this is bad. I'd prefer something else, like having
*separate* folds within instcombine, each tested on its own
(much like clang-tidy checks vs. monolithic clang diagnostics),
and then somehow combined, not unlike the usual optimization pipelines.

The second problem is the size/complexity of the instcombine pass.
I do acknowledge that D46336 / D46595 (both in instcombine) is an
improvement, which results in folds that didn't happen before.
But it makes the already-large instcombine even more complex,
and I'm not sure how it affects performance.

Then there is D45842.
That code is pretty simple, and takes about one page. It does not make
any change unless that opens the road for further simplification.
(I hope that is what D46336 / D46595 do, but I don't know.)

As stated in https://reviews.llvm.org/D46336#1090588, D45842
does allow instcombine to do //most// of the folds that D46336 would do.

But there is an open problem: several consecutive runs of the
instcombine and reassociate passes are needed. So basically we'd need
to have an umbrella pass that would run -instcombine and -reassociate
until they stop making changes.

TL;DR: monolithic code is bad; instcombine is monolithic and bloated.
The problem at hand can be solved by a separate pass, which would
reduce(*) the size of instcombine, provided there is a way to have
such an umbrella pass to combine them.
(*) I do think that we should not only not do that in instcombine, but
  also check what else is handled by the separate reassociate pass (not
  talking about commutative matchers here) and consider removing that
  from instcombine.

Roman.


Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev
When you say that distribution shouldn't be used, do you mean within instcombine rather than some other pass? Or not at all as an IR optimization?

A dedicated optimization pass that looks for and makes factoring/distribution folds to eliminate instructions seems like it would solve the problems that I'm seeing.
I.e., I'm leaning towards the proposal here: https://reviews.llvm.org/D41574
But I'd make it less "weak". :)

Ideally, that allows removing duplicate functionality from instcombine, but I'm guessing that it will take a lot of work to cut that cord without causing regressions. Note that instcombine even has a specialized reassociation pass-within-a-pass for fadd/fsub alone:
https://reviews.llvm.org/rL170471
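(One made-up example of the flavor of fadd reassociation meant here, assuming fast-math; it is not taken from rL170471 itself, and the exact folds that code performs may differ.)

  define float @fold_fp_constants(float %x) {
    %a = fadd fast float %x, 1.0
    %b = fadd fast float %a, 2.0   ; (x + 1.0) + 2.0 -> x + 3.0 under fast-math reassociation
    ret float %b
  }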


Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev


On Tue, May 8, 2018 at 11:15 AM Daniel Berlin <[hidden email]> wrote:


On Tue, May 8, 2018 at 10:38 AM, Hiroshi Yamauchi via llvm-dev <[hidden email]> wrote:
(I came across this issue in the context of D46336. Thanks, Sanjay, for starting this discussion.)

If we move reassociation out of instcombine, or keep additional reassociation folds out of it, the open questions for me would be:


1. Since -reassociate isn't a fixed point pass,

This is fixable, fwiw, without fixpointing it.

How?

 
we might need to repeat "-instcombine -reassociate" multiple times to fold down to what we want (relating to my comment here). I assumed this isn't what we want to do? My impression is that we don't run passes to a fixed point?

Well, I mean there is no practical difference between passes that we fixpoint externally and fixpoint internally.

I had the following in mind: Does the pass manager support fixpointing externally? Is there any performance difference? Are people okay with that in general?

But if there is no practical difference, I don't see any problem with that :)

 
 
2. Since -reassociate needs to come up with one operand order (at least currently, as the only reassociate pass), would there exist a single, unique operand order that would enable all the reassociative/commutative foldings that we want?

In what way?
Are you asking whether there is a single reassociation order that makes all foldings occur in the same operation or something?
I don't feel like i understand what you are asking.

Does this rephrase help: with the motivating examples (like and-of-shifts or bit check patterns) from the above differentials in mind, can we come up with a single reassociation order that solves all those and all the others that may come up in the future? Would we need different reassociation orders to fold different patterns?
 
 
If not, could there be conflicts among different reassociation orders depending on what folding we want?

We actually do try to have a single canonical form at various points in the compiler. 

We may want to canonicalize differently in different parts of the compiler, but it would be nice to at least be self consistent. 



Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev

As stated in https://reviews.llvm.org/D46336#1090588, D45842
does allow instcombine to do //most// of the folds that D46336 would do.

The (large) 4th test output differs this way: https://reviews.llvm.org/D46336#1093161

It may be due to the same underlying issue with the first test.



Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev


On Wed, May 9, 2018 at 10:39 AM, Hiroshi Yamauchi <[hidden email]> wrote:


On Tue, May 8, 2018 at 11:15 AM Daniel Berlin <[hidden email]> wrote:


On Tue, May 8, 2018 at 10:38 AM, Hiroshi Yamauchi via llvm-dev <[hidden email]> wrote:
(I came across this issue in the context of D46336. Thanks, Sanjay, for starting this discussion.)

If we move reassociation out of instcombine, or keep additional reassociation folds out of it, the open questions for me would be:


1. Since -reassociate isn't a fixed point pass,

This is fixable, fwiw, without fixpointing it.

How?

Depends on specifically which part you would like to know about ;)
 

 
we might need to repeat "-instcombine -reassociate" multiple times to fold down to what we want (relating to my comment here). I assumed this isn't what we want to do? My impression is that we don't run passes to a fixed point?

Well, I mean there is no practical difference between passes that we fixpoint externally and fixpoint internally.

I had the following in mind: Does the pass manager support fixpointing externally? Is there any performance difference? Are people okay with that in general?

But if there is no practical difference, I don't see any problem with that :)

 
 
2. Since -reassociate needs to come up with one operand order (at least currently, as the only reassociate pass), would there exist a single, unique operand order that would enable all the reassociative/commutative foldings that we want?

In what way?
Are you asking whether there is a single reassociation order that makes all foldings occur in the same operation or something?
I don't feel like i understand what you are asking.

Does this rephrase help: with the motivating examples (like and-of-shifts or bit check patterns) from the above differentials in mind, can we come up with a single reassociation order that solves all those and all the others that may come up in the future? Would we need different reassociation orders to fold different patterns?

It doesn't quite help.
When stated that generally, there can be no such ordering at all; that's easy to prove. It is a statically undecidable problem.

There is, however, a different question and answer to a few related problems that maybe you are really asking:
1. Is there a way to determine and apply a maximal or nearly-maximal set of folds/graph transforms that could be applied to a given piece of code in a sane and principled way -> yes


2. Is there a way to determine all expressions in the program as it exists that are equivalent, or equivalent under constant-time constant folding/reassociation, in a reasonable time bound -> yes

(not a single easy link, happy to talk about it)

Your original question is basically equivalent to:
Is there a way to determine all expressions in the program as it exists that are equivalent or could be made equivalent through any type of folding that one can think up?
The answer to that is "no"; it's provable that this is not statically decidable, so the time bound doesn't matter :)

You have to limit the possible folding/evaluation you apply in various ways to make this decidable, and then further limit it to make the time bound reasonable.

This all quickly devolves into Herbrand equivalence and its variations.




Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev

Here are my thoughts on the issue at hand:

As far as I understand, the current reassociate pass is a rather niche pass, not intended for extensive use. It also has quite a few weaknesses; the most prominent one is its aggressive rearrangement of the topology of the associative operator tree into a "linear" tree, e.g. for addition you'll get a_0 + (a_1 + (a_2 + ... + (a_{n-1} + a_n)...)) regardless of the topology you started with.

Due to that, and because reassociations of all kinds are sometimes a key component of a transformation (one that is either missing from an InstCombine transformation or unnaturally added to it), these reassociations, as of today, don't have a "home". For example, when I suggested https://reviews.llvm.org/D41574, the "Weak reassociation pass", it was to complement a transformation I'm adding to InstCombine: my transformation involved a reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3), which can't be expressed in the existing reassociate pass, so I had to write a new pass. That is, among other reasons, why it's called "weak": it doesn't change the topology of the associative operator tree.
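(In made-up IR terms, the shape of that reassociation: only the leaves are exchanged, and the balanced depth-2 topology is preserved rather than linearized.)

  define i32 @swap_leaves(i32 %a0, i32 %a1, i32 %a2, i32 %a3) {
    %l = add i32 %a0, %a1     ; (a0 + a1)
    %r = add i32 %a2, %a3     ; (a2 + a3)
    %s = add i32 %l, %r       ; (a0 + a1) + (a2 + a3)
    ret i32 %s
    ; desired: (a0 + a2) + (a1 + a3) -- same tree shape, different leaves,
    ; whereas the existing -reassociate would instead flatten this into
    ; a0 + (a1 + (a2 + a3)).
  }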

I believe that a new reassociation pass that will complement InstCombine and will run with it under a “parent” pass is the way to go. However, there is clearly much work involved in such a process.

 

Omer

 


Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev


On Wed, May 9, 2018 at 8:24 PM Daniel Berlin <[hidden email]> wrote:


1. Since -reassociate isn't a fixed point pass,

This is fixable, fwiw, without fixpointing it.

How?

Depends on specifically which part you would like to know about ;)

​Maybe I misunderstood what you meant by "This is fixable". Did you mean that we won't somehow need to fixpoint between instcombine and reassociate, or that the specific motivating examples from the above differentials are foldable without fixpointing?

If the latter, that may be the case. The concern was that we may encounter examples that may need many more iterations, if not fixpointing. As long as it's feasible to fixpoint between instcombine and reassociate, it seems to work, but I guess that would probably need some pass management change.

 

 
Let me try one more time :) May we need multiple reassociate passes to fold different reassociative patterns?

A longer version: If Sanjay wants a particular reassociative pattern to be folded (D45842), Omer wants another particular reassociative pattern to be folded (D41574), and I want yet another particular reassociative pattern to be folded (D46336), would we potentially need three different reassociate passes with each combined with instcombine, rather than just one that may be able to somehow handle those cases in one shot, (assuming we don't want to put those in instcombine)?

And it sounds like the answer is yes?



 


Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev


On Thu, May 10, 2018 at 12:05 PM, Hiroshi Yamauchi <[hidden email]> wrote:


​Maybe I misunderstood what you meant by "This is fixable". Did you mean that we won't somehow need to fixpoint between instcombine and reassociate, or that the specific motivating examples from the above differentials are foldable without fixpointing?

If by fixpointing you mean "fixpointing reassociate and instcombine", then yes, that is fixable without fixpointing reassociate and instcombine, but would require rewriting instcombine :)
 

Let me try one more time :) May we need multiple reassociate passes to fold different reassociative patterns?

A longer version: If Sanjay wants a particular reassociative pattern to be folded (D45842), Omer wants another particular reassociative pattern to be folded (D41574), and I want yet another particular reassociative pattern to be folded (D46336), would we potentially need three different reassociate passes with each combined with instcombine, rather than just one that may be able to somehow handle those cases in one shot, (assuming we don't want to put those in instcombine)?

And it sounds like the answer is yes?

If you take the current instcombine as a base, then yes, that is correct.
If you are willing to rearchitect instcombine, the answer is no; it's possible to do this all in a single pass in a relatively sane way.



Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev


On Thu, May 10, 2018 at 12:49 PM Daniel Berlin <[hidden email]> wrote:


If you take the current instcombine as a base, then yes, that is correct.
If you are willing to rearchitect instcombine, the answer is no; it's possible to do this all in a single pass in a relatively sane way.

I assume by rearchitect you mean a major rewrite, as per this comment: "Is there a way to determine all expressions in the program as it exists that are equivalent or equivalent under constant-time constant folding/reassociation, in a reasonable time bound -> yes". Any pointers, or time to chat?
I think that an approach like D46336 / D46595 has merit: it would add a bit of complexity, but would not require:

1. a major rewrite of instcombine,
2. writing multiple (potentially many) reassociate passes and figuring out how to fixpoint them with instcombine, or
3. writing a self-contained folding pass for a specific pattern

If you look at the diffs in the existing .ll files in D46336, it helps fold some previously-unfolded reassociation patterns beyond the bit check patterns that it originally targeted.









Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev


On Fri, May 11, 2018 at 2:37 PM, Hiroshi Yamauchi <[hidden email]> wrote:


I assume by rearchitect you mean a major rewrite, as per this comment: "Is there a way to determine all expressions in the program as it exists that are equivalent or equivalent under constant-time constant folding/reassociation, in a reasonable time bound -> yes". Any pointers, or time to chat?

I'm happy to do both.
 
I think that an approach like D46336 / D46595 has merit: it would add a bit of complexity, but would not require:

1. a major rewrite of instcombine,
2. writing multiple (potentially many) reassociate passes and figuring out how to fixpoint them with instcombine, or
3. writing a self-contained folding pass for a specific pattern

If you look at the diffs in the existing .ll files in D46336, it helps fold some previously-unfolded reassociation patterns beyond the bit check patterns that it originally targeted.

Sure, and it does so by  adding another O(N) cost to evaluation in each case. Instcombine doesn't even do lazy reevaluation through tracking dependencies, so it'll do so a lot of times as well.

To me, that's not a good tradeoff, especially given how slow instcombine is *already*.  The code it produces is "good enough" to stop for a while and do something else and not suffer horribly in performance.[1]

Let me ask a different question:

At what point would anyone here be willing to stop adding things to instcombine and start doing something else, instead of waiting for someone else to do it?
As far as I can tell, the answer is "never", which makes most of these discussions just pointless rehashes as we slowly repeat the same disaster that became GCC's instruction combiner :)

If the answer is "something", great, I'll set a mail filter and ignore these threads until that something happens :)

Personally, in my experience people will never do more here unless pushed somewhat, or the thing becomes such a complete disaster no one wants to touch it.

[1] Last year I computed the "improvement in performance on applications" due to instcombine for a bunch of Google apps and open-source apps that had easy-to-use benchmarks (i.e., I isolated about two years of instcombine changes and applied them to a current compiler piece by piece while measuring performance).
I also computed the compile-time increase in single instcombine passes over the same time period.

This was on x86, and the numbers basically said we were gaining nearly nothing for high cost. I.e., our drive for better-looking output does not appear to translate into any real gains that I can find. Either improvements to other opts hid them, or they simply didn't matter on the processors I tested on.

Certainly, apps/workloads/architectures may vary here, and my goal is not to claim it's all worthless.
My actual goal in all of this was to get a sense of whether my perspective on instcombine was still "reasonable", not to do a true scientific exploration :)
I didn't have time/energy/etc to run it elsewhere, and again, my goal was not to give certainty/try to give exact percentages.

--Dan






_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev

I've said this before, but I think a major impediment to forward progress here is coming up with an agreement on what the "something else" should be. Some of us have talked for years about a TableGen-driven replacement, or maybe we want something with a syntax more like that used by the Alive tool. Regardless, in order to gain efficiency I suspect we need a model that is more restrictive than more-or-less arbitrary C++ code, so we should pick a model and figure out how things might work.


This also matches my experience, but I draw a somewhat different lesson. I often tell application developers that *this* is why they must file compiler bug reports. Waiting and assuming that someone else will hit the same problem, and file the report, is a bad strategy. I think that this is due to two things:

 1. As far as things go, the tail of the distribution is often really long, and the probability that the particular thing hampering one piece of hot code is the same thing hampering another piece of hot code is often small.

 2. We tend to add special cases instead of more-general algorithms. The more-general work is often hard because figuring out the cost modeling is highly non-trivial. Also, when it's finally done, the chances that the old special cases are removed are small (so we'll still accumulate cruft without specific effort).

 -Hal



-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev



I don't have better or larger study results or data, but for fixing some smaller yet important enough performance degradations (often in microbenchmarks, but occasionally in larger settings), smaller compiler improvements (not necessarily in instcombine, but at around that level) do seem to matter, at least in individual contexts. It's not clear how much they actually mattered in the grander scheme of things, though.

It sounds like we don't want to add to instcombine because of the added cost/complexity, rearchitecting it would be hard, and it's not clear that the incremental changes to instcombine did much relative to their cost, which makes it even harder to justify rearchitecting it...

What do people think of approaches like D41574 and D45842?

 
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] more reassociation in IR

Muhui Jiang via llvm-dev
I mentioned this earlier in the thread: I would like to see something like D41574 in the optimizer. It optimizes code that no other pass currently handles, and I don't see any other near-term proposal that gets us those optimizations.
 
Omer, can you rebase that to trunk? I think a header has moved, so it doesn't build as-is. I'd like to know if it can catch the cases in D45842. If not, why not? If it can handle those easily, I'll abandon D45842.

Also, I don't know whether it's better to include that functionality as another iteration of the existing -reassociate or to split it off as its own pass. But I think it should do the distributive simplifications that are currently in -instcombine (InstCombiner::SimplifyUsingDistributiveLaws). Using that instsimplify logic for analysis lets us decide whether the reassociation is worthwhile in the first place; it removes the risk that some other pass would somehow mess up the pattern before instcombine could zap it; and it reduces the burden on instcombine to be the entire optimizer. :)
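
As a generic illustration of the kind of distributive simplification meant here, x*y + x*z can be factored to x*(y+z); the sketch below is not code from any of the patches. The point of the instsimplify-style analysis is to first check whether the factored form (for example, the inner add) actually simplifies before committing to the rewrite.

  ; two multiplies feeding an add
  define i32 @unfactored(i32 %x, i32 %y, i32 %z) {
    %xy = mul i32 %x, %y
    %xz = mul i32 %x, %z
    %r = add i32 %xy, %xz
    ret i32 %r
  }

  ; equivalent factored form with a single multiply
  define i32 @factored(i32 %x, i32 %y, i32 %z) {
    %yz = add i32 %y, %z
    %r = mul i32 %x, %yz
    ret i32 %r
  }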
 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev