[RFC] AArch64: Should we disable GlobalMerge?

[RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
Hi all,

I've started looking at the GlobalMerge pass, enabled by default on
ARM and AArch64.  I think we should reconsider that, at least for
AArch64.

As is, the pass just merges all globals together, in groups of up to
4KB on AArch64 (128B on ARM).
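
To make the "groups of 4KB" behaviour concrete, here is a rough sketch of size-capped grouping.  It is not the pass's actual code: the function name, the in-order packing and the example globals are assumptions for illustration, and only the two size caps come from the text above.  A real implementation would also have to consider alignment, sections and linkage.

```python
from typing import Dict, List

# Per-target caps, taken from the figures quoted above.
MAX_GROUP_BYTES = {"aarch64": 4096, "arm": 128}

def merge_globals(sizes: Dict[str, int], target: str) -> List[List[str]]:
    """Pack globals, in order, into groups no larger than the target cap.

    Hypothetical helper: a simplified model, not LLVM's implementation.
    """
    cap = MAX_GROUP_BYTES[target]
    groups: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in sizes.items():
        if size > cap:
            continue  # too large to merge; stays a standalone global
        if current and used + size > cap:
            groups.append(current)  # close the full group, start a new one
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups

# Hypothetical globals: name -> size in bytes.
example = {"g_a": 64, "g_b": 96, "g_c": 4000, "g_d": 8}
print(merge_globals(example, "arm"))
print(merge_globals(example, "aarch64"))
```

Note that nothing in this model looks at which functions use which globals; grouping is driven purely by size, which is the behaviour being questioned here.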

At the time it was enabled, the general thinking was "it's almost
free, it doesn't affect performance much, we might as well use it".
Now, it's preventing some link-time optimizations (as acknowledged in
one of the FIXMEs).


-- Performance impact
Overall, it isn't that profitable on the test-suite, and actually
degrades performance on a lot of other - "non-benchmark" - projects I
tried (where the main reason to use a global is file- or function-
static variables, only accessed through a single getter function).

Across several runs on the entire test-suite, when disabling the pass,
I measured:
without LTO, a -0.19% geomean improvement
with LTO, a +0.11% geomean regression.

As for just SPEC2006, there are two big regressions: 400.perlbench
(10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).

Numbers are attached.


-- A way forward
One obvious way to improve it is: look at uses of globals, and try to
form sets of globals commonly used together.  The tricky part is to
define heuristics for "commonly".  Also, the pass then becomes much
more expensive.  I'm currently looking into improving it, and will
report if I come up with a good solution.  But this shouldn't stop us
from disabling it, for now.

Also, the pass seems like a good candidate for
-O3/CodeGenOpt::Aggressive.  However, the latter is implied by LTO,
which IMO shouldn't include these not-always-profitable optimizations.
That's another problem though.



Right now, I think we should disable the pass by default, until it's
deemed profitable enough.

-Ahmed
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
With the numbers!
-Ahmed


Attachments:
disable_globalmerge_aarch64_LTO.txt (34K)
disable_globalmerge_aarch64.txt (37K)

Re: [RFC] AArch64: Should we disable GlobalMerge?

Kristof Beyls
Hi Ahmed,

Did you run these experiments on a platform with a linker that makes
use of the AArch64CollectLOH-pass-produced information?
I'm guessing that the AArch64CollectLOH-pass information and a linker
that makes use of that information could affect the profitability of
the GlobalMerge pass?

Thanks,

Kristof

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> On Behalf Of Ahmed Bougacha
> Sent: 26 February 2015 01:13
> To: LLVM Dev
> Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
>
> With the numbers!
> -Ahmed

Re: [RFC] AArch64: Should we disable GlobalMerge?

Renato Golin-2
In reply to this post by Ahmed Bougacha
On 26 February 2015 at 00:57, Ahmed Bougacha <[hidden email]> wrote:
> -- A way forward
> One obvious way to improve it is: look at uses of globals, and try to
> form sets of globals commonly used together.  The tricky part is to
> define heuristics for "commonly".  Also, the pass then becomes much
> more expensive.  I'm currently looking into improving it, and will
> report if I come up with a good solution.  But this shouldn't stop us
> from disabling it, for now.

Hi Ahmed,

Before "moving forward", it would be good to understand what in
GlobalMerge is impacting what in LTO.

With LTO becoming more important nowadays, I agree we have to balance
the compiler optimisations to work well with it, but by turning things
off we might be impacting unknown code in an unknown way.

We'll never know how unknown code behaves, but if we at least
understand which parts of GlobalMerge affect which parts of LTO, then
people using unknown code will have a more informed view of what to
disable, and when.

cheers,
--renato

Re: [RFC] AArch64: Should we disable GlobalMerge?

Jiangning Liu
Hi Ahmed,

Yes, I share Kristof's and Renato's concerns: the impact of, and dependence upon, link-time tools should be clarified before disabling this pass.

On the other hand, tests on our hardware show that disabling this pass, without LTO, gives big regressions on some SPEC benchmarks (positive is bad):

spec.cpu2000.ref.253_perlbmk 3.27%
spec.cpu2000.ref.254_gap 3.18%

although I do see some improvements, as below (negative is good):

spec.cpu2006.ref.400_perlbench -1.90%
spec.cpu2006.ref.471_omnetpp -1.64%
spec.cpu2006.ref.482_sphinx3 -1.03%

Thanks,
-Jiangning



Re: [RFC] AArch64: Should we disable GlobalMerge?

Jim Grosbach
In reply to this post by Kristof Beyls
Hi Kristof,

Our tests are on iOS, which definitely uses the LOH optimizations for ARM64.

-Jim


Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Kristof Beyls
On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <[hidden email]> wrote:
>
> Hi Ahmed,
>
> Did you run these experiments on a platform with a linker that makes
> use of the AArch64CollectLOH-pass-produced information?

As Jim says, I'm on iOS, so yes.  However, I'm mostly running tests
with the pass disabled.

>
> I'm guessing that the AArch64CollectLOH-pass information and a linker
> that makes use of that information could affect the profitability of
> the GlobalMerge pass?

It could, and does, from what I've seen (beware anecdata):
- reusing the adrp base prevents optimizing it (the various
Adrp*{ldr,str} LOHs).
- reusing the adrp+add MergedGlobal pointer, with indexed addressing,
doesn't prevent the AdrpAdd optimization.

All in all, whether GlobalMerge is profitable or not (by increasing
register pressure, or adding another indirection), whenever the LOH
optimizations fire, they reduce its usefulness.

AFAICT, the only case where LOHs help GlobalMerge is when the
MergedGlobal base is closer to the adrp sequence than the actual
global.  Given that we only merge 4KB of globals at a time, against a
1MB range, this doesn't happen very often.
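
For reference, the 1MB figure matches the reach of a single AArch64 adr instruction, which encodes a signed 21-bit byte offset from the PC.  The sketch below assumes that a linker acting on an AdrpAdd hint can only collapse the adrp+add pair into one adr when the target is within that reach; the helper name and the addresses are made up for illustration.

```python
ADR_RANGE = 1 << 20  # adr reaches [-1MB, +1MB) around the instruction

def within_adr_range(pc: int, target: int) -> bool:
    """True if `target` is reachable from `pc` with a single adr."""
    offset = target - pc
    return -ADR_RANGE <= offset < ADR_RANGE

# Hypothetical addresses: a code location and two data symbols.
pc = 0x0010_0000
print(within_adr_range(pc, pc + 0x000F_FFFF))  # just inside the window
print(within_adr_range(pc, pc + 0x0010_0000))  # one byte too far
```

Merging moves a global by at most the size of its group, so under this model it only changes the answer for globals sitting right at the edge of the window.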



Which brings us to my fallback proposal:  what about disabling the
pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
help mitigate the adrp problem, and global usage is usually frowned
upon in those circles (except for singletons, class-/function-statics
and whatnot, which I'm trying to address in an upcoming patch).

As for other targets, as a first step, making the pass run under -O3
rather than -O1 is hopefully agreeable to everyone?  After all, it is
"aggressive", and isn't always profitable.  That's pretty much the
description of -O3.
We can still run into problematic cases under LTO, though.

-Ahmed


Re: [RFC] AArch64: Should we disable GlobalMerge?

Eric Christopher


On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <[hidden email]> wrote:
> Which brings us to my fallback proposal:  what about disabling the
> pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
> help mitigate the adrp problem, and global usage is usually frowned
> upon in those circles (except for singletons, class-/function-statics
> and whatnot, which I'm trying to address in an upcoming patch).

Before disabling it on Darwin only, I'd like to see some analysis of the regressions/improvements. Has anyone looked at the code for those yet?

> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone?  After all, it is
> "aggressive", and isn't always profitable.  That's pretty much the
> description of -O3.
> We can still run into problematic cases under LTO, though.

Seems reasonable to me, but we probably want to see what happens with the above questions first.

-eric


Re: [RFC] AArch64: Should we disable GlobalMerge?

Renato Golin-2
In reply to this post by Ahmed Bougacha
On 27 February 2015 at 21:26, Ahmed Bougacha <[hidden email]> wrote:
> Which brings us to my fallback proposal:  what about disabling the
> pass on darwin only?

That's a decision for Jim/Evan. I'm ok if they are.


> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone?

Sounds reasonable.

Even though it conflicts with LTO, that's what O3 means, as you said,
instability. People at O3 might want to fiddle with the passes
(on/off) to get the best performance for their own code/workload.

cheers,
--renato

Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Renato Golin-2
On Thu, Feb 26, 2015 at 4:09 AM, Renato Golin <[hidden email]> wrote:

> On 26 February 2015 at 00:57, Ahmed Bougacha <[hidden email]> wrote:
>> -- A way forward
>> One obvious way to improve it is: look at uses of globals, and try to
>> form sets of globals commonly used together.  The tricky part is to
>> define heuristics for "commonly".  Also, the pass then becomes much
>> more expensive.  I'm currently looking into improving it, and will
>> report if I come up with a good solution.  But this shouldn't stop us
>> from disabling it, for now.
>
> Hi Ahmed,
>
> Before "moving forward", it would be good to understand what in
> GlobalMerge is impacting what in LTO.
>
> With LTO becoming more important nowadays, I agree we have to balance
> the compiler optimisations to work well with it, but by turning things
> off we might be impacting unknown code in an unknown way.
>
> We'll never know how unknown code behaves, but if at least we
> understand what of GM affects what of LTO, then people using unknown
> code will have a more informed view on what to disable, when.

Fair enough.  First, a couple things to note:
- GlobalMerge runs as a pre-ISel pass, so very late in the mid-level pipeline.
- GlobalMerge (by default) only looks at internal globals.

Internal globals arise from file- or function-static variables.  Under
LTO, all module-level globals are internalized, and so become eligible
for merging.

So, we can generally group global usage into a few categories:
- a function that uses a local static variable (say, llvm::outs())
- a function that uses several globals at once.  For instance,
400.perlbench's interpreter has a bunch of those, as does its
parser/lexer.
- a set of functions that share a few common globals (say, an inlined
reference to a function-local static variable), but otherwise each use
several other globals (again, perl's interpreter).


GlobalMerge is only ever a win if we are able to share base pointers.
This requires:
- several globals being referenced
- the references being close enough (otherwise we'll just
rematerialize the base, or worse, increase register pressure)

There is one obvious special case for the first requirement:  if a
global is only ever used alone, there's no point in merging it
anywhere. (this is improvement #1).
Once we can determine the set of used globals for each function, we
can try to merge those sets only. (#2)

We can try to better handle the second requirement, by having some
more precise metric for distance between uses.  One trivially
available such metric is grouping used sets by parent basic-block
rather than function (#3).
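
Improvements #1 and #2 can be sketched as a small analysis over a map from each function to the globals it references.  This is only an illustration of the idea, not a proposed patch: the helper names, the function names and the global names are all hypothetical.

```python
from typing import Dict, FrozenSet, Set

def used_sets(uses: Dict[str, Set[str]]) -> Set[FrozenSet[str]]:
    """Distinct sets of globals referenced together by one function (#2)."""
    return {frozenset(gs) for gs in uses.values() if gs}

def used_alone_only(uses: Dict[str, Set[str]]) -> Set[str]:
    """Globals that never appear alongside another global (#1)."""
    all_globals = set().union(*uses.values())
    shared: Set[str] = set()
    for gs in uses.values():
        if len(gs) > 1:
            shared |= gs
    return all_globals - shared

# Hypothetical function -> referenced-globals map.
uses = {
    "outs": {"outs_stream"},
    "interp": {"pl_stack", "pl_sp", "g_log"},
    "lexer": {"pl_bufptr", "pl_bufend", "g_log"},
}
print(sorted(used_alone_only(uses)))  # never worth merging
print(sorted(sorted(s) for s in used_sets(uses) if len(s) > 1))  # merge candidates
```

For #3, the same two helpers would apply unchanged if the map were keyed by basic block instead of by function.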



Experimentally, #1 catches a lot of the singleton-ish globals out
there, which is the majority in some of the more "modern" code I've
looked at.  It leaves the legitimate merging in perl alone.

#2 (and even more so #3) is actually too aggressive, and misses many,
if not most, of the profitable cases in perl.  Consider:
- a "g_log" global (or, say, LLVM's outs/dbgs/errs), used pretty much everywhere
- several sets of globals, used in different parts of the program
(perl's interpreter vs parser)

You'd pick one of the latter sets, and add the "g_log" global to it.
Now you made it more expensive everywhere you use "g_log", without the
benefit of base sharing in all the other functions.

So you need to be smart when picking the sets.  You can combine some
of them, using some cost metric.  (#4)  This is where it gets
complicated.
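
One way to see the "g_log" problem is to score a candidate group by counting the functions that can share a base (they touch two or more members) against the functions that only pay for the merge (they touch exactly one member).  This scoring is an assumption made up for illustration, not the cost metric of #4, and the function and global names are hypothetical.

```python
from typing import Dict, Set, Tuple

def score(group: Set[str], uses: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (sharing, penalised) function counts for a candidate group.

    sharing:   functions referencing >= 2 members (can reuse the base)
    penalised: functions referencing exactly 1 member (no sharing benefit)
    """
    sharing = sum(1 for gs in uses.values() if len(gs & group) >= 2)
    penalised = sum(1 for gs in uses.values() if len(gs & group) == 1)
    return sharing, penalised

# Hypothetical program: g_log is used almost everywhere.
uses = {
    "interp_a": {"pl_stack", "pl_sp", "g_log"},
    "interp_b": {"pl_stack", "pl_sp"},
    "lexer": {"pl_bufptr", "pl_bufend", "g_log"},
    "util_1": {"g_log"},
    "util_2": {"g_log"},
}
interp = {"pl_stack", "pl_sp"}
print(score(interp, uses))              # interpreter set on its own
print(score(interp | {"g_log"}, uses))  # same set with g_log added
```

In this toy input, adding g_log leaves the number of sharing functions unchanged while every other user of g_log becomes penalised, which is the situation described above.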


I'll try measuring some of those, see what happens on benchmarks.
Again, that shouldn't stop us from enabling GlobalMerge less often.
Hopefully it's clear that the pass isn't always a win, so -O3 should
be OK.  I'm less comfortable with disabling it on Darwin only, but
that seems like the obvious next step.

Thanks for the feedback!

-Ahmed

> cheers,
> --renato

Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Jiangning Liu
On Thu, Feb 26, 2015 at 1:13 PM, Jiangning Liu <[hidden email]> wrote:

> Hi Ahmed,
>
> Yes. I'd share with Kristof and Renato's concerns, and the impact/dependence
> upon link-time tool should be clarified before disabling this pass.
>
> On the other hand, actually the test on our hardware shows disabling this
> pass without LTO considered, some spec benchmarks would have big
> regressions, (positive is bad)
>
> spec.cpu2000.ref.253_perlbmk 3.27%
> spec.cpu2000.ref.254_gap 3.18%
>
> although I do see some improvements like below, (negative is good)
>
> spec.cpu2006.ref.400_perlbench -1.90%
> spec.cpu2006.ref.471_omnetpp -1.64%
> spec.cpu2006.ref.482_sphinx3 -1.03%

Interesting!  Can you share geomean SPEC2006/2000 numbers, perhaps?

-Ahmed


Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Eric Christopher
On Fri, Feb 27, 2015 at 1:42 PM, Eric Christopher <[hidden email]> wrote:

>
>
> On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <[hidden email]>
> wrote:
>>
>> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <[hidden email]>
>> wrote:
>> >
>> > Hi Ahmed,
>> >
>> > Did you run these experiments on a platform with a linker that makes
>> > use of the AArch64CollectLOH-pass-produced information?
>>
>> As Jim says, I'm on iOS, so yes.  However, I'm mostly running tests
>> with the pass disabled.
>>
>> >
>> > I'm guessing that the AArch64CollectLOH-pass information and a linker
>> > that makes use of that information could affect the profitability of
>> > the GlobalMerge pass?
>>
>> It could, and does, from what I've seen (beware anecdata):
>> - reusing the adrp base prevents optimizing it (the various
>> Adrp*{ldr,str} LOHs).
>> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
>> doesn't prevent the AdrpAdd optimization.
>>
>> All in all, whether GlobalMerge is profitable or not (by increasing
>> register pressure, or adding another indirection), whenever the LOH
>> optimizations fire, they reduce its usefulness.
>>
>> AFAICT, the only case where LOHs help GlobalMerge is when the
>> MergedGlobal base is closer to the adrp sequence than the actual
>> global.  Given that we only merge 4k of globals, on a 1MB range this
>> doesn't happen very often.
>>
>>
>>
>> Which brings us to my fallback proposal:  what about disabling the
>> pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
>> help mitigate the adrp problem, and global usage is usually frowned
>> upon in those circles (except for singletons, class-/function-statics
>> and whatnot, which I'm trying to address in an upcoming patch).
>>
>
> Before making the disabling darwin only I'd like to see some analysis of the
> regressions/improvements. Has anyone looked at the code for those yet?

Yep, I put a quick analysis in my other reply.

>
>>
>> As for other targets, as a first step, making the pass run under -O3
>> rather than -O1 is hopefully agreeable to everyone?  After all, it is
>> "aggressive", and isn't always profitable.  That's pretty much the
>> description of -O3.
>> We can still run into problematic cases under LTO, though.
>>
>
> Seems reasonable to me, but probably want to see what happens with the above
> questions first.

Fair enough.  Bottom line is:
- disabling it without LTO is a slight win on the test-suite, a solid
win everywhere else I've looked.
- disabling it with LTO regresses quite a few SPEC benchmarks, and is
overall a slight regression on the test-suite.

-Ahmed

> -eric
>

Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Renato Golin-2
On Fri, Feb 27, 2015 at 2:01 PM, Renato Golin <[hidden email]> wrote:
> On 27 February 2015 at 21:26, Ahmed Bougacha <[hidden email]> wrote:
>> Which brings us to my fallback proposal:  what about disabling the
>> pass on darwin only?
>
> That's a decision for Jim/Evan. I'm ok if they are.

Jim, thoughts?

>
>> As for other targets, as a first step, making the pass run under -O3
>> rather than -O1 is hopefully agreeable to everyone?
>
> Sounds reasonable.

Great!

> Even though it conflicts with LTO, that's what O3 means, as you said,
> instability. People at O3 might want to fiddle with the passes
> (on/off) to get the best performance for their own code/workload.

By the way, I'm not convinced LTO being either -O3 or -O0 is sensible.
But that's a discussion for another day =)

-Ahmed

> cheers,
> --renato

Re: [RFC] AArch64: Should we disable GlobalMerge?

Eric Christopher
In reply to this post by Ahmed Bougacha


On Fri, Feb 27, 2015 at 2:13 PM Ahmed Bougacha <[hidden email]> wrote:
> On Fri, Feb 27, 2015 at 1:42 PM, Eric Christopher <[hidden email]> wrote:
> >
> >
> > On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <[hidden email]>
> > wrote:
> >>
> >> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <[hidden email]>
> >> wrote:
> >> >
> >> > Hi Ahmed,
> >> >
> >> > Did you run these experiments on a platform with a linker that makes
> >> > use of the AArch64CollectLOH-pass-produced information?
> >>
> >> As Jim says, I'm on iOS, so yes.  However, I'm mostly running tests
> >> with the pass disabled.
> >>
> >> >
> >> > I'm guessing that the AArch64CollectLOH-pass information and a linker
> >> > that makes use of that information could affect the profitability of
> >> > the GlobalMerge pass?
> >>
> >> It could, and does, from what I've seen (beware anecdata):
> >> - reusing the adrp base prevents optimizing it (the various
> >> Adrp*{ldr,str} LOHs).
> >> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
> >> doesn't prevent the AdrpAdd optimization.
> >>
> >> All in all, whether GlobalMerge is profitable or not (by increasing
> >> register pressure, or adding another indirection), whenever the LOH
> >> optimizations fire, they reduce its usefulness.
> >>
> >> AFAICT, the only case where LOHs help GlobalMerge is when the
> >> MergedGlobal base is closer to the adrp sequence than the actual
> >> global.  Given that we only merge 4k of globals, on a 1MB range this
> >> doesn't happen very often.
> >>
> >>
> >>
> >> Which brings us to my fallback proposal:  what about disabling the
> >> pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
> >> help mitigate the adrp problem, and global usage is usually frowned
> >> upon in those circles (except for singletons, class-/function-statics
> >> and whatnot, which I'm trying to address in an upcoming patch).
> >>
> >
> > Before making the disabling darwin only I'd like to see some analysis of the
> > regressions/improvements. Has anyone looked at the code for those yet?
>
> Yep, I put a quick analysis in my other reply.

The LOH/ADRP bit?


> >
> >>
> >> As for other targets, as a first step, making the pass run under -O3
> >> rather than -O1 is hopefully agreeable to everyone?  After all, it is
> >> "aggressive", and isn't always profitable.  That's pretty much the
> >> description of -O3.
> >> We can still run into problematic cases under LTO, though.
> >>
> >
> > Seems reasonable to me, but probably want to see what happens with the above
> > questions first.
>
> Fair enough.  Bottom line is:
> - disabling it without LTO is a slight win on the test-suite, a solid
> win everywhere else I've looked.
> - disabling it with LTO regresses quite a few SPEC benchmarks, and is
> overall a slight regression on the test-suite.


Ah, I meant an analysis of the code, not just the numbers. I think the ADRP/LOH commentary really helps. It might only be a decent LTOish optimization, but I'm still curious how it's helping there over other optimizations.

Anyhow, FWIW I'm in favor of pulling it out of the non-LTO pipeline universally.

-eric

> -Ahmed
>
> > -eric
> >


Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Ahmed Bougacha
On Fri, Feb 27, 2015 at 2:15 PM, Ahmed Bougacha
<[hidden email]> wrote:
> By the way, I'm not convinced LTO being either -O3 or -O0 is sensible.
> But that's a discussion for another day =)

Duncan tells me there is a plan to put -mno-global-merge into module
flags for this precise reason, so this would disable it for LTO as
well, when -O3 wasn't specified.  This takes care of our non-O3
concerns;  I'll have a look!

-Ahmed

Re: [RFC] AArch64: Should we disable GlobalMerge?

Jim Grosbach
In reply to this post by Ahmed Bougacha

> On Feb 27, 2015, at 2:15 PM, Ahmed Bougacha <[hidden email]> wrote:
>
> On Fri, Feb 27, 2015 at 2:01 PM, Renato Golin <[hidden email]> wrote:
>> On 27 February 2015 at 21:26, Ahmed Bougacha <[hidden email]> wrote:
>>> Which brings us to my fallback proposal:  what about disabling the
>>> pass on darwin only?
>>
>> That's a decision for Jim/Evan. I'm ok if they are.
>
> Jim, thoughts?

I would prefer Darwin not differ in this regard, but I don’t feel incredibly strongly about it. Just a general preference to keeping platform dependencies and differences to a minimum. Whatever y’all decide is fine with me.

>
>>
>>> As for other targets, as a first step, making the pass run under -O3
>>> rather than -O1 is hopefully agreeable to everyone?
>>
>> Sounds reasonable.
>
> Great!
>
>> Even though it conflicts with LTO, that's what O3 means, as you said,
>> instability. People at O3 might want to fiddle with the passes
>> (on/off) to get the best performance for their own code/workload.
>
> By the way, I'm not convinced LTO being either -O3 or -O0 is sensible.
> But that's a discussion for another day =)
>
> -Ahmed
>
>> cheers,
>> --renato



Re: [RFC] AArch64: Should we disable GlobalMerge?

Quentin Colombet
In reply to this post by Ahmed Bougacha

> On Feb 27, 2015, at 2:03 PM, Ahmed Bougacha <[hidden email]> wrote:
>
> On Thu, Feb 26, 2015 at 4:09 AM, Renato Golin <[hidden email]> wrote:
>> On 26 February 2015 at 00:57, Ahmed Bougacha <[hidden email]> wrote:
>>> -- A way forward
>>> One obvious way to improve it is: look at uses of globals, and try to
>>> form sets of globals commonly used together.  The tricky part is to
>>> define heuristics for "commonly".  Also, the pass then becomes much
>>> more expensive.  I'm currently looking into improving it, and will
>>> report if I come up with a good solution.  But this shouldn't stop us
>>> from disabling it, for now.
>>
>> Hi Ahmed,
>>
>> Before "moving forward", it would be good to understand what in
>> GlobalMerge is impacting what in LTO.
>>
>> With LTO becoming more important nowadays, I agree we have to balance
>> the compiler optimisations to work well with it, but by turning things
>> off we might be impacting unknown code in an unknown way.
>>
>> We'll never know how unknown code behaves, but if at least we
>> understand what of GM affects what of LTO, then people using unknown
>> code will have a more informed view on what to disable, when.
>
> Fair enough.  First, a couple things to note:
> - GlobalMerge runs as a pre-ISel pass, so very late in the mid-level pipeline.

To be precise, GlobalMerge is registered as a pre-ISel pass, but still it runs very early in the pipeline, because all its work is done during doInitialization… Pretty broken, I know.

-Quentin

> - GlobalMerge (by default) only looks at internal globals.
>
> Internal globals come up with file- or function- static variables.  In
> LTO, all module-level globals are internalized, and are eligible for
> merging.
>
> So, we can generally group global usage into a few categories:
> - a function that uses a local static variable (say, llvm::outs())
> - a function that uses several globals at once.  For instance,
> 400.perlbench's interpreter has a bunch of those, as does its
> parser/lexer.
> - a set of functions that share a few common globals (say, an inlined
> reference to a function-local static variable), but otherwise each use
> several other globals (again, perl's interpreter).
>
>
> GlobalMerge is only ever a win if we are able to share base pointers.
> This requires:
> - several globals being referenced
> - the references being close enough (otherwise we'll just
> rematerialize the base, or worse, increase register pressure)
>
> There is one obvious special case for the first requirement:  if a
> global is only ever used alone, there's no point in merging it
> anywhere. (this is improvement #1).
> Once we can determine the set of used globals for each function, we
> can try to merge those sets only. (#2)
>
> We can try to better handle the second requirement, by having some
> more precise metric for distance between uses.  One trivially
> available such metric is grouping used sets by parent basic-block
> rather than function (#3).
>
>
>
> Experimentally, #1 catches a lot of the singleton-ish globals out
> there, which is the majority in some of the more "modern" code I've
> looked at.  It leaves the legitimate merging in perl alone.
>
> #2 (and even moreso #3) is actually too aggressive, and doesn't catch
> a lot/most of the profitable cases in perl.  Consider:
> - a "g_log" global (or, say, LLVM's outs/dbgs/errs), used pretty much everywhere
> - several sets of globals, used in different parts of the program
> (perl's interpreter vs parser)
>
> You'd pick one of the latter sets, and add the "g_log" global to it.
> Now you made it more expensive everywhere you use "g_log", without the
> benefit of base sharing in all the other functions.
>
> So you need to be smart when picking the sets.  You can combine some
> of them, using some cost metric.  (#4)  This is where it gets
> complicated.
>
>
> I'll try measuring some of those, see what happens on benchmarks.
> Again, that shouldn't stop us from enabling GlobalMerge less often.
> Hopefully it's clear that the pass isn't always a win, so -O3 should
> be OK.  I'm less comfortable with disabling it on Darwin only, but
> that seems like the obvious next step.
>
> Thanks for the feedback!
>
> -Ahmed
>
>> cheers,
>> --renato
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
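
For illustration, improvements #1 and #2 from the quoted message can be sketched in a few lines of Python. This is a toy model, not the GlobalMerge pass itself: the function names, the global names, and the idea of describing a module as a map from each function to the set of globals it references are all assumptions made for the example.

```python
def globals_worth_merging(uses_by_function):
    """Globals that appear next to at least one other global in some function.

    A global that is only ever used alone (improvement #1) is left out.
    """
    candidates = set()
    for used in uses_by_function.values():
        if len(used) > 1:
            candidates |= used
    return candidates


def used_sets(uses_by_function):
    """Distinct multi-global used sets, one per function (improvement #2)."""
    return {frozenset(u) for u in uses_by_function.values() if len(u) > 1}


# Hypothetical module; every name here is illustrative only.
uses = {
    "get_instance": {"instance"},              # singleton-style getter
    "interp_step": {"pc", "sp", "g_log"},
    "parse_token": {"tok", "line", "g_log"},
    "report": {"g_log"},
}

mergeable = globals_worth_merging(uses)
sets = used_sets(uses)
print(sorted(mergeable))
print(sorted(sorted(s) for s in sets))
```

In this model `instance` is dropped by #1, while `g_log` lands in both used sets. That is the situation described above: whichever set `g_log` is merged into, its other users (such as `report`) pay for the merge without sharing a base.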



Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Jim Grosbach
On Fri, Feb 27, 2015 at 3:06 PM, Jim Grosbach <[hidden email]> wrote:
> I would prefer Darwin not differ in this regard, but I don’t feel incredibly strongly about it. Just a general preference to keeping platform dependencies and differences to a minimum.

Same.

> Whatever y’all decide is fine with me.

We might not need this after all =)   With a module-level flag, we
could only enable it under -O3 even for LTO, which is fine no matter
the platform.

-Ahmed


Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Quentin Colombet
On Fri, Feb 27, 2015 at 3:13 PM, Quentin Colombet <[hidden email]> wrote:
> To be precise, GlobalMerge is registered as a pre-ISel pass, but still it runs very early in the pipeline, because all its work is done during doInitialization… Pretty broken, I know.

Oh god, I forgot about this... it actually runs pretty early,  not
sure when exactly..

-Ahmed

> -Quentin


Re: [RFC] AArch64: Should we disable GlobalMerge?

Ahmed Bougacha
In reply to this post by Eric Christopher
On Fri, Feb 27, 2015 at 2:21 PM, Eric Christopher <[hidden email]> wrote:

>> > Before making the disabling darwin only I'd like to see some analysis of
>> > the
>> > regressions/improvements. Has anyone looked at the code for those yet?
>>
>> Yep, I put a quick analysis in my other reply.
>
>
> The LOH/ADRP bit?
>
>>
>>
>> >
>> >>
>> >> As for other targets, as a first step, making the pass run under -O3
>> >> rather than -O1 is hopefully agreeable to everyone?  After all, it is
>> >> "aggressive", and isn't always profitable.  That's pretty much the
>> >> description of -O3.
>> >> We can still run into problematic cases under LTO, though.
>> >>
>> >
>> > Seems reasonable to me, but probably want to see what happens with the
>> > above
>> > questions first.
>>
>> Fair enough.  Bottom line is:
>> - disabling it without LTO is a slight win on the test-suite, a solid
>> win everywhere else I've looked.
>> - disabling it with LTO regresses quite a few SPEC benchmarks, and is
>> overall a slight regression on the test-suite.
>>
>
> Ah, I meant an analysis of the code, not just the numbers. I think the
> ADRP/LOH commentary really helps. It might only be a decent LTOish
> optimization, but I'm still curious how it's helping there over other
> optimizations.

Basically - and I think this is what Renato asks as well - it doesn't
really interact with later optimizations.  Throughout most of the
backend, we keep global references (e.g., adrp+add) together, as a
pseudo instruction (MOVaddr, LOADgot, ...).  Very late we expand it to
adrp+add/....  So, the only thing that helps is the LOH linker
optimizations, which try to simplify some of the adrp sequences.
Really, the backend is oblivious to the fact that global references
aren't trivial.  We don't try to CSE the adrp's, for instance (I
believe there was a patch for that, Quentin and Jiangning might know
more).  Does that clarify a bit?


Looking at the code, you have two main problematic situations:
- the register pressure tradeoff:

Consider:

adrp x8, 133
ldr x8, [x8, #3568]
...
adrp x8, 133
ldr x0, [x8, #3576]

Turning into:

adrp x19, 133
add x19, x19, #3392
ldr x8, [x19, #192]
...
ldr x0, [x19, #200]


- an additional instruction when only one global from a merged set is
accessed (or when the LOH optimizations fired)

Consider the similar:

adrp x20, 133
ldr x8, [x20, #3432]
...
str x0, [x20, #3432]

Turning into:

adrp x20, 133
add x20, x20, #3392
ldr x8, [x20, #56]
...
str x0, [x20, #56]
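
As a rough tally of the two listings above, here is a toy instruction-count model in Python. It is a sketch under stated assumptions, not anything from LLVM: each access is taken to be a single ldr/str, an unmerged global needs one adrp per access unless its base register is kept live, and a merged global needs one adrp+add for the MergedGlobal base. The helper names are hypothetical.

```python
def unmerged_cost(accesses, base_kept_live):
    """Instructions for `accesses` loads/stores without GlobalMerge."""
    # One adrp in total if the page base stays in a register,
    # otherwise one adrp per access.
    adrps = 1 if base_kept_live else accesses
    return adrps + accesses


def merged_cost(accesses):
    """Instructions with GlobalMerge: adrp + add once, then one op per access.

    The base register stays live across all accesses.
    """
    return 2 + accesses


# First listing: two different globals, adrp rematerialized each time.
first = (unmerged_cost(2, base_kept_live=False), merged_cost(2))
# Second listing: one global loaded then stored, adrp base kept in x20.
second = (unmerged_cost(2, base_kept_live=True), merged_cost(2))
print(first, second)
```

Under these assumptions the first case comes out even on instruction count, so what changes is that a register is held across the `...` region; the second case costs one extra instruction after merging.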


One positive case is explained in the GlobalMerge.cpp comments:  it
reduces register pressure in a loop,  by using a single base register
for multiple globals.

Another positive is that merging globals effectively CSEs the base
address computation.

> Anyhow, FWIW I'm in favor of pulling it out of the non-LTO pipeline
> universally.

I tend to agree, but it's still sometimes useful in non-LTO.  One case
that came up in benchmarks was a bunch of file-static globals used
pervasively in a single file  (I believe lex/yacc can generate this
kind of thing).  There it's very beneficial, even without LTO.  Hence,
-O3 and -mno-global-merge, if necessary.
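
To make the contrast with the getter-style code concrete, here is a small Python sketch that counts, for one merged set, how many functions could share the base and how many would only pay the extra add. It assumes a function touching two or more members can share one base register and a function touching exactly one cannot; it ignores how far apart the references are. All names are hypothetical.

```python
def tally(merged_set, uses_by_function):
    """Return (functions able to share the base, functions paying for it)."""
    sharing = paying = 0
    for used in uses_by_function.values():
        hits = len(merged_set & used)
        if hits >= 2:
            sharing += 1
        elif hits == 1:
            paying += 1
    return sharing, paying


# Hypothetical lexer-style file: file-static state used throughout.
lexer_state = {"yy_buf", "yy_pos", "yy_state"}
lexer_uses = {
    "next_char": {"yy_buf", "yy_pos"},
    "push_state": {"yy_state", "yy_pos"},
    "scan": {"yy_buf", "yy_pos", "yy_state"},
}

# Hypothetical getter-style file: each static is used by one getter alone.
getter_state = {"cache_a", "cache_b"}
getter_uses = {"get_a": {"cache_a"}, "get_b": {"cache_b"}}

lexer_result = tally(lexer_state, lexer_uses)
getter_result = tally(getter_state, getter_uses)
print(lexer_result, getter_result)
```

In the lexer-style file every function can share the base, while in the getter-style file every function only pays, which matches the split seen between the benchmarks and the other projects.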

-Ahmed

> -eric
>
>>
>> -Ahmed
>>
>> > -eric
>> >