Best way to clean up empty global_ctors

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

Best way to clean up empty global_ctors

Nico Weber-2
Hi,

I'd like to fix PR19590, which is about llvm.global_ctors containing
functions that end up being empty after optimization (which causes the
linker to add useless init_array entries to the output binary).
globalopt removes empty functions from llvm.global_ctors, but by the
time the function becomes empty globalopt has already run and it
doesn't run again.

I'm wondering what the best fix is:
1. Should globalopt run once more after all other passes have run?
2. Or should the llvm.global_ctors optimization code in globalopt be
moved into a helper function somewhere that's called from both
globalopt and a new optimization pass cdtoropt that does nothing but
remove empty functions from llvm.global_ctors and llvm.global_dtors?
Then only this new pass would be added after all other passes. (This
new pass should run very quickly, it doesn't have to do much.)

Thanks for any advice,
Nico
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Reid Kleckner-2
Can we strengthen globalopt to symbolically execute at most one direct call to any given function, or is that crazy talk?


On Wed, Apr 30, 2014 at 4:48 PM, Nico Weber <[hidden email]> wrote:
Hi,

I'd like to fix PR19590, which is about llvm.global_ctors containing
functions that end up being empty after optimization (which causes the
linker to add useless init_array entries to the output binary).
globalopt removes empty functions from llvm.global_ctors, but by the
time the function becomes empty globalopt has already run and it
doesn't run again.

I'm wondering what the best fix is:
1. Should globalopt run once more after all other passes have run?
2. Or should the llvm.global_ctors optimization code in globalopt be
moved into a helper function somewhere that's called from both
globalopt and a new optimization pass cdtoropt that does nothing but
remove empty functions from llvm.global_ctors and llvm.global_dtors?
Then only this new pass would be added after all other passes. (This
new pass should run very quickly, it doesn't have to do much.)

Thanks for any advice,
Nico
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Nico Weber-2
I'm not sure I understand what you mean. The unoptimized IR is

    @llvm.global_ctors = appending global [1 x { i32, void ()* }] [{
i32, void ()* } { i32 65535, void ()* @_GLOBAL__I_a }]

    define internal void @_GLOBAL__I_a() section
"__TEXT,__StaticInit,regular,pure_instructions" {
    entry:
      call void @__cxx_global_var_init()
      ret void
    }

    define internal void @__cxx_global_var_init() section
"__TEXT,__StaticInit,regular,pure_instructions" {
    entry:
      %0 = load i32* @_ZN3Bar18LINKER_INITIALIZEDE, align 4
      call void @_ZN3FooC1E17LinkerInitialized(%class.Foo* @foo, i32 %0)
      ret void
    }

    ; Function Attrs: ssp uwtable
    define linkonce_odr void
@_ZN3FooC1E17LinkerInitialized(%class.Foo* %this, i32) unnamed_addr #0
align 2 {
    entry:
      %this.addr = alloca %class.Foo*, align 8
      %.addr = alloca i32, align 4
      store %class.Foo* %this, %class.Foo** %this.addr, align 8
      store i32 %0, i32* %.addr, align 4
      %this1 = load %class.Foo** %this.addr
      %1 = load i32* %.addr, align 4
      call void @_ZN3FooC2E17LinkerInitialized(%class.Foo* %this1, i32 %1)
      ret void
    }

…which is quite a bit of stuff. But other passes successfully reduce this to

    @llvm.global_ctors = appending global [1 x { i32, void ()* }] [{
i32, void ()* } { i32 65535, void ()* @_GLOBAL__I_a }]

    ; Function Attrs: nounwind readnone
    define internal void @_GLOBAL__I_a() #1 section
"__TEXT,__StaticInit,regular,pure_instructions" {
    entry:
      ret void
    }

which is much nicer, and globalopt is able to reduce the latter bit to
an empty global_ctors list – the problem is that it runs before the IR
is optimized that much. Just doing globalopt once more (or doing just
the global_ctor bits of globalopt once more) is enough to fix the
problem. The existing machinery seems to do all that's needed, it just
needs to be used in a slightly different order as far as I understand.

On Wed, Apr 30, 2014 at 4:59 PM, Reid Kleckner <[hidden email]> wrote:

> Can we strengthen globalopt to symbolically execute at most one direct call
> to any given function, or is that crazy talk?
>
>
> On Wed, Apr 30, 2014 at 4:48 PM, Nico Weber <[hidden email]> wrote:
>>
>> Hi,
>>
>> I'd like to fix PR19590, which is about llvm.global_ctors containing
>> functions that end up being empty after optimization (which causes the
>> linker to add useless init_array entries to the output binary).
>> globalopt removes empty functions from llvm.global_ctors, but by the
>> time the function becomes empty globalopt has already run and it
>> doesn't run again.
>>
>> I'm wondering what the best fix is:
>> 1. Should globalopt run once more after all other passes have run?
>> 2. Or should the llvm.global_ctors optimization code in globalopt be
>> moved into a helper function somewhere that's called from both
>> globalopt and a new optimization pass cdtoropt that does nothing but
>> remove empty functions from llvm.global_ctors and llvm.global_dtors?
>> Then only this new pass would be added after all other passes. (This
>> new pass should run very quickly, it doesn't have to do much.)
>>
>> Thanks for any advice,
>> Nico
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Rafael Espíndola
In reply to this post by Nico Weber-2
On 30 April 2014 19:48, Nico Weber <[hidden email]> wrote:

> Hi,
>
> I'd like to fix PR19590, which is about llvm.global_ctors containing
> functions that end up being empty after optimization (which causes the
> linker to add useless init_array entries to the output binary).
> globalopt removes empty functions from llvm.global_ctors, but by the
> time the function becomes empty globalopt has already run and it
> doesn't run again.
>
> I'm wondering what the best fix is:
> 1. Should globalopt run once more after all other passes have run?
> 2. Or should the llvm.global_ctors optimization code in globalopt be
> moved into a helper function somewhere that's called from both
> globalopt and a new optimization pass cdtoropt that does nothing but
> remove empty functions from llvm.global_ctors and llvm.global_dtors?
> Then only this new pass would be added after all other passes. (This
> new pass should run very quickly, it doesn't have to do much.)

How late do you need to move globalopt to get this to work? Do you get
any performance regressions if you just move it there?

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Reid Kleckner-2
I talked about this with Nick in person months ago, and my understanding is that GlobalOpt is also an enabling optimization that needs to run early.  For example, if we can eliminate an initializer to an internal global with no other stores to it, we can propagate the result.

Maybe we should run it twice.


On Thu, May 1, 2014 at 11:01 AM, Rafael Espíndola <[hidden email]> wrote:
On 30 April 2014 19:48, Nico Weber <[hidden email]> wrote:
> Hi,
>
> I'd like to fix PR19590, which is about llvm.global_ctors containing
> functions that end up being empty after optimization (which causes the
> linker to add useless init_array entries to the output binary).
> globalopt removes empty functions from llvm.global_ctors, but by the
> time the function becomes empty globalopt has already run and it
> doesn't run again.
>
> I'm wondering what the best fix is:
> 1. Should globalopt run once more after all other passes have run?
> 2. Or should the llvm.global_ctors optimization code in globalopt be
> moved into a helper function somewhere that's called from both
> globalopt and a new optimization pass cdtoropt that does nothing but
> remove empty functions from llvm.global_ctors and llvm.global_dtors?
> Then only this new pass would be added after all other passes. (This
> new pass should run very quickly, it doesn't have to do much.)

How late do you need to move globalopt to get this to work? Do you get
any performance regressions if you just move it there?

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Rafael Espíndola
On 1 May 2014 14:08, Reid Kleckner <[hidden email]> wrote:
> I talked about this with Nick in person months ago, and my understanding is
> that GlobalOpt is also an enabling optimization that needs to run early.
> For example, if we can eliminate an initializer to an internal global with
> no other stores to it, we can propagate the result.
>
> Maybe we should run it twice.

That is probably fine, we just have to make sure compile time doesn't
surfer too much.

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Nico Weber-2
On Thu, May 1, 2014 at 11:12 AM, Rafael Espíndola
<[hidden email]> wrote:

> On 1 May 2014 14:08, Reid Kleckner <[hidden email]> wrote:
>> I talked about this with Nick in person months ago, and my understanding is
>> that GlobalOpt is also an enabling optimization that needs to run early.
>> For example, if we can eliminate an initializer to an internal global with
>> no other stores to it, we can propagate the result.
>>
>> Maybe we should run it twice.
>
> That is probably fine, we just have to make sure compile time doesn't
> surfer too much.

Looking at GlobalOpt, it seems to do full symbolic evaluation of all
constructors, which is more than what's needed here. Just walking
llvm.global_ctors and removing calls to functions that are just "ret"
is all that's needed. Unless folks think that's a terrible idea, I'll
try to write a lightweight pass for doing just that.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Rafael Espíndola
> Looking at GlobalOpt, it seems to do full symbolic evaluation of all
> constructors, which is more than what's needed here. Just walking
> llvm.global_ctors and removing calls to functions that are just "ret"
> is all that's needed. Unless folks think that's a terrible idea, I'll
> try to write a lightweight pass for doing just that.

Not a bad idea, we just normally avoid small passes like that. Looking
at -O2, looks like we run -globaldce really late, maybe this could be
done in it?

Cheers,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Nico Weber-2
On Thu, May 1, 2014 at 11:19 AM, Rafael Espíndola
<[hidden email]> wrote:
>> Looking at GlobalOpt, it seems to do full symbolic evaluation of all
>> constructors, which is more than what's needed here. Just walking
>> llvm.global_ctors and removing calls to functions that are just "ret"
>> is all that's needed. Unless folks think that's a terrible idea, I'll
>> try to write a lightweight pass for doing just that.
>
> Not a bad idea, we just normally avoid small passes like that.

That's good to know, thanks. Is there a some kind of 'philosophical
overview' document for llvm's pass design somewhere that explains
things like this?

> Looking
> at -O2, looks like we run -globaldce really late, maybe this could be
> done in it?

I'll take a look, thanks!

>
> Cheers,
> Rafael

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Nick Lewycky-2
In reply to this post by Rafael Espíndola
On 1 May 2014 11:12, Rafael Espíndola <[hidden email]> wrote:
On 1 May 2014 14:08, Reid Kleckner <[hidden email]> wrote:
> I talked about this with Nick in person months ago, and my understanding is
> that GlobalOpt is also an enabling optimization that needs to run early.
> For example, if we can eliminate an initializer to an internal global with
> no other stores to it, we can propagate the result.
>
> Maybe we should run it twice.

That is probably fine, we just have to make sure compile time doesn't
surfer too much.

There are many cases where another run of globalopt at the end would clean things up.

I have a few concerns. One is that globalopt can miscompile, and I'm afraid of making it do more or learn new tricks because that may expose its miscompiley underbelly. The second is that a late run of globalopt opens up all manner of new opportunities for the rest of the scalar optimizers, which we don't run again. That's a sign that we should be designing it differently, I just haven't thought about how exactly.

Teaching -globaldce about the ctors and dtors list sounds like a great plan to me.

Nick

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Best way to clean up empty global_ctors

Richard Smith-33
On Thu, May 1, 2014 at 1:01 PM, Nick Lewycky <[hidden email]> wrote:
On 1 May 2014 11:12, Rafael Espíndola <[hidden email]> wrote:
On 1 May 2014 14:08, Reid Kleckner <[hidden email]> wrote:
> I talked about this with Nick in person months ago, and my understanding is
> that GlobalOpt is also an enabling optimization that needs to run early.
> For example, if we can eliminate an initializer to an internal global with
> no other stores to it, we can propagate the result.
>
> Maybe we should run it twice.

That is probably fine, we just have to make sure compile time doesn't
surfer too much.

There are many cases where another run of globalopt at the end would clean things up.

I have a few concerns. One is that globalopt can miscompile, and I'm afraid of making it do more or learn new tricks because that may expose its miscompiley underbelly. The second is that a late run of globalopt opens up all manner of new opportunities for the rest of the scalar optimizers, which we don't run again. That's a sign that we should be designing it differently, I just haven't thought about how exactly.

OK, you've said you don't want to teach globalopt any new tricks, so I'm going to suggest teaching it a new trick ;)

If globalopt sees the load of a variable but doesn't have a definition of that variable, could it succeed and track that it has an unusable value (but fail if that value is actually used anywhere)? This is perhaps just addressing the symptom rather than the root cause, but it'd work for this case (where the only reason globalopt can't symbolically execute the function is because it loads and ignores an undefined global).

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev