Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
33 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]> wrote:

> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]> wrote:
>>
>>
>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson <[hidden email]>
>> wrote:
>>>
>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie <[hidden email]>
>>> wrote:
>>> >
>>> >
>>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson <[hidden email]>
>>> > wrote:
>>> >>
>>> >> Looking back through my GlobalDCE changes, it looks like one of the
>>> >> places I had changed (where we mark defined globals in runOnModule)
>>> >> already has a guard for !hasAvailableExternallyLinkage and
>>> >> !isDiscardableIfUnused, so my additional guard against marking
>>> >> imported functions is unnecessary. But the other place I had to change
>>> >> was in GlobalIsNeeded where it walks through the function and
>>> >> recursively marks any referenced global as needed. Here there was no
>>> >> guard against marking a global that is available externally as needed
>>> >> if it is referenced. I had added a check here to not mark imported
>>> >> functions as needed on reference unless they were discardable (i.e.
>>> >> link once). Is this a bug - should this have a guard against marking
>>> >> available externally function refs as needed?
>>> >
>>> >
>>> > Duncan's probably got a better idea of what the right answer is here. I
>>> > suspect "yes".
>>> >
>>> > The trick with available_externally is to ensure we keep these around
>>> > for
>>> > long enough that their definitions are useful (for inlining, constant
>>> > prop,
>>> > all that good stuff) but remove them before we actually do too much work
>>> > on
>>> > them, or at least before we emit them into the final object file.
>>>
>>> Yep, and that is exactly how long I want my imported functions to
>>> stick around. Currently I have the ThinLTO import pass added at the
>>> start of addLTOOptimizationPasses, just after the AA passes. So the
>>> imported functions stick around through global opt, constant merge,
>>> combining, and inlining. The call to GlobalDCE is just after inlining
>>> and a second globalopt.
>>>
>>> >
>>> > I imagine if GlobalDCE isn't removing available_externally functions
>>> > it's
>>> > because they're still useful in the optimization pipeline and something
>>> > further down the pipe removes them (because they do, ultimately, get
>>> > removed).
>>>
>>> That would be good to know, since if they are useful later on in the
>>> pipe then presumably my imported functions could be as well. But I
>>> wonder who is removing them since GlobalDCE isn't (if there are refs)?
>>
>>
>> Yep - seems no one removes them (well, probably if there are no calls to the
>> function at all they get removed as usual (eg: if all the calls are inlined
>> we probably drop the function as we would for a linkonce_odr function)). If
>> they're still around come codegen time, we just skip them:
>>
>> lib/CodeGen/MachineFunctionPass.cpp, MachineFunctionPass::runOnFunction:
>>
>> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>> 34|   // Do not codegen any 'available_externally' functions at all, they
>> have
>> 35|   // definitions outside the translation unit.
>> 36|   if (F.hasAvailableExternallyLinkage())
>> 37|     return false;
>> 38|
>> 39|   MachineFunction &MF = getAnalysis<MachineFunctionAnalysis>().getMF();
>> 40|   return runOnMachineFunction(MF);
>> 41| }
>
> Ok, thanks for digging this up. I guess that would work for the
> imported functions as well, but it seems like a waste of compile time
> if they are not useful after inlining/global opt. Is there any reason
> to keep them past GlobalDCE or should I try changing GlobalDCE so that
> it removes them?
>

I found allowing GlobalDCE to remove referenced
AvailableExternallyLinkage values earlier passes all the clang/llvm
tests (see patch below). AFAICT, it shouldn't be necessary to keep
these functions around past the first call to GlobalDCE (which is
after inlining), but let me know if I missed something. Although they
will be removed eventually by MachineFunctionPass as David pointed
out, so this is more of a compile-time improvement than anything. It
will affect ThinLTO more significantly in compile time because there
will be more functions with this linkage type, depending on how many
static functions are imported that we have to promote.

Also, what about available_externally variable
definitions/initializations? I'm not sure what source construct
currently can generate those, but are these suppressed/turned into
declarations at some point? Note that ThinLTO imported file static
variables always need to be promoted, and will be marked
available_external.

Index: lib/Transforms/IPO/GlobalDCE.cpp
===================================================================
--- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590)
+++ lib/Transforms/IPO/GlobalDCE.cpp (working copy)
@@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) {
     // themselves.
     for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) {
       RemoveUnusedGlobalValue(*DeadFunctions[i]);
-      M.getFunctionList().erase(DeadFunctions[i]);
+      // Might have deleted the body of an available externally function that
+      // is still referenced. Leave the declaration.
+      if (DeadFunctions[i]->use_empty())
+        M.getFunctionList().erase(DeadFunctions[i]);
     }
     NumFunctions += DeadFunctions.size();
     Changed = true;
@@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) {
   if (!DeadGlobalVars.empty()) {
     for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) {
       RemoveUnusedGlobalValue(*DeadGlobalVars[i]);
-      M.getGlobalList().erase(DeadGlobalVars[i]);
+      // Might have deleted the definition of an available externally function
+      // that is still referenced. Leave the declaration.
+      if (DeadGlobalVars[i]->use_empty())
+        M.getGlobalList().erase(DeadGlobalVars[i]);
     }
     NumVariables += DeadGlobalVars.size();
     Changed = true;
@@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
     for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
       for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
         for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
-          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
-            GlobalIsNeeded(GV);
+          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
+            if (!GV->hasAvailableExternallyLinkage())
+              GlobalIsNeeded(GV);
+          }
           else if (Constant *C = dyn_cast<Constant>(*U))
             MarkUsedGlobalsAsNeeded(C);
   }



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

David Blaikie


On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]> wrote:
On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]> wrote:
> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]> wrote:
>>
>>
>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson <[hidden email]>
>> wrote:
>>>
>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie <[hidden email]>
>>> wrote:
>>> >
>>> >
>>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson <[hidden email]>
>>> > wrote:
>>> >>
>>> >> Looking back through my GlobalDCE changes, it looks like one of the
>>> >> places I had changed (where we mark defined globals in runOnModule)
>>> >> already has a guard for !hasAvailableExternallyLinkage and
>>> >> !isDiscardableIfUnused, so my additional guard against marking
>>> >> imported functions is unnecessary. But the other place I had to change
>>> >> was in GlobalIsNeeded where it walks through the function and
>>> >> recursively marks any referenced global as needed. Here there was no
>>> >> guard against marking a global that is available externally as needed
>>> >> if it is referenced. I had added a check here to not mark imported
>>> >> functions as needed on reference unless they were discardable (i.e.
>>> >> link once). Is this a bug - should this have a guard against marking
>>> >> available externally function refs as needed?
>>> >
>>> >
>>> > Duncan's probably got a better idea of what the right answer is here. I
>>> > suspect "yes".
>>> >
>>> > The trick with available_externally is to ensure we keep these around
>>> > for
>>> > long enough that their definitions are useful (for inlining, constant
>>> > prop,
>>> > all that good stuff) but remove them before we actually do too much work
>>> > on
>>> > them, or at least before we emit them into the final object file.
>>>
>>> Yep, and that is exactly how long I want my imported functions to
>>> stick around. Currently I have the ThinLTO import pass added at the
>>> start of addLTOOptimizationPasses, just after the AA passes. So the
>>> imported functions stick around through global opt, constant merge,
>>> combining, and inlining. The call to GlobalDCE is just after inlining
>>> and a second globalopt.
>>>
>>> >
>>> > I imagine if GlobalDCE isn't removing available_externally functions
>>> > it's
>>> > because they're still useful in the optimization pipeline and something
>>> > further down the pipe removes them (because they do, ultimately, get
>>> > removed).
>>>
>>> That would be good to know, since if they are useful later on in the
>>> pipe then presumably my imported functions could be as well. But I
>>> wonder who is removing them since GlobalDCE isn't (if there are refs)?
>>
>>
>> Yep - seems no one removes them (well, probably if there are no calls to the
>> function at all they get removed as usual (eg: if all the calls are inlined
>> we probably drop the function as we would for a linkonce_odr function)). If
>> they're still around come codegen time, we just skip them:
>>
>> lib/CodeGen/MachineFunctionPass.cpp, MachineFunctionPass::runOnFunction:
>>
>> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>> 34|   // Do not codegen any 'available_externally' functions at all, they
>> have
>> 35|   // definitions outside the translation unit.
>> 36|   if (F.hasAvailableExternallyLinkage())
>> 37|     return false;
>> 38|
>> 39|   MachineFunction &MF = getAnalysis<MachineFunctionAnalysis>().getMF();
>> 40|   return runOnMachineFunction(MF);
>> 41| }
>
> Ok, thanks for digging this up. I guess that would work for the
> imported functions as well, but it seems like a waste of compile time
> if they are not useful after inlining/global opt. Is there any reason
> to keep them past GlobalDCE or should I try changing GlobalDCE so that
> it removes them?
>

I found allowing GlobalDCE to remove referenced
AvailableExternallyLinkage values earlier passes all the clang/llvm
tests (see patch below).

The LLVM regression tests might not catch the sort of regression this change could cause. Usually we test each pass in isolation and tend towards white box testing (so we test the cases that are "interesting" according to the algorithm) - in this case, you're adding a condition (adding an "interesting" case that wasn't interesting before - so didn't need to be tested explicitly) that changes the behavior of GlobalDCE. This behavior may've been depended upon by further down-stream optimizations for overall performance.
 
AFAICT, it shouldn't be necessary to keep
these functions around past the first call to GlobalDCE (which is
after inlining), but let me know if I missed something.

It's possible they're still useful for optimizations other than inlining. We can still do constant propagation through such functions (hey, look, this function always returns 3, etc).

It might be necessary/worth actually running perf numbers on such a change (or manually (possibly by asking those who know - not me, unfortunately) checking the passes that come after GlobalDCE to see if any would do cross function constant prop, etc. Perhaps someone else knows this off-the-cuff.
 
Although they
will be removed eventually by MachineFunctionPass as David pointed
out,

Interestingly they're not exactly /removed/, they're just not emitted. I don't know whether that's important.
 
so this is more of a compile-time improvement than anything. It
will affect ThinLTO more significantly in compile time because there
will be more functions with this linkage type, depending on how many
static functions

Hmm - why static functions in particular? I imagine this attribute would be used for non-static functions too in ThinLTO.
 
are imported that we have to promote. 

Also, what about available_externally variable
definitions/initializations? I'm not sure what source construct
currently can generate those,

I believe we emit C++ vtables as available_externally when the type has a key function. This allows devirtualization opportunities without bloating object file size with duplicate vtable data.
 
but are these suppressed/turned into
declarations at some point?

Maybe? Again, I would guess (and you can check some of this with -print-after-all to see how the IR changes as it goes through the optimizations) they might just be omitted from the generated code, not necessarily removed from the IR at any point.
 
Note that ThinLTO imported file static
variables always need to be promoted, and will be marked
available_external.

Index: lib/Transforms/IPO/GlobalDCE.cpp
===================================================================
--- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590)
+++ lib/Transforms/IPO/GlobalDCE.cpp (working copy)
@@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) {
     // themselves.
     for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) {
       RemoveUnusedGlobalValue(*DeadFunctions[i]);
-      M.getFunctionList().erase(DeadFunctions[i]);
+      // Might have deleted the body of an available externally function that
+      // is still referenced. Leave the declaration.
+      if (DeadFunctions[i]->use_empty())
+        M.getFunctionList().erase(DeadFunctions[i]);
     }
     NumFunctions += DeadFunctions.size();
     Changed = true;
@@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) {
   if (!DeadGlobalVars.empty()) {
     for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) {
       RemoveUnusedGlobalValue(*DeadGlobalVars[i]);
-      M.getGlobalList().erase(DeadGlobalVars[i]);
+      // Might have deleted the definition of an available externally function
+      // that is still referenced. Leave the declaration.
+      if (DeadGlobalVars[i]->use_empty())
+        M.getGlobalList().erase(DeadGlobalVars[i]);
     }
     NumVariables += DeadGlobalVars.size();
     Changed = true;
@@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
     for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
       for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
         for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
-          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
-            GlobalIsNeeded(GV);
+          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
+            if (!GV->hasAvailableExternallyLinkage())
+              GlobalIsNeeded(GV);
+          }
           else if (Constant *C = dyn_cast<Constant>(*U))
             MarkUsedGlobalsAsNeeded(C);
   }



--
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:408-460-2413" value="+14084602413" target="_blank">408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Tue, May 19, 2015 at 12:41 PM, David Blaikie <[hidden email]> wrote:

>
>
> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]>
> wrote:
>>
>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]>
>> wrote:
>> > On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]>
>> > wrote:
>> >>
>> >>
>> >> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson <[hidden email]>
>> >> wrote:
>> >>>
>> >>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie <[hidden email]>
>> >>> wrote:
>> >>> >
>> >>> >
>> >>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson
>> >>> > <[hidden email]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Looking back through my GlobalDCE changes, it looks like one of the
>> >>> >> places I had changed (where we mark defined globals in runOnModule)
>> >>> >> already has a guard for !hasAvailableExternallyLinkage and
>> >>> >> !isDiscardableIfUnused, so my additional guard against marking
>> >>> >> imported functions is unnecessary. But the other place I had to
>> >>> >> change
>> >>> >> was in GlobalIsNeeded where it walks through the function and
>> >>> >> recursively marks any referenced global as needed. Here there was
>> >>> >> no
>> >>> >> guard against marking a global that is available externally as
>> >>> >> needed
>> >>> >> if it is referenced. I had added a check here to not mark imported
>> >>> >> functions as needed on reference unless they were discardable (i.e.
>> >>> >> link once). Is this a bug - should this have a guard against
>> >>> >> marking
>> >>> >> available externally function refs as needed?
>> >>> >
>> >>> >
>> >>> > Duncan's probably got a better idea of what the right answer is
>> >>> > here. I
>> >>> > suspect "yes".
>> >>> >
>> >>> > The trick with available_externally is to ensure we keep these
>> >>> > around
>> >>> > for
>> >>> > long enough that their definitions are useful (for inlining,
>> >>> > constant
>> >>> > prop,
>> >>> > all that good stuff) but remove them before we actually do too much
>> >>> > work
>> >>> > on
>> >>> > them, or at least before we emit them into the final object file.
>> >>>
>> >>> Yep, and that is exactly how long I want my imported functions to
>> >>> stick around. Currently I have the ThinLTO import pass added at the
>> >>> start of addLTOOptimizationPasses, just after the AA passes. So the
>> >>> imported functions stick around through global opt, constant merge,
>> >>> combining, and inlining. The call to GlobalDCE is just after inlining
>> >>> and a second globalopt.
>> >>>
>> >>> >
>> >>> > I imagine if GlobalDCE isn't removing available_externally functions
>> >>> > it's
>> >>> > because they're still useful in the optimization pipeline and
>> >>> > something
>> >>> > further down the pipe removes them (because they do, ultimately, get
>> >>> > removed).
>> >>>
>> >>> That would be good to know, since if they are useful later on in the
>> >>> pipe then presumably my imported functions could be as well. But I
>> >>> wonder who is removing them since GlobalDCE isn't (if there are refs)?
>> >>
>> >>
>> >> Yep - seems no one removes them (well, probably if there are no calls
>> >> to the
>> >> function at all they get removed as usual (eg: if all the calls are
>> >> inlined
>> >> we probably drop the function as we would for a linkonce_odr
>> >> function)). If
>> >> they're still around come codegen time, we just skip them:
>> >>
>> >> lib/CodeGen/MachineFunctionPass.cpp,
>> >> MachineFunctionPass::runOnFunction:
>> >>
>> >> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>> >> 34|   // Do not codegen any 'available_externally' functions at all,
>> >> they
>> >> have
>> >> 35|   // definitions outside the translation unit.
>> >> 36|   if (F.hasAvailableExternallyLinkage())
>> >> 37|     return false;
>> >> 38|
>> >> 39|   MachineFunction &MF =
>> >> getAnalysis<MachineFunctionAnalysis>().getMF();
>> >> 40|   return runOnMachineFunction(MF);
>> >> 41| }
>> >
>> > Ok, thanks for digging this up. I guess that would work for the
>> > imported functions as well, but it seems like a waste of compile time
>> > if they are not useful after inlining/global opt. Is there any reason
>> > to keep them past GlobalDCE or should I try changing GlobalDCE so that
>> > it removes them?
>> >
>>
>> I found allowing GlobalDCE to remove referenced
>> AvailableExternallyLinkage values earlier passes all the clang/llvm
>> tests (see patch below).
>
>
> The LLVM regression tests might not catch the sort of regression this change
> could cause. Usually we test each pass in isolation and tend towards white
> box testing (so we test the cases that are "interesting" according to the
> algorithm) - in this case, you're adding a condition (adding an
> "interesting" case that wasn't interesting before - so didn't need to be
> tested explicitly) that changes the behavior of GlobalDCE. This behavior
> may've been depended upon by further down-stream optimizations for overall
> performance.
>
>>
>> AFAICT, it shouldn't be necessary to keep
>> these functions around past the first call to GlobalDCE (which is
>> after inlining), but let me know if I missed something.
>
>
> It's possible they're still useful for optimizations other than inlining. We
> can still do constant propagation through such functions (hey, look, this
> function always returns 3, etc).

I thought this was done by passes such as IPSCCPPass and GlobalOpt
which happen earlier.

>
> It might be necessary/worth actually running perf numbers on such a change
> (or manually (possibly by asking those who know - not me, unfortunately)
> checking the passes that come after GlobalDCE to see if any would do cross
> function constant prop, etc. Perhaps someone else knows this off-the-cuff.

I can collect some perf data.

>
>>
>> Although they
>> will be removed eventually by MachineFunctionPass as David pointed
>> out,
>
>
> Interestingly they're not exactly /removed/, they're just not emitted. I
> don't know whether that's important.

Yes, true, suppressed not removed.

>
>>
>> so this is more of a compile-time improvement than anything. It
>> will affect ThinLTO more significantly in compile time because there
>> will be more functions with this linkage type, depending on how many
>> static functions
>
>
> Hmm - why static functions in particular? I imagine this attribute would be
> used for non-static functions too in ThinLTO.

Arg, yes, I made the same mistake in some earlier email - all "global"
functions (originally so and those static promoted) will be
available_externally. So this is a significant amount for ThinLTO.

>
>>
>> are imported that we have to promote.
>>
>>
>> Also, what about available_externally variable
>> definitions/initializations? I'm not sure what source construct
>> currently can generate those,
>
>
> I believe we emit C++ vtables as available_externally when the type has a
> key function. This allows devirtualization opportunities without bloating
> object file size with duplicate vtable data.
>
>>
>> but are these suppressed/turned into
>> declarations at some point?
>
>
> Maybe? Again, I would guess (and you can check some of this with
> -print-after-all to see how the IR changes as it goes through the
> optimizations) they might just be omitted from the generated code, not
> necessarily removed from the IR at any point.

Ok, I will investigate.

Thanks,
Teresa

>
>>
>> Note that ThinLTO imported file static
>> variables always need to be promoted, and will be marked
>> available_external.
>>
>> Index: lib/Transforms/IPO/GlobalDCE.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590)
>> +++ lib/Transforms/IPO/GlobalDCE.cpp (working copy)
>> @@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>>      // themselves.
>>      for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) {
>>        RemoveUnusedGlobalValue(*DeadFunctions[i]);
>> -      M.getFunctionList().erase(DeadFunctions[i]);
>> +      // Might have deleted the body of an available externally function
>> that
>> +      // is still referenced. Leave the declaration.
>> +      if (DeadFunctions[i]->use_empty())
>> +        M.getFunctionList().erase(DeadFunctions[i]);
>>      }
>>      NumFunctions += DeadFunctions.size();
>>      Changed = true;
>> @@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>>    if (!DeadGlobalVars.empty()) {
>>      for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) {
>>        RemoveUnusedGlobalValue(*DeadGlobalVars[i]);
>> -      M.getGlobalList().erase(DeadGlobalVars[i]);
>> +      // Might have deleted the definition of an available externally
>> function
>> +      // that is still referenced. Leave the declaration.
>> +      if (DeadGlobalVars[i]->use_empty())
>> +        M.getGlobalList().erase(DeadGlobalVars[i]);
>>      }
>>      NumVariables += DeadGlobalVars.size();
>>      Changed = true;
>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>>      for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>>        for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;
>> ++I)
>>          for (User::op_iterator U = I->op_begin(), E = I->op_end(); U !=
>> E; ++U)
>> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>> -            GlobalIsNeeded(GV);
>> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>> +            if (!GV->hasAvailableExternallyLinkage())
>> +              GlobalIsNeeded(GV);
>> +          }
>>            else if (Constant *C = dyn_cast<Constant>(*U))
>>              MarkUsedGlobalsAsNeeded(C);
>>    }
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
>
>



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

David Blaikie


On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <[hidden email]> wrote:
On Tue, May 19, 2015 at 12:41 PM, David Blaikie <[hidden email]> wrote:
>
>
> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]>
> wrote:
>>
>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]>
>> wrote:
>> > On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]>
>> > wrote:
>> >>
>> >>
>> >> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson <[hidden email]>
>> >> wrote:
>> >>>
>> >>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie <[hidden email]>
>> >>> wrote:
>> >>> >
>> >>> >
>> >>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson
>> >>> > <[hidden email]>
>> >>> > wrote:
>> >>> >>
>> >>> >> Looking back through my GlobalDCE changes, it looks like one of the
>> >>> >> places I had changed (where we mark defined globals in runOnModule)
>> >>> >> already has a guard for !hasAvailableExternallyLinkage and
>> >>> >> !isDiscardableIfUnused, so my additional guard against marking
>> >>> >> imported functions is unnecessary. But the other place I had to
>> >>> >> change
>> >>> >> was in GlobalIsNeeded where it walks through the function and
>> >>> >> recursively marks any referenced global as needed. Here there was
>> >>> >> no
>> >>> >> guard against marking a global that is available externally as
>> >>> >> needed
>> >>> >> if it is referenced. I had added a check here to not mark imported
>> >>> >> functions as needed on reference unless they were discardable (i.e.
>> >>> >> link once). Is this a bug - should this have a guard against
>> >>> >> marking
>> >>> >> available externally function refs as needed?
>> >>> >
>> >>> >
>> >>> > Duncan's probably got a better idea of what the right answer is
>> >>> > here. I
>> >>> > suspect "yes".
>> >>> >
>> >>> > The trick with available_externally is to ensure we keep these
>> >>> > around
>> >>> > for
>> >>> > long enough that their definitions are useful (for inlining,
>> >>> > constant
>> >>> > prop,
>> >>> > all that good stuff) but remove them before we actually do too much
>> >>> > work
>> >>> > on
>> >>> > them, or at least before we emit them into the final object file.
>> >>>
>> >>> Yep, and that is exactly how long I want my imported functions to
>> >>> stick around. Currently I have the ThinLTO import pass added at the
>> >>> start of addLTOOptimizationPasses, just after the AA passes. So the
>> >>> imported functions stick around through global opt, constant merge,
>> >>> combining, and inlining. The call to GlobalDCE is just after inlining
>> >>> and a second globalopt.
>> >>>
>> >>> >
>> >>> > I imagine if GlobalDCE isn't removing available_externally functions
>> >>> > it's
>> >>> > because they're still useful in the optimization pipeline and
>> >>> > something
>> >>> > further down the pipe removes them (because they do, ultimately, get
>> >>> > removed).
>> >>>
>> >>> That would be good to know, since if they are useful later on in the
>> >>> pipe then presumably my imported functions could be as well. But I
>> >>> wonder who is removing them since GlobalDCE isn't (if there are refs)?
>> >>
>> >>
>> >> Yep - seems no one removes them (well, probably if there are no calls
>> >> to the
>> >> function at all they get removed as usual (eg: if all the calls are
>> >> inlined
>> >> we probably drop the function as we would for a linkonce_odr
>> >> function)). If
>> >> they're still around come codegen time, we just skip them:
>> >>
>> >> lib/CodeGen/MachineFunctionPass.cpp,
>> >> MachineFunctionPass::runOnFunction:
>> >>
>> >> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>> >> 34|   // Do not codegen any 'available_externally' functions at all,
>> >> they
>> >> have
>> >> 35|   // definitions outside the translation unit.
>> >> 36|   if (F.hasAvailableExternallyLinkage())
>> >> 37|     return false;
>> >> 38|
>> >> 39|   MachineFunction &MF =
>> >> getAnalysis<MachineFunctionAnalysis>().getMF();
>> >> 40|   return runOnMachineFunction(MF);
>> >> 41| }
>> >
>> > Ok, thanks for digging this up. I guess that would work for the
>> > imported functions as well, but it seems like a waste of compile time
>> > if they are not useful after inlining/global opt. Is there any reason
>> > to keep them past GlobalDCE or should I try changing GlobalDCE so that
>> > it removes them?
>> >
>>
>> I found allowing GlobalDCE to remove referenced
>> AvailableExternallyLinkage values earlier passes all the clang/llvm
>> tests (see patch below).
>
>
> The LLVM regression tests might not catch the sort of regression this change
> could cause. Usually we test each pass in isolation and tend towards white
> box testing (so we test the cases that are "interesting" according to the
> algorithm) - in this case, you're adding a condition (adding an
> "interesting" case that wasn't interesting before - so didn't need to be
> tested explicitly) that changes the behavior of GlobalDCE. This behavior
> may've been depended upon by further down-stream optimizations for overall
> performance.
>
>>
>> AFAICT, it shouldn't be necessary to keep
>> these functions around past the first call to GlobalDCE (which is
>> after inlining), but let me know if I missed something.
>
>
> It's possible they're still useful for optimizations other than inlining. We
> can still do constant propagation through such functions (hey, look, this
> function always returns 3, etc).

I thought this was done by passes such as IPSCCPPass and GlobalOpt
which happen earlier.

You might well be right - I know next to nothing about these things. Having someone a double-check/review from someone more familiar would be nice.

I'll leave the rest of the review to any such someone who might be sufficiently informed about the pass pipeline to decide whether this is the right call. (and/or perf numbers that demonstrate it)

Hopefully my naive review has at least covered the basics.

- David
 

>
> It might be necessary/worth actually running perf numbers on such a change
> (or manually (possibly by asking those who know - not me, unfortunately)
> checking the passes that come after GlobalDCE to see if any would do cross
> function constant prop, etc. Perhaps someone else knows this off-the-cuff.

I can collect some perf data.

>
>>
>> Although they
>> will be removed eventually by MachineFunctionPass as David pointed
>> out,
>
>
> Interestingly they're not exactly /removed/, they're just not emitted. I
> don't know whether that's important.

Yes, true, suppressed not removed.

>
>>
>> so this is more of a compile-time improvement than anything. It
>> will affect ThinLTO more significantly in compile time because there
>> will be more functions with this linkage type, depending on how many
>> static functions
>
>
> Hmm - why static functions in particular? I imagine this attribute would be
> used for non-static functions too in ThinLTO.

Arg, yes, I made the same mistake in some earlier email - all "global"
functions (originally so and those static promoted) will be
available_externally. So this is a significant amount for ThinLTO.

>
>>
>> are imported that we have to promote.
>>
>>
>> Also, what about available_externally variable
>> definitions/initializations? I'm not sure what source construct
>> currently can generate those,
>
>
> I believe we emit C++ vtables as available_externally when the type has a
> key function. This allows devirtualization opportunities without bloating
> object file size with duplicate vtable data.
>
>>
>> but are these suppressed/turned into
>> declarations at some point?
>
>
> Maybe? Again, I would guess (and you can check some of this with
> -print-after-all to see how the IR changes as it goes through the
> optimizations) they might just be omitted from the generated code, not
> necessarily removed from the IR at any point.

Ok, I will investigate.

Thanks,
Teresa

>
>>
>> Note that ThinLTO imported file static
>> variables always need to be promoted, and will be marked
>> available_external.
>>
>> Index: lib/Transforms/IPO/GlobalDCE.cpp
>> ===================================================================
>> --- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590)
>> +++ lib/Transforms/IPO/GlobalDCE.cpp (working copy)
>> @@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>>      // themselves.
>>      for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) {
>>        RemoveUnusedGlobalValue(*DeadFunctions[i]);
>> -      M.getFunctionList().erase(DeadFunctions[i]);
>> +      // Might have deleted the body of an available externally function
>> that
>> +      // is still referenced. Leave the declaration.
>> +      if (DeadFunctions[i]->use_empty())
>> +        M.getFunctionList().erase(DeadFunctions[i]);
>>      }
>>      NumFunctions += DeadFunctions.size();
>>      Changed = true;
>> @@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>>    if (!DeadGlobalVars.empty()) {
>>      for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) {
>>        RemoveUnusedGlobalValue(*DeadGlobalVars[i]);
>> -      M.getGlobalList().erase(DeadGlobalVars[i]);
>> +      // Might have deleted the definition of an available externally
>> function
>> +      // that is still referenced. Leave the declaration.
>> +      if (DeadGlobalVars[i]->use_empty())
>> +        M.getGlobalList().erase(DeadGlobalVars[i]);
>>      }
>>      NumVariables += DeadGlobalVars.size();
>>      Changed = true;
>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>>      for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>>        for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;
>> ++I)
>>          for (User::op_iterator U = I->op_begin(), E = I->op_end(); U !=
>> E; ++U)
>> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>> -            GlobalIsNeeded(GV);
>> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>> +            if (!GV->hasAvailableExternallyLinkage())
>> +              GlobalIsNeeded(GV);
>> +          }
>>            else if (Constant *C = dyn_cast<Constant>(*U))
>>              MarkUsedGlobalsAsNeeded(C);
>>    }
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:408-460-2413" value="+14084602413">408-460-2413
>
>



--
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:408-460-2413" value="+14084602413">408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:

>
>
> On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <[hidden email]>
> wrote:
>>
>> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <[hidden email]>
>> wrote:
>> >
>> >
>> > On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]>
>> > wrote:
>> >>
>> >> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]>
>> >> wrote:
>> >> > On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]>
>> >> > wrote:
>> >> >>
>> >> >>
>> >> >> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson
>> >> >> <[hidden email]>
>> >> >> wrote:
>> >> >>>
>> >> >>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie
>> >> >>> <[hidden email]>
>> >> >>> wrote:
>> >> >>> >
>> >> >>> >
>> >> >>> > On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson
>> >> >>> > <[hidden email]>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> Looking back through my GlobalDCE changes, it looks like one of
>> >> >>> >> the
>> >> >>> >> places I had changed (where we mark defined globals in
>> >> >>> >> runOnModule)
>> >> >>> >> already has a guard for !hasAvailableExternallyLinkage and
>> >> >>> >> !isDiscardableIfUnused, so my additional guard against marking
>> >> >>> >> imported functions is unnecessary. But the other place I had to
>> >> >>> >> change
>> >> >>> >> was in GlobalIsNeeded where it walks through the function and
>> >> >>> >> recursively marks any referenced global as needed. Here there
>> >> >>> >> was
>> >> >>> >> no
>> >> >>> >> guard against marking a global that is available externally as
>> >> >>> >> needed
>> >> >>> >> if it is referenced. I had added a check here to not mark
>> >> >>> >> imported
>> >> >>> >> functions as needed on reference unless they were discardable
>> >> >>> >> (i.e.
>> >> >>> >> link once). Is this a bug - should this have a guard against
>> >> >>> >> marking
>> >> >>> >> available externally function refs as needed?
>> >> >>> >
>> >> >>> >
>> >> >>> > Duncan's probably got a better idea of what the right answer is
>> >> >>> > here. I
>> >> >>> > suspect "yes".
>> >> >>> >
>> >> >>> > The trick with available_externally is to ensure we keep these
>> >> >>> > around
>> >> >>> > for
>> >> >>> > long enough that their definitions are useful (for inlining,
>> >> >>> > constant
>> >> >>> > prop,
>> >> >>> > all that good stuff) but remove them before we actually do too
>> >> >>> > much
>> >> >>> > work
>> >> >>> > on
>> >> >>> > them, or at least before we emit them into the final object file.
>> >> >>>
>> >> >>> Yep, and that is exactly how long I want my imported functions to
>> >> >>> stick around. Currently I have the ThinLTO import pass added at the
>> >> >>> start of addLTOOptimizationPasses, just after the AA passes. So the
>> >> >>> imported functions stick around through global opt, constant merge,
>> >> >>> combining, and inlining. The call to GlobalDCE is just after
>> >> >>> inlining
>> >> >>> and a second globalopt.
>> >> >>>
>> >> >>> >
>> >> >>> > I imagine if GlobalDCE isn't removing available_externally
>> >> >>> > functions
>> >> >>> > it's
>> >> >>> > because they're still useful in the optimization pipeline and
>> >> >>> > something
>> >> >>> > further down the pipe removes them (because they do, ultimately,
>> >> >>> > get
>> >> >>> > removed).
>> >> >>>
>> >> >>> That would be good to know, since if they are useful later on in
>> >> >>> the
>> >> >>> pipe then presumably my imported functions could be as well. But I
>> >> >>> wonder who is removing them since GlobalDCE isn't (if there are
>> >> >>> refs)?
>> >> >>
>> >> >>
>> >> >> Yep - seems no one removes them (well, probably if there are no
>> >> >> calls
>> >> >> to the
>> >> >> function at all they get removed as usual (eg: if all the calls are
>> >> >> inlined
>> >> >> we probably drop the function as we would for a linkonce_odr
>> >> >> function)). If
>> >> >> they're still around come codegen time, we just skip them:
>> >> >>
>> >> >> lib/CodeGen/MachineFunctionPass.cpp,
>> >> >> MachineFunctionPass::runOnFunction:
>> >> >>
>> >> >> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>> >> >> 34|   // Do not codegen any 'available_externally' functions at all,
>> >> >> they
>> >> >> have
>> >> >> 35|   // definitions outside the translation unit.
>> >> >> 36|   if (F.hasAvailableExternallyLinkage())
>> >> >> 37|     return false;
>> >> >> 38|
>> >> >> 39|   MachineFunction &MF =
>> >> >> getAnalysis<MachineFunctionAnalysis>().getMF();
>> >> >> 40|   return runOnMachineFunction(MF);
>> >> >> 41| }
>> >> >
>> >> > Ok, thanks for digging this up. I guess that would work for the
>> >> > imported functions as well, but it seems like a waste of compile time
>> >> > if they are not useful after inlining/global opt. Is there any reason
>> >> > to keep them past GlobalDCE or should I try changing GlobalDCE so
>> >> > that
>> >> > it removes them?
>> >> >
>> >>
>> >> I found allowing GlobalDCE to remove referenced
>> >> AvailableExternallyLinkage values earlier passes all the clang/llvm
>> >> tests (see patch below).
>> >
>> >
>> > The LLVM regression tests might not catch the sort of regression this
>> > change
>> > could cause. Usually we test each pass in isolation and tend towards
>> > white
>> > box testing (so we test the cases that are "interesting" according to
>> > the
>> > algorithm) - in this case, you're adding a condition (adding an
>> > "interesting" case that wasn't interesting before - so didn't need to be
>> > tested explicitly) that changes the behavior of GlobalDCE. This behavior
>> > may've been depended upon by further down-stream optimizations for
>> > overall
>> > performance.
>> >
>> >>
>> >> AFAICT, it shouldn't be necessary to keep
>> >> these functions around past the first call to GlobalDCE (which is
>> >> after inlining), but let me know if I missed something.
>> >
>> >
>> > It's possible they're still useful for optimizations other than
>> > inlining. We
>> > can still do constant propagation through such functions (hey, look,
>> > this
>> > function always returns 3, etc).
>>
>> I thought this was done by passes such as IPSCCPPass and GlobalOpt
>> which happen earlier.
>
>
> You might well be right - I know next to nothing about these things. Having
> someone a double-check/review from someone more familiar would be nice.
>
> I'll leave the rest of the review to any such someone who might be
> sufficiently informed about the pass pipeline to decide whether this is the
> right call. (and/or perf numbers that demonstrate it)
>
> Hopefully my naive review has at least covered the basics.

I just did some SPEC cpu2006 performance testing with this change, at
-O2 both with and without LTO. There were no measurable performance
differences. In fact, the final binaries were largely identical
(always the same size before and after, generated code looks
identical, slight differences in label naming are the only differences
in the nm and objdump output for a couple benchmarks - most are
totally identical).

Can someone take a look at the patch and approve if possible? While
currently this doesn't affect large numbers of functions, with ThinLTO
we want to avoid keeping around the larger numbers of
available-externally imported objects unnecessarily. This change
avoids the need of special casing here for ThinLTO imported
functions/variables.

Thanks,
Teresa

>
> - David
>
>>
>>
>> >
>> > It might be necessary/worth actually running perf numbers on such a
>> > change
>> > (or manually (possibly by asking those who know - not me, unfortunately)
>> > checking the passes that come after GlobalDCE to see if any would do
>> > cross
>> > function constant prop, etc. Perhaps someone else knows this
>> > off-the-cuff.
>>
>> I can collect some perf data.
>>
>> >
>> >>
>> >> Although they
>> >> will be removed eventually by MachineFunctionPass as David pointed
>> >> out,
>> >
>> >
>> > Interestingly they're not exactly /removed/, they're just not emitted. I
>> > don't know whether that's important.
>>
>> Yes, true, suppressed not removed.
>>
>> >
>> >>
>> >> so this is more of a compile-time improvement than anything. It
>> >> will affect ThinLTO more significantly in compile time because there
>> >> will be more functions with this linkage type, depending on how many
>> >> static functions
>> >
>> >
>> > Hmm - why static functions in particular? I imagine this attribute would
>> > be
>> > used for non-static functions too in ThinLTO.
>>
>> Arg, yes, I made the same mistake in some earlier email - all "global"
>> functions (originally so and those static promoted) will be
>> available_externally. So this is a significant amount for ThinLTO.
>>
>> >
>> >>
>> >> are imported that we have to promote.
>> >>
>> >>
>> >> Also, what about available_externally variable
>> >> definitions/initializations? I'm not sure what source construct
>> >> currently can generate those,
>> >
>> >
>> > I believe we emit C++ vtables as available_externally when the type has
>> > a
>> > key function. This allows devirtualization opportunities without
>> > bloating
>> > object file size with duplicate vtable data.
>> >
>> >>
>> >> but are these suppressed/turned into
>> >> declarations at some point?
>> >
>> >
>> > Maybe? Again, I would guess (and you can check some of this with
>> > -print-after-all to see how the IR changes as it goes through the
>> > optimizations) they might just be omitted from the generated code, not
>> > necessarily removed from the IR at any point.
>>
>> Ok, I will investigate.
>>
>> Thanks,
>> Teresa
>>
>> >
>> >>
>> >> Note that ThinLTO imported file static
>> >> variables always need to be promoted, and will be marked
>> >> available_external.
>> >>
>> >> Index: lib/Transforms/IPO/GlobalDCE.cpp
>> >> ===================================================================
>> >> --- lib/Transforms/IPO/GlobalDCE.cpp (revision 237590)
>> >> +++ lib/Transforms/IPO/GlobalDCE.cpp (working copy)
>> >> @@ -162,7 +162,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>> >>      // themselves.
>> >>      for (unsigned i = 0, e = DeadFunctions.size(); i != e; ++i) {
>> >>        RemoveUnusedGlobalValue(*DeadFunctions[i]);
>> >> -      M.getFunctionList().erase(DeadFunctions[i]);
>> >> +      // Might have deleted the body of an available externally
>> >> function
>> >> that
>> >> +      // is still referenced. Leave the declaration.
>> >> +      if (DeadFunctions[i]->use_empty())
>> >> +        M.getFunctionList().erase(DeadFunctions[i]);
>> >>      }
>> >>      NumFunctions += DeadFunctions.size();
>> >>      Changed = true;
>> >> @@ -171,7 +174,10 @@ bool GlobalDCE::runOnModule(Module &M) {
>> >>    if (!DeadGlobalVars.empty()) {
>> >>      for (unsigned i = 0, e = DeadGlobalVars.size(); i != e; ++i) {
>> >>        RemoveUnusedGlobalValue(*DeadGlobalVars[i]);
>> >> -      M.getGlobalList().erase(DeadGlobalVars[i]);
>> >> +      // Might have deleted the definition of an available externally
>> >> function
>> >> +      // that is still referenced. Leave the declaration.
>> >> +      if (DeadGlobalVars[i]->use_empty())
>> >> +        M.getGlobalList().erase(DeadGlobalVars[i]);
>> >>      }
>> >>      NumVariables += DeadGlobalVars.size();
>> >>      Changed = true;
>> >> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>> >>      for (Function::iterator BB = F->begin(), E = F->end(); BB != E;
>> >> ++BB)
>> >>        for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I !=
>> >> E;
>> >> ++I)
>> >>          for (User::op_iterator U = I->op_begin(), E = I->op_end(); U
>> >> !=
>> >> E; ++U)
>> >> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>> >> -            GlobalIsNeeded(GV);
>> >> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>> >> +            if (!GV->hasAvailableExternallyLinkage())
>> >> +              GlobalIsNeeded(GV);
>> >> +          }
>> >>            else if (Constant *C = dyn_cast<Constant>(*U))
>> >>              MarkUsedGlobalsAsNeeded(C);
>> >>    }
>> >>
>> >>
>> >>
>> >> --
>> >> Teresa Johnson | Software Engineer | [hidden email] |
>> >> 408-460-2413
>> >
>> >
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
>
>



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith

> On 2015-Jun-03, at 09:56, Teresa Johnson <[hidden email]> wrote:
>
> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:
>>
>>
>> On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <[hidden email]>
>> wrote:
>>>
>>> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <[hidden email]>
>>> wrote:
>>>>
>>>>
>>>> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]>
>>>> wrote:
>>>>>
>>>>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]>
>>>>> wrote:
>>>>>> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Looking back through my GlobalDCE changes, it looks like one of
>>>>>>>>>> the
>>>>>>>>>> places I had changed (where we mark defined globals in
>>>>>>>>>> runOnModule)
>>>>>>>>>> already has a guard for !hasAvailableExternallyLinkage and
>>>>>>>>>> !isDiscardableIfUnused, so my additional guard against marking
>>>>>>>>>> imported functions is unnecessary. But the other place I had to
>>>>>>>>>> change
>>>>>>>>>> was in GlobalIsNeeded where it walks through the function and
>>>>>>>>>> recursively marks any referenced global as needed. Here there
>>>>>>>>>> was
>>>>>>>>>> no
>>>>>>>>>> guard against marking a global that is available externally as
>>>>>>>>>> needed
>>>>>>>>>> if it is referenced. I had added a check here to not mark
>>>>>>>>>> imported
>>>>>>>>>> functions as needed on reference unless they were discardable
>>>>>>>>>> (i.e.
>>>>>>>>>> link once). Is this a bug - should this have a guard against
>>>>>>>>>> marking
>>>>>>>>>> available externally function refs as needed?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Duncan's probably got a better idea of what the right answer is
>>>>>>>>> here. I
>>>>>>>>> suspect "yes".
>>>>>>>>>
>>>>>>>>> The trick with available_externally is to ensure we keep these
>>>>>>>>> around
>>>>>>>>> for
>>>>>>>>> long enough that their definitions are useful (for inlining,
>>>>>>>>> constant
>>>>>>>>> prop,
>>>>>>>>> all that good stuff) but remove them before we actually do too
>>>>>>>>> much
>>>>>>>>> work
>>>>>>>>> on
>>>>>>>>> them, or at least before we emit them into the final object file.
>>>>>>>>
>>>>>>>> Yep, and that is exactly how long I want my imported functions to
>>>>>>>> stick around. Currently I have the ThinLTO import pass added at the
>>>>>>>> start of addLTOOptimizationPasses, just after the AA passes. So the
>>>>>>>> imported functions stick around through global opt, constant merge,
>>>>>>>> combining, and inlining. The call to GlobalDCE is just after
>>>>>>>> inlining
>>>>>>>> and a second globalopt.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I imagine if GlobalDCE isn't removing available_externally
>>>>>>>>> functions
>>>>>>>>> it's
>>>>>>>>> because they're still useful in the optimization pipeline and
>>>>>>>>> something
>>>>>>>>> further down the pipe removes them (because they do, ultimately,
>>>>>>>>> get
>>>>>>>>> removed).
>>>>>>>>
>>>>>>>> That would be good to know, since if they are useful later on in
>>>>>>>> the
>>>>>>>> pipe then presumably my imported functions could be as well. But I
>>>>>>>> wonder who is removing them since GlobalDCE isn't (if there are
>>>>>>>> refs)?
>>>>>>>
>>>>>>>
>>>>>>> Yep - seems no one removes them (well, probably if there are no
>>>>>>> calls
>>>>>>> to the
>>>>>>> function at all they get removed as usual (eg: if all the calls are
>>>>>>> inlined
>>>>>>> we probably drop the function as we would for a linkonce_odr
>>>>>>> function)). If
>>>>>>> they're still around come codegen time, we just skip them:
>>>>>>>
>>>>>>> lib/CodeGen/MachineFunctionPass.cpp,
>>>>>>> MachineFunctionPass::runOnFunction:
>>>>>>>
>>>>>>> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>>>>>>> 34|   // Do not codegen any 'available_externally' functions at all,
>>>>>>> they
>>>>>>> have
>>>>>>> 35|   // definitions outside the translation unit.
>>>>>>> 36|   if (F.hasAvailableExternallyLinkage())
>>>>>>> 37|     return false;
>>>>>>> 38|
>>>>>>> 39|   MachineFunction &MF =
>>>>>>> getAnalysis<MachineFunctionAnalysis>().getMF();
>>>>>>> 40|   return runOnMachineFunction(MF);
>>>>>>> 41| }
>>>>>>
>>>>>> Ok, thanks for digging this up. I guess that would work for the
>>>>>> imported functions as well, but it seems like a waste of compile time
>>>>>> if they are not useful after inlining/global opt. Is there any reason
>>>>>> to keep them past GlobalDCE or should I try changing GlobalDCE so
>>>>>> that
>>>>>> it removes them?
>>>>>>
>>>>>
>>>>> I found allowing GlobalDCE to remove referenced
>>>>> AvailableExternallyLinkage values earlier passes all the clang/llvm
>>>>> tests (see patch below).
>>>>
>>>>
>>>> The LLVM regression tests might not catch the sort of regression this
>>>> change
>>>> could cause. Usually we test each pass in isolation and tend towards
>>>> white
>>>> box testing (so we test the cases that are "interesting" according to
>>>> the
>>>> algorithm) - in this case, you're adding a condition (adding an
>>>> "interesting" case that wasn't interesting before - so didn't need to be
>>>> tested explicitly) that changes the behavior of GlobalDCE. This behavior
>>>> may've been depended upon by further down-stream optimizations for
>>>> overall
>>>> performance.
>>>>
>>>>>
>>>>> AFAICT, it shouldn't be necessary to keep
>>>>> these functions around past the first call to GlobalDCE (which is
>>>>> after inlining), but let me know if I missed something.
>>>>
>>>>
>>>> It's possible they're still useful for optimizations other than
>>>> inlining. We
>>>> can still do constant propagation through such functions (hey, look,
>>>> this
>>>> function always returns 3, etc).
>>>
>>> I thought this was done by passes such as IPSCCPPass and GlobalOpt
>>> which happen earlier.
>>
>>
>> You might well be right - I know next to nothing about these things. Having
>> someone a double-check/review from someone more familiar would be nice.
>>
>> I'll leave the rest of the review to any such someone who might be
>> sufficiently informed about the pass pipeline to decide whether this is the
>> right call. (and/or perf numbers that demonstrate it)
>>
>> Hopefully my naive review has at least covered the basics.
>
> I just did some SPEC cpu2006 performance testing with this change, at
> -O2 both with and without LTO. There were no measurable performance
> differences. In fact, the final binaries were largely identical
> (always the same size before and after, generated code looks
> identical, slight differences in label naming are the only differences
> in the nm and objdump output for a couple benchmarks - most are
> totally identical).
>
> Can someone take a look at the patch and approve if possible? While
> currently this doesn't affect large numbers of functions, with ThinLTO
> we want to avoid keeping around the larger numbers of
> available-externally imported objects unnecessarily. This change
> avoids the need of special casing here for ThinLTO imported
> functions/variables.

I'd be interested in performance of a clang bootstrap or of Chromium (or
some other large C++ program built with -flto).

Moreover, I suspect this will cause a problem for full LTO, where a lot
of inlining takes place at link time.  There have even been proposals
(but no one has collected any data to share yet) to run just the
always-inliner at compile time, deferring almost all inlining until
later.

But if we can have different pass pipelines for "normal" vs. "LTO"
compiles, then "ThinLTO" can have its own pass pipeline as well.  Seems
like it already needs one.  Why not add a late pass there to drop
available_externally functions?


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
<[hidden email]> wrote:

>
>> On 2015-Jun-03, at 09:56, Teresa Johnson <[hidden email]> wrote:
>>
>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:
>>>
>>>
>>> On Tue, May 19, 2015 at 1:05 PM, Teresa Johnson <[hidden email]>
>>> wrote:
>>>>
>>>> On Tue, May 19, 2015 at 12:41 PM, David Blaikie <[hidden email]>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Mon, May 18, 2015 at 9:09 PM, Teresa Johnson <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, May 15, 2015 at 11:20 PM, Teresa Johnson <[hidden email]>
>>>>>> wrote:
>>>>>>> On Fri, May 15, 2015 at 2:52 PM, David Blaikie <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, May 15, 2015 at 1:15 PM, Teresa Johnson
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, May 15, 2015 at 10:04 AM, David Blaikie
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, May 15, 2015 at 9:53 AM, Teresa Johnson
>>>>>>>>>> <[hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Looking back through my GlobalDCE changes, it looks like one of
>>>>>>>>>>> the
>>>>>>>>>>> places I had changed (where we mark defined globals in
>>>>>>>>>>> runOnModule)
>>>>>>>>>>> already has a guard for !hasAvailableExternallyLinkage and
>>>>>>>>>>> !isDiscardableIfUnused, so my additional guard against marking
>>>>>>>>>>> imported functions is unnecessary. But the other place I had to
>>>>>>>>>>> change
>>>>>>>>>>> was in GlobalIsNeeded where it walks through the function and
>>>>>>>>>>> recursively marks any referenced global as needed. Here there
>>>>>>>>>>> was
>>>>>>>>>>> no
>>>>>>>>>>> guard against marking a global that is available externally as
>>>>>>>>>>> needed
>>>>>>>>>>> if it is referenced. I had added a check here to not mark
>>>>>>>>>>> imported
>>>>>>>>>>> functions as needed on reference unless they were discardable
>>>>>>>>>>> (i.e.
>>>>>>>>>>> link once). Is this a bug - should this have a guard against
>>>>>>>>>>> marking
>>>>>>>>>>> available externally function refs as needed?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Duncan's probably got a better idea of what the right answer is
>>>>>>>>>> here. I
>>>>>>>>>> suspect "yes".
>>>>>>>>>>
>>>>>>>>>> The trick with available_externally is to ensure we keep these
>>>>>>>>>> around
>>>>>>>>>> for
>>>>>>>>>> long enough that their definitions are useful (for inlining,
>>>>>>>>>> constant
>>>>>>>>>> prop,
>>>>>>>>>> all that good stuff) but remove them before we actually do too
>>>>>>>>>> much
>>>>>>>>>> work
>>>>>>>>>> on
>>>>>>>>>> them, or at least before we emit them into the final object file.
>>>>>>>>>
>>>>>>>>> Yep, and that is exactly how long I want my imported functions to
>>>>>>>>> stick around. Currently I have the ThinLTO import pass added at the
>>>>>>>>> start of addLTOOptimizationPasses, just after the AA passes. So the
>>>>>>>>> imported functions stick around through global opt, constant merge,
>>>>>>>>> combining, and inlining. The call to GlobalDCE is just after
>>>>>>>>> inlining
>>>>>>>>> and a second globalopt.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I imagine if GlobalDCE isn't removing available_externally
>>>>>>>>>> functions
>>>>>>>>>> it's
>>>>>>>>>> because they're still useful in the optimization pipeline and
>>>>>>>>>> something
>>>>>>>>>> further down the pipe removes them (because they do, ultimately,
>>>>>>>>>> get
>>>>>>>>>> removed).
>>>>>>>>>
>>>>>>>>> That would be good to know, since if they are useful later on in
>>>>>>>>> the
>>>>>>>>> pipe then presumably my imported functions could be as well. But I
>>>>>>>>> wonder who is removing them since GlobalDCE isn't (if there are
>>>>>>>>> refs)?
>>>>>>>>
>>>>>>>>
>>>>>>>> Yep - seems no one removes them (well, probably if there are no
>>>>>>>> calls
>>>>>>>> to the
>>>>>>>> function at all they get removed as usual (eg: if all the calls are
>>>>>>>> inlined
>>>>>>>> we probably drop the function as we would for a linkonce_odr
>>>>>>>> function)). If
>>>>>>>> they're still around come codegen time, we just skip them:
>>>>>>>>
>>>>>>>> lib/CodeGen/MachineFunctionPass.cpp,
>>>>>>>> MachineFunctionPass::runOnFunction:
>>>>>>>>
>>>>>>>> 33| bool MachineFunctionPass::runOnFunction(Function &F) {
>>>>>>>> 34|   // Do not codegen any 'available_externally' functions at all,
>>>>>>>> they
>>>>>>>> have
>>>>>>>> 35|   // definitions outside the translation unit.
>>>>>>>> 36|   if (F.hasAvailableExternallyLinkage())
>>>>>>>> 37|     return false;
>>>>>>>> 38|
>>>>>>>> 39|   MachineFunction &MF =
>>>>>>>> getAnalysis<MachineFunctionAnalysis>().getMF();
>>>>>>>> 40|   return runOnMachineFunction(MF);
>>>>>>>> 41| }
>>>>>>>
>>>>>>> Ok, thanks for digging this up. I guess that would work for the
>>>>>>> imported functions as well, but it seems like a waste of compile time
>>>>>>> if they are not useful after inlining/global opt. Is there any reason
>>>>>>> to keep them past GlobalDCE or should I try changing GlobalDCE so
>>>>>>> that
>>>>>>> it removes them?
>>>>>>>
>>>>>>
>>>>>> I found allowing GlobalDCE to remove referenced
>>>>>> AvailableExternallyLinkage values earlier passes all the clang/llvm
>>>>>> tests (see patch below).
>>>>>
>>>>>
>>>>> The LLVM regression tests might not catch the sort of regression this
>>>>> change
>>>>> could cause. Usually we test each pass in isolation and tend towards
>>>>> white
>>>>> box testing (so we test the cases that are "interesting" according to
>>>>> the
>>>>> algorithm) - in this case, you're adding a condition (adding an
>>>>> "interesting" case that wasn't interesting before - so didn't need to be
>>>>> tested explicitly) that changes the behavior of GlobalDCE. This behavior
>>>>> may've been depended upon by further down-stream optimizations for
>>>>> overall
>>>>> performance.
>>>>>
>>>>>>
>>>>>> AFAICT, it shouldn't be necessary to keep
>>>>>> these functions around past the first call to GlobalDCE (which is
>>>>>> after inlining), but let me know if I missed something.
>>>>>
>>>>>
>>>>> It's possible they're still useful for optimizations other than
>>>>> inlining. We
>>>>> can still do constant propagation through such functions (hey, look,
>>>>> this
>>>>> function always returns 3, etc).
>>>>
>>>> I thought this was done by passes such as IPSCCPPass and GlobalOpt
>>>> which happen earlier.
>>>
>>>
>>> You might well be right - I know next to nothing about these things. Having
>>> someone a double-check/review from someone more familiar would be nice.
>>>
>>> I'll leave the rest of the review to any such someone who might be
>>> sufficiently informed about the pass pipeline to decide whether this is the
>>> right call. (and/or perf numbers that demonstrate it)
>>>
>>> Hopefully my naive review has at least covered the basics.
>>
>> I just did some SPEC cpu2006 performance testing with this change, at
>> -O2 both with and without LTO. There were no measurable performance
>> differences. In fact, the final binaries were largely identical
>> (always the same size before and after, generated code looks
>> identical, slight differences in label naming are the only differences
>> in the nm and objdump output for a couple benchmarks - most are
>> totally identical).
>>
>> Can someone take a look at the patch and approve if possible? While
>> currently this doesn't affect large numbers of functions, with ThinLTO
>> we want to avoid keeping around the larger numbers of
>> available-externally imported objects unnecessarily. This change
>> avoids the need of special casing here for ThinLTO imported
>> functions/variables.
>
> I'd be interested in performance of a clang bootstrap or of Chromium (or
> some other large C++ program built with -flto).
>
> Moreover, I suspect this will cause a problem for full LTO, where a lot
> of inlining takes place at link time.  There have even been proposals
> (but no one has collected any data to share yet) to run just the
> always-inliner at compile time, deferring almost all inlining until
> later.

Note that one of the sets of SPEC cpu2006 data I collected was with
full LTO. I'm trying to understand where this would be an issue for
full LTO, since GlobalDCE is running after inlining in the link time
LTO pipeline.

Just to be clear, are you concerned about the following:

1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
unreferenced available externally function
2) LTO link step would have inlined that available externally function
somewhere (must be a different module since it was unreferenced in
step 1) but now can't

I'm not sure when 2) can happen since it had to be unreferenced in its
module to be removed with my changes. For an available externally
function, any reference from a different module would have needed its
own copy to start with (at least that is the case with C inline
functions). It looks like available externally C inline functions are
already removed when unreferenced by the compile step (e.g. when
inlined into that module's references during the "-flto -O2 -c"
compile).

>
> But if we can have different pass pipelines for "normal" vs. "LTO"
> compiles, then "ThinLTO" can have its own pass pipeline as well.  Seems
> like it already needs one.  Why not add a late pass there to drop
> available_externally functions?
>

Sure that's possible, but I am trying to understand what circumstance
there is where a full LTO would need to keep around available
externally functions after GlobalDCE where ThinLTO wouldn't.
Presumably if they are useful for optimization after GlobalDCE then we
would want to keep them around longer for ThinLTO as well. I just
haven't identified any need to do so.

Thanks,
Teresa


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith

> On 2015-Jun-04, at 07:10, Teresa Johnson <[hidden email]> wrote:
>
> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
> <[hidden email]> wrote:
>>
>>> On 2015-Jun-03, at 09:56, Teresa Johnson <[hidden email]> wrote:
>>>
>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:
>>>
>>>> You might well be right - I know next to nothing about these things. Having
>>>> someone a double-check/review from someone more familiar would be nice.
>>>>
>>>> I'll leave the rest of the review to any such someone who might be
>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>> right call. (and/or perf numbers that demonstrate it)
>>>>
>>>> Hopefully my naive review has at least covered the basics.
>>>
>>> I just did some SPEC cpu2006 performance testing with this change, at
>>> -O2 both with and without LTO. There were no measurable performance
>>> differences. In fact, the final binaries were largely identical
>>> (always the same size before and after, generated code looks
>>> identical, slight differences in label naming are the only differences
>>> in the nm and objdump output for a couple benchmarks - most are
>>> totally identical).
>>>
>>> Can someone take a look at the patch and approve if possible? While
>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>> we want to avoid keeping around the larger numbers of
>>> available-externally imported objects unnecessarily. This change
>>> avoids the need of special casing here for ThinLTO imported
>>> functions/variables.
>>
>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>> some other large C++ program built with -flto).
>>
>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>> of inlining takes place at link time.  There have even been proposals
>> (but no one has collected any data to share yet) to run just the
>> always-inliner at compile time, deferring almost all inlining until
>> later.
>
> Note that one of the sets of SPEC cpu2006 data I collected was with
> full LTO. I'm trying to understand where this would be an issue for
> full LTO, since GlobalDCE is running after inlining in the link time
> LTO pipeline.
>
> Just to be clear, are you concerned about the following:
>
> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
> unreferenced available externally function
> 2) LTO link step would have inlined that available externally function
> somewhere (must be a different module since it was unreferenced in
> step 1) but now can't
>
> I'm not sure when 2) can happen since it had to be unreferenced in its
> module to be removed with my changes.

That's not what it looks like to me.  Here's what I think the relevant
part of your patch is:

On 2015-May-18, at 21:09, Teresa Johnson <[hidden email]> wrote:

>
> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>     for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>       for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>         for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
> -            GlobalIsNeeded(GV);
> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
> +            if (!GV->hasAvailableExternallyLinkage())
> +              GlobalIsNeeded(GV);
> +          }
>           else if (Constant *C = dyn_cast<Constant>(*U))
>             MarkUsedGlobalsAsNeeded(C);
>   }


IIUC, this changes the logic from "if it's referenced, keep it" to
"if it's referenced and not available_externally, keep it".

In particular, I'm worried about GlobalDCE removing a *referenced*
available_externally function, particularly if/when we stop inlining at
the compile step when -flto.

> For an available externally
> function, any reference from a different module would have needed its
> own copy to start with (at least that is the case with C inline
> functions). It looks like available externally C inline functions are
> already removed when unreferenced by the compile step (e.g. when
> inlined into that module's references during the "-flto -O2 -c"
> compile).
>
>>
>> But if we can have different pass pipelines for "normal" vs. "LTO"
>> compiles, then "ThinLTO" can have its own pass pipeline as well.  Seems
>> like it already needs one.  Why not add a late pass there to drop
>> available_externally functions?
>>
>
> Sure that's possible, but I am trying to understand what circumstance
> there is where a full LTO would need to keep around available
> externally functions after GlobalDCE where ThinLTO wouldn't.
> Presumably if they are useful for optimization after GlobalDCE then we
> would want to keep them around longer for ThinLTO as well. I just
> haven't identified any need to do so.
>
> Thanks,
> Teresa
>
>
> --
> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith
<[hidden email]> wrote:

>
>> On 2015-Jun-04, at 07:10, Teresa Johnson <[hidden email]> wrote:
>>
>> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
>> <[hidden email]> wrote:
>>>
>>>> On 2015-Jun-03, at 09:56, Teresa Johnson <[hidden email]> wrote:
>>>>
>>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:
>>>>
>>>>> You might well be right - I know next to nothing about these things. Having
>>>>> someone a double-check/review from someone more familiar would be nice.
>>>>>
>>>>> I'll leave the rest of the review to any such someone who might be
>>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>>> right call. (and/or perf numbers that demonstrate it)
>>>>>
>>>>> Hopefully my naive review has at least covered the basics.
>>>>
>>>> I just did some SPEC cpu2006 performance testing with this change, at
>>>> -O2 both with and without LTO. There were no measurable performance
>>>> differences. In fact, the final binaries were largely identical
>>>> (always the same size before and after, generated code looks
>>>> identical, slight differences in label naming are the only differences
>>>> in the nm and objdump output for a couple benchmarks - most are
>>>> totally identical).
>>>>
>>>> Can someone take a look at the patch and approve if possible? While
>>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>>> we want to avoid keeping around the larger numbers of
>>>> available-externally imported objects unnecessarily. This change
>>>> avoids the need of special casing here for ThinLTO imported
>>>> functions/variables.
>>>
>>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>>> some other large C++ program built with -flto).
>>>
>>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>>> of inlining takes place at link time.  There have even been proposals
>>> (but no one has collected any data to share yet) to run just the
>>> always-inliner at compile time, deferring almost all inlining until
>>> later.
>>
>> Note that one of the sets of SPEC cpu2006 data I collected was with
>> full LTO. I'm trying to understand where this would be an issue for
>> full LTO, since GlobalDCE is running after inlining in the link time
>> LTO pipeline.
>>
>> Just to be clear, are you concerned about the following:
>>
>> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
>> unreferenced available externally function
>> 2) LTO link step would have inlined that available externally function
>> somewhere (must be a different module since it was unreferenced in
>> step 1) but now can't
>>
>> I'm not sure when 2) can happen since it had to be unreferenced in its
>> module to be removed with my changes.
>
> That's not what it looks like to me.  Here's what I think the relevant
> part of your patch is:
>
> On 2015-May-18, at 21:09, Teresa Johnson <[hidden email]> wrote:
>>
>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>>     for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>>       for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>>         for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
>> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>> -            GlobalIsNeeded(GV);
>> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>> +            if (!GV->hasAvailableExternallyLinkage())
>> +              GlobalIsNeeded(GV);
>> +          }
>>           else if (Constant *C = dyn_cast<Constant>(*U))
>>             MarkUsedGlobalsAsNeeded(C);
>>   }
>
>
> IIUC, this changes the logic from "if it's referenced, keep it" to
> "if it's referenced and not available_externally, keep it".

Sorry, you're right, I conflated a couple of different things when I
looked at this again this morning. For ThinLTO I don't in fact need
this change for anything that is fully inlined after importing.

>
> In particular, I'm worried about GlobalDCE removing a *referenced*
> available_externally function, particularly if/when we stop inlining at
> the compile step when -flto.

Ok, that makes sense. In that case, I agree that doing this for
ThinLTO imported functions is the right way to go. My initial approach
was to mark the functions that were ThinLTO-imported and check that
here. I think we will likely want to mark the imported functions as
such regardless, since we may want to apply different optimization
thresholds for imported functions, or at least use for debugging and
tracing, so we could consider using that approach here.

Another possibility as you mentioned was to drop in a new pass in the
ThinLTO backend optimization pipeline just for this. Or perhaps the
GlobalDCE pass invocation could take a flag indicating that it should
remove the avail extern functions, that we can set in the ThinLTO
backend case. It seems like most of the code would be shared between
the current GlobalDCE and any new ThinLTO-specific version that
setting GlobalDCE up to do this optionally (either per-imported
function or with a configuration flag on the pass) made sense.

Teresa

>
>> For an available externally
>> function, any reference from a different module would have needed its
>> own copy to start with (at least that is the case with C inline
>> functions). It looks like available externally C inline functions are
>> already removed when unreferenced by the compile step (e.g. when
>> inlined into that module's references during the "-flto -O2 -c"
>> compile).
>>
>>>
>>> But if we can have different pass pipelines for "normal" vs. "LTO"
>>> compiles, then "ThinLTO" can have its own pass pipeline as well.  Seems
>>> like it already needs one.  Why not add a late pass there to drop
>>> available_externally functions?
>>>
>>
>> Sure that's possible, but I am trying to understand what circumstance
>> there is where a full LTO would need to keep around available
>> externally functions after GlobalDCE where ThinLTO wouldn't.
>> Presumably if they are useful for optimization after GlobalDCE then we
>> would want to keep them around longer for ThinLTO as well. I just
>> haven't identified any need to do so.
>>
>> Thanks,
>> Teresa
>>
>>
>> --
>> Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
>



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith

> On 2015-Jun-04, at 10:13, Teresa Johnson <[hidden email]> wrote:
>
> On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith
> <[hidden email]> wrote:
>>
>>> On 2015-Jun-04, at 07:10, Teresa Johnson <[hidden email]> wrote:
>>>
>>> On Wed, Jun 3, 2015 at 2:07 PM, Duncan P. N. Exon Smith
>>> <[hidden email]> wrote:
>>>>
>>>>> On 2015-Jun-03, at 09:56, Teresa Johnson <[hidden email]> wrote:
>>>>>
>>>>> On Tue, May 19, 2015 at 1:18 PM, David Blaikie <[hidden email]> wrote:
>>>>>
>>>>>> You might well be right - I know next to nothing about these things. Having
>>>>>> someone a double-check/review from someone more familiar would be nice.
>>>>>>
>>>>>> I'll leave the rest of the review to any such someone who might be
>>>>>> sufficiently informed about the pass pipeline to decide whether this is the
>>>>>> right call. (and/or perf numbers that demonstrate it)
>>>>>>
>>>>>> Hopefully my naive review has at least covered the basics.
>>>>>
>>>>> I just did some SPEC cpu2006 performance testing with this change, at
>>>>> -O2 both with and without LTO. There were no measurable performance
>>>>> differences. In fact, the final binaries were largely identical
>>>>> (always the same size before and after, generated code looks
>>>>> identical, slight differences in label naming are the only differences
>>>>> in the nm and objdump output for a couple benchmarks - most are
>>>>> totally identical).
>>>>>
>>>>> Can someone take a look at the patch and approve if possible? While
>>>>> currently this doesn't affect large numbers of functions, with ThinLTO
>>>>> we want to avoid keeping around the larger numbers of
>>>>> available-externally imported objects unnecessarily. This change
>>>>> avoids the need of special casing here for ThinLTO imported
>>>>> functions/variables.
>>>>
>>>> I'd be interested in performance of a clang bootstrap or of Chromium (or
>>>> some other large C++ program built with -flto).
>>>>
>>>> Moreover, I suspect this will cause a problem for full LTO, where a lot
>>>> of inlining takes place at link time.  There have even been proposals
>>>> (but no one has collected any data to share yet) to run just the
>>>> always-inliner at compile time, deferring almost all inlining until
>>>> later.
>>>
>>> Note that one of the sets of SPEC cpu2006 data I collected was with
>>> full LTO. I'm trying to understand where this would be an issue for
>>> full LTO, since GlobalDCE is running after inlining in the link time
>>> LTO pipeline.
>>>
>>> Just to be clear, are you concerned about the following:
>>>
>>> 1) GlobalDCE run after inlining in the "-flto -O2 -c" build removes an
>>> unreferenced available externally function
>>> 2) LTO link step would have inlined that available externally function
>>> somewhere (must be a different module since it was unreferenced in
>>> step 1) but now can't
>>>
>>> I'm not sure when 2) can happen since it had to be unreferenced in its
>>> module to be removed with my changes.
>>
>> That's not what it looks like to me.  Here's what I think the relevant
>> part of your patch is:
>>
>> On 2015-May-18, at 21:09, Teresa Johnson <[hidden email]> wrote:
>>>
>>> @@ -231,8 +237,10 @@ void GlobalDCE::GlobalIsNeeded(GlobalValue *G) {
>>>    for (Function::iterator BB = F->begin(), E = F->end(); BB != E; ++BB)
>>>      for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I)
>>>        for (User::op_iterator U = I->op_begin(), E = I->op_end(); U != E; ++U)
>>> -          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U))
>>> -            GlobalIsNeeded(GV);
>>> +          if (GlobalValue *GV = dyn_cast<GlobalValue>(*U)) {
>>> +            if (!GV->hasAvailableExternallyLinkage())
>>> +              GlobalIsNeeded(GV);
>>> +          }
>>>          else if (Constant *C = dyn_cast<Constant>(*U))
>>>            MarkUsedGlobalsAsNeeded(C);
>>>  }
>>
>>
>> IIUC, this changes the logic from "if it's referenced, keep it" to
>> "if it's referenced and not available_externally, keep it".
>
> Sorry, you're right, I conflated a couple of different things when I
> looked at this again this morning. For ThinLTO I don't in fact need
> this change for anything that is fully inlined after importing.
>
>>
>> In particular, I'm worried about GlobalDCE removing a *referenced*
>> available_externally function, particularly if/when we stop inlining at
>> the compile step when -flto.
>
> Ok, that makes sense. In that case, I agree that doing this for
> ThinLTO imported functions is the right way to go. My initial approach
> was to mark the functions that were ThinLTO-imported and check that
> here. I think we will likely want to mark the imported functions as
> such regardless, since we may want to apply different optimization
> thresholds for imported functions, or at least use for debugging and
> tracing, so we could consider using that approach here.
>
> Another possibility as you mentioned was to drop in a new pass in the
> ThinLTO backend optimization pipeline just for this. Or perhaps the
> GlobalDCE pass invocation could take a flag indicating that it should
> remove the avail extern functions, that we can set in the ThinLTO
> backend case. It seems like most of the code would be shared between
> the current GlobalDCE and any new ThinLTO-specific version that
> setting GlobalDCE up to do this optionally (either per-imported
> function or with a configuration flag on the pass) made sense.

Since the compiler is always free to delete available_externally
functions, I think you could just add a pass to the -flto=thin pipeline
that deletes all of them (referenced or not) -- it's just a single loop
through all the functions deleting the bodies of those with the right
linkage.  I imagine there are other pass pipelines that might want to do
something similar.  I don't really like having GlobalDCE delete them
(even behind a flag) because they aren't actually dead, and I think a
separate pass makes it easier to test and all that.  (I haven't actually
worked much with pass pipelines, though, so there's a chance I'm on my
own here?)

You make an interesting point about applying different thresholds to
imported functions, but is there any reason not to change all
available_externally functions in the some way?  Moreover, it feels like
thin-LTO's imported functions are a new source of functions with
available_externally linkage, but otherwise they aren't interestingly
different.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith
<[hidden email]> wrote:

>
>> On 2015-Jun-04, at 10:13, Teresa Johnson <[hidden email]> wrote:
>>
>> On Thu, Jun 4, 2015 at 9:51 AM, Duncan P. N. Exon Smith
>> <[hidden email]> wrote:
>>>
>>> In particular, I'm worried about GlobalDCE removing a *referenced*
>>> available_externally function, particularly if/when we stop inlining at
>>> the compile step when -flto.
>>
>> Ok, that makes sense. In that case, I agree that doing this for
>> ThinLTO imported functions is the right way to go. My initial approach
>> was to mark the functions that were ThinLTO-imported and check that
>> here. I think we will likely want to mark the imported functions as
>> such regardless, since we may want to apply different optimization
>> thresholds for imported functions, or at least use for debugging and
>> tracing, so we could consider using that approach here.
>>
>> Another possibility as you mentioned was to drop in a new pass in the
>> ThinLTO backend optimization pipeline just for this. Or perhaps the
>> GlobalDCE pass invocation could take a flag indicating that it should
>> remove the avail extern functions, that we can set in the ThinLTO
>> backend case. It seems like most of the code would be shared between
>> the current GlobalDCE and any new ThinLTO-specific version that
>> setting GlobalDCE up to do this optionally (either per-imported
>> function or with a configuration flag on the pass) made sense.
>
> Since the compiler is always free to delete available_externally
> functions, I think you could just add a pass to the -flto=thin pipeline
> that deletes all of them (referenced or not) -- it's just a single loop
> through all the functions deleting the bodies of those with the right
> linkage.  I imagine there are other pass pipelines that might want to do
> something similar.  I don't really like having GlobalDCE delete them
> (even behind a flag) because they aren't actually dead, and I think a
> separate pass makes it easier to test and all that.  (I haven't actually
> worked much with pass pipelines, though, so there's a chance I'm on my
> own here?)

Ok, sounds reasonable to do as a separate ThinLTO specific pass.

>
> You make an interesting point about applying different thresholds to
> imported functions, but is there any reason not to change all
> available_externally functions in the some way?  Moreover, it feels like
> thin-LTO's imported functions are a new source of functions with
> available_externally linkage, but otherwise they aren't interestingly
> different.
>

I suspect those cases are going to be due to summary information
recorded for the imported functions anyway, which we haven't defined
the format of yet anyway. So I think we can separate that from this
aspect of imported function handling, which can be based on the
linkage type alone, and defer that discussion until later.

Thanks for your help,
Teresa


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Reid Kleckner-2
In reply to this post by Duncan P. N. Exon Smith
On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
Since the compiler is always free to delete available_externally
functions, I think you could just add a pass to the -flto=thin pipeline
that deletes all of them (referenced or not) -- it's just a single loop
through all the functions deleting the bodies of those with the right
linkage.  I imagine there are other pass pipelines that might want to do
something similar.  I don't really like having GlobalDCE delete them
(even behind a flag) because they aren't actually dead, and I think a
separate pass makes it easier to test and all that.  (I haven't actually
worked much with pass pipelines, though, so there's a chance I'm on my
own here?)

It's possible to get into situations where available_externally functions or globals (dllimported vftable) reference linkonce_odr functions (virtual destructor thunks), so a single pass to drop available_externally function bodies isn't as strong as adding a flag to GlobalDCE.

Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.

You make an interesting point about applying different thresholds to
imported functions, but is there any reason not to change all
available_externally functions in the some way?  Moreover, it feels like
thin-LTO's imported functions are a new source of functions with
available_externally linkage, but otherwise they aren't interestingly
different.

Right, ThinLTO functions and C99 inline functions aren't interestingly different. Having GlobalDCE drop bodies of available_externally functions in the standard -O2 pipeline avoids wasting time running IR passes like CodeGenPrepare on them.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith

> On 2015 Jun 4, at 14:37, Reid Kleckner <[hidden email]> wrote:
>
> On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
> Since the compiler is always free to delete available_externally
> functions, I think you could just add a pass to the -flto=thin pipeline
> that deletes all of them (referenced or not) -- it's just a single loop
> through all the functions deleting the bodies of those with the right
> linkage.  I imagine there are other pass pipelines that might want to do
> something similar.  I don't really like having GlobalDCE delete them
> (even behind a flag) because they aren't actually dead, and I think a
> separate pass makes it easier to test and all that.  (I haven't actually
> worked much with pass pipelines, though, so there's a chance I'm on my
> own here?)
>
> It's possible to get into situations where available_externally functions or globals (dllimported vftable) reference linkonce_odr functions (virtual destructor thunks), so a single pass to drop available_externally function bodies isn't as strong as adding a flag to GlobalDCE.

Unless I'm missing something, running the pass to drop
available_externally function bodies just before running GlobalDCE
is equivalent to incorporating it into GlobalDCE.

>
> Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.

That's a good point.  But I still like these as separate passes.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Xinliang David Li-2
In reply to this post by Reid Kleckner-2


On Thu, Jun 4, 2015 at 2:37 PM, Reid Kleckner <[hidden email]> wrote:
On Thu, Jun 4, 2015 at 11:27 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
Since the compiler is always free to delete available_externally
functions, I think you could just add a pass to the -flto=thin pipeline
that deletes all of them (referenced or not) -- it's just a single loop
through all the functions deleting the bodies of those with the right
linkage.  I imagine there are other pass pipelines that might want to do
something similar.  I don't really like having GlobalDCE delete them
(even behind a flag) because they aren't actually dead, and I think a
separate pass makes it easier to test and all that.  (I haven't actually
worked much with pass pipelines, though, so there's a chance I'm on my
own here?)

It's possible to get into situations where available_externally functions or globals (dllimported vftable) reference linkonce_odr functions (virtual destructor thunks), so a single pass to drop available_externally function bodies isn't as strong as adding a flag to GlobalDCE.

Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.


+1. That is what I was going to suggest too. The flag can be more general to indicate bodies are needed for referenced functions. For instance if there are cloning phase after inlining, then DCE before the cloning still needs to keep the bodies.
 


You make an interesting point about applying different thresholds to
imported functions, but is there any reason not to change all
available_externally functions in the some way?  Moreover, it feels like
thin-LTO's imported functions are a new source of functions with
available_externally linkage, but otherwise they aren't interestingly
different.

Right, ThinLTO functions and C99 inline functions aren't interestingly different

I think they are more like GNU extern inline functions.
 
. Having GlobalDCE drop bodies of available_externally functions in the standard -O2 pipeline avoids wasting time running IR passes like CodeGenPrepare on them.

yes.

thanks,

David

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Reid Kleckner-2
In reply to this post by Duncan P. N. Exon Smith
On Thu, Jun 4, 2015 at 3:58 PM, Duncan P. N. Exon Smith <[hidden email]> wrote:
> Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.

That's a good point.  But I still like these as separate passes.

Hm, yeah, that's pretty sane. So, add a pass to downgrade available_externally global definitions to declarations with external linkage, add it to the standard -O2 pass pipeline, ensure that it's not part of LTO, and then call it done?

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Duncan P. N. Exon Smith

> On 2015 Jun 4, at 16:51, Reid Kleckner <[hidden email]> wrote:
>
> On Thu, Jun 4, 2015 at 3:58 PM, Duncan P. N. Exon Smith <[hidden email]> wrote:
> > Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.
>
> That's a good point.  But I still like these as separate passes.
>
> Hm, yeah, that's pretty sane. So, add a pass to downgrade available_externally global definitions to declarations with external linkage, add it to the standard -O2 pass pipeline, ensure that it's not part of LTO, and then call it done?

SGTM.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Thu, Jun 4, 2015 at 5:02 PM, Duncan P. N. Exon Smith
<[hidden email]> wrote:

>
>> On 2015 Jun 4, at 16:51, Reid Kleckner <[hidden email]> wrote:
>>
>> On Thu, Jun 4, 2015 at 3:58 PM, Duncan P. N. Exon Smith <[hidden email]> wrote:
>> > Personally, I think the right approach is to add a bool to createGlobalDCEPass defaulting to true named something like IsAfterInlining. In most standard pass pipelines, GlobalDCE runs after inlining for obvious reasons, so the default makes sense. The special case is the LTO pass pipeline, where the code will be reloaded and reoptimized later. IMO it's natural to put the customization there.
>>
>> That's a good point.  But I still like these as separate passes.
>>
>> Hm, yeah, that's pretty sane. So, add a pass to downgrade available_externally global definitions to declarations with external linkage, add it to the standard -O2 pass pipeline, ensure that it's not part of LTO, and then call it done?
>
> SGTM.

Agreed. Although I assume you mean invoke the new pass under a
ThinLTO-only option so that avail extern are not dropped in the
compile pass before the LTO link?

Teresa

--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Reid Kleckner-2
On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <[hidden email]> wrote:
Agreed. Although I assume you mean invoke the new pass under a
ThinLTO-only option so that avail extern are not dropped in the
compile pass before the LTO link?

No, this pass would actually be an improvement to the standard -O2 pipeline. The special case is the regular LTO pass pipeline, which wants to run GlobalDCE but doesn't want to drop available_externally function definitions until after linker-stage inlining.

Consider this test case:

declare void @blah()
define i32 @main() {
  call void @foo()
  ret i32 0
}
define available_externally void @foo() noinline {
  call void @bar()
  ret void
}
define linkonce_odr void @bar() noinline {
  call void @blah()
  ret void
}

If I run opt -O2 on this and feed it to llc, it actually generates code for bar, even though there are no actual references to bar in the final code:

main:                                   # @main
        pushq   %rax
        callq   foo
        xorl    %eax, %eax
        popq    %rdx
        retq

bar:                                    # @bar
        jmp     blah                    # TAILCALL

This corner case happens to come up organically with dllimported classes, which is why I happened to think of it. :) I'm happy with a flag to globaldce for LTO and the original patch, honestly.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Teresa Johnson
On Thu, Jun 4, 2015 at 5:33 PM, Reid Kleckner <[hidden email]> wrote:

> On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <[hidden email]> wrote:
>>
>> Agreed. Although I assume you mean invoke the new pass under a
>> ThinLTO-only option so that avail extern are not dropped in the
>> compile pass before the LTO link?
>
>
> No, this pass would actually be an improvement to the standard -O2 pipeline.
> The special case is the regular LTO pass pipeline, which wants to run
> GlobalDCE but doesn't want to drop available_externally function definitions
> until after linker-stage inlining.

Ok. It looks to me like the LTO compile step with -O2 -flto -c uses
this same -O2 optimization pipeline as without LTO. Clang communicates
the fact that we are doing an LTO compile to llvm via the
-emit-llvm-bc flag, which just tells it to emit bitcode and exit
before codegen. So I would either need to key off of that or setup a
new flag to indicate to llvm that we are doing an LTO -c compile. Or
is there some other way that I am missing?

Incidentally, we'll also want to skip this new pass and keep any
referenced avail extern functions in the ThinLTO -c compile step for
the same reasons (and there are no imported functions yet at that
point).

Teresa

>
> Consider this test case:
>
> declare void @blah()
> define i32 @main() {
>   call void @foo()
>   ret i32 0
> }
> define available_externally void @foo() noinline {
>   call void @bar()
>   ret void
> }
> define linkonce_odr void @bar() noinline {
>   call void @blah()
>   ret void
> }
>
> If I run opt -O2 on this and feed it to llc, it actually generates code for
> bar, even though there are no actual references to bar in the final code:
>
> main:                                   # @main
>         pushq   %rax
>         callq   foo
>         xorl    %eax, %eax
>         popq    %rdx
>         retq
>
> bar:                                    # @bar
>         jmp     blah                    # TAILCALL
>
> This corner case happens to come up organically with dllimported classes,
> which is why I happened to think of it. :) I'm happy with a flag to
> globaldce for LTO and the original patch, honestly.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: Removing AvailableExternal values in GlobalDCE (was Re: RFC: ThinLTO Impementation Plan)

Eric Christopher


On Thu, Jun 4, 2015 at 8:17 PM Teresa Johnson <[hidden email]> wrote:
On Thu, Jun 4, 2015 at 5:33 PM, Reid Kleckner <[hidden email]> wrote:
> On Thu, Jun 4, 2015 at 5:17 PM, Teresa Johnson <[hidden email]> wrote:
>>
>> Agreed. Although I assume you mean invoke the new pass under a
>> ThinLTO-only option so that avail extern are not dropped in the
>> compile pass before the LTO link?
>
>
> No, this pass would actually be an improvement to the standard -O2 pipeline.
> The special case is the regular LTO pass pipeline, which wants to run
> GlobalDCE but doesn't want to drop available_externally function definitions
> until after linker-stage inlining.

Ok. It looks to me like the LTO compile step with -O2 -flto -c uses
this same -O2 optimization pipeline as without LTO. Clang communicates
the fact that we are doing an LTO compile to llvm via the
-emit-llvm-bc flag, which just tells it to emit bitcode and exit
before codegen. So I would either need to key off of that or setup a
new flag to indicate to llvm that we are doing an LTO -c compile. Or
is there some other way that I am missing?

Incidentally, we'll also want to skip this new pass and keep any
referenced avail extern functions in the ThinLTO -c compile step for
the same reasons (and there are no imported functions yet at that
point).


Ultimately for any planned LTO build we're going to want to do a different pass pipeline, it's probably worth considering what should be done before and during LTO.

-eric
 
Teresa

>
> Consider this test case:
>
> declare void @blah()
> define i32 @main() {
>   call void @foo()
>   ret i32 0
> }
> define available_externally void @foo() noinline {
>   call void @bar()
>   ret void
> }
> define linkonce_odr void @bar() noinline {
>   call void @blah()
>   ret void
> }
>
> If I run opt -O2 on this and feed it to llc, it actually generates code for
> bar, even though there are no actual references to bar in the final code:
>
> main:                                   # @main
>         pushq   %rax
>         callq   foo
>         xorl    %eax, %eax
>         popq    %rdx
>         retq
>
> bar:                                    # @bar
>         jmp     blah                    # TAILCALL
>
> This corner case happens to come up organically with dllimported classes,
> which is why I happened to think of it. :) I'm happy with a flag to
> globaldce for LTO and the original patch, honestly.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
12