[llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Bruce Hoult via llvm-dev
Hi,

I have come across a couple of cases where the code generated after
CodeGenPrepare pass has "br i1 false .." with both true and false
conditions preserved and this propagates further and remains the same
in the final assembly code/executable.

In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which
handles the br i1 false condition) is called only once and if after
the transformation of code by ConstantFoldTerminator() and
DeleteDeadBlock() we end up with code like "br i1 false", there is no
further opportunity to clean them up. So calling this code under
(!DisableBranchOpts) in a loop until no more transformations are made
fixes this issue. Is this reasonable ?

My simple fix (without any indentation changes) is:

--- a/llvm/lib/CodeGen/CodeGenPrepare.cpp
+++ b/llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ -316,7 +316,9 @@ bool CodeGenPrepare::runOnFunction(Function &F) {

   SunkAddrs.clear();
   if (!DisableBranchOpts) {
+  MadeChange = true;
+  while (MadeChange) {
     MadeChange = false;
     SmallPtrSet<BasicBlock*, 8> WorkList;

     for (BasicBlock &BB : F) {

@@ -352,6 +354,7 @@ bool CodeGenPrepare::runOnFunction(Function &F) {
     EverMadeChange |= MadeChange;
   }
+ }
   if (!DisableGCOpts) {
   SmallVector<Instruction *, 2> Statepoints;

Testing the patch, I got a regression with
llvm/test/CodeGen/AMDGPU/nested-loop-conditions.ll.  I am unsure if
this requires remastering the test case to adjust with the new results
or if this is a real issue.  I don't have any expertise with GPU and
so any inputs in this regard would be very helpful.

Attached:
testcase.ll : Original test case for this issue
base.s and new.s: llc output for the failing case
(nested-loop-conditions.ll) before and after applying the above patch.

Thanks,
Bharathi

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

testcase.ll.txt (6K) Download Attachment
base.s (7K) Download Attachment
new.s (6K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Bruce Hoult via llvm-dev
On 6/28/2018 9:44 PM, Bharathi Seshadri via llvm-dev wrote:

> Hi,
>
> I have come across a couple of cases where the code generated after
> CodeGenPrepare pass has "br i1 false .." with both true and false
> conditions preserved and this propagates further and remains the same
> in the final assembly code/executable.
>
> In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which
> handles the br i1 false condition) is called only once and if after
> the transformation of code by ConstantFoldTerminator() and
> DeleteDeadBlock() we end up with code like "br i1 false", there is no
> further opportunity to clean them up. So calling this code under
> (!DisableBranchOpts) in a loop until no more transformations are made
> fixes this issue. Is this reasonable ?

I would expect the precise case you're running into is rare: the second
iteration of the loop does nothing useful unless the IR specifically has
an i1 phi node in a block whose predecessors were erased.  And the
default optimization pipeline runs SimplifyCFG at the very end, which is
close to CodeGenPrepare, so the CFG simplification will usually be a
no-op anyway.

We really shouldn't be doing this sort of folding in CodeGenPrepare in
the first place.  It looks like it was added to work around the fact
that we we lower llvm.objectsize later than we should.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Bruce Hoult via llvm-dev
we lower llvm.objectsize later than we should

Is there a well-accepted best (or even just better) place to lower objectsize? I ask because I sorta fear that these kinds of problems will become more pronounced as llvm.is.constant, which is also lowered in CGP, gains popularity.

(To be clear, I think it totally makes sense to lower is.constant and objectsize in the same place. I'm just saying that if the ideal piece of code to do that isn't CGP, ...)

On Fri, Jun 29, 2018 at 12:21 PM Friedman, Eli via llvm-dev <[hidden email]> wrote:
On 6/28/2018 9:44 PM, Bharathi Seshadri via llvm-dev wrote:
> Hi,
>
> I have come across a couple of cases where the code generated after
> CodeGenPrepare pass has "br i1 false .." with both true and false
> conditions preserved and this propagates further and remains the same
> in the final assembly code/executable.
>
> In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which
> handles the br i1 false condition) is called only once and if after
> the transformation of code by ConstantFoldTerminator() and
> DeleteDeadBlock() we end up with code like "br i1 false", there is no
> further opportunity to clean them up. So calling this code under
> (!DisableBranchOpts) in a loop until no more transformations are made
> fixes this issue. Is this reasonable ?

I would expect the precise case you're running into is rare: the second
iteration of the loop does nothing useful unless the IR specifically has
an i1 phi node in a block whose predecessors were erased.  And the
default optimization pipeline runs SimplifyCFG at the very end, which is
close to CodeGenPrepare, so the CFG simplification will usually be a
no-op anyway.

We really shouldn't be doing this sort of folding in CodeGenPrepare in
the first place.  It looks like it was added to work around the fact
that we we lower llvm.objectsize later than we should.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Bruce Hoult via llvm-dev
On 6/29/2018 3:25 PM, George Burgess IV wrote:
we lower llvm.objectsize later than we should

Is there a well-accepted best (or even just better) place to lower objectsize? I ask because I sorta fear that these kinds of problems will become more pronounced as llvm.is.constant, which is also lowered in CGP, gains popularity.

After the "simplification" part of the optimization pipeline (after we've finished inlining and the associated function simplification passes have run), we're unlikely to find new information that would help simplify an llvm.objectsize or llvm.is.constant call.  So roughly around EP_VectorizerStart is probably appropriate.  But someone should measure before we move it.

-Eli


(To be clear, I think it totally makes sense to lower is.constant and objectsize in the same place. I'm just saying that if the ideal piece of code to do that isn't CGP, ...)

On Fri, Jun 29, 2018 at 12:21 PM Friedman, Eli via llvm-dev <[hidden email]> wrote:
On 6/28/2018 9:44 PM, Bharathi Seshadri via llvm-dev wrote:
> Hi,
>
> I have come across a couple of cases where the code generated after
> CodeGenPrepare pass has "br i1 false .." with both true and false
> conditions preserved and this propagates further and remains the same
> in the final assembly code/executable.
>
> In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which
> handles the br i1 false condition) is called only once and if after
> the transformation of code by ConstantFoldTerminator() and
> DeleteDeadBlock() we end up with code like "br i1 false", there is no
> further opportunity to clean them up. So calling this code under
> (!DisableBranchOpts) in a loop until no more transformations are made
> fixes this issue. Is this reasonable ?

I would expect the precise case you're running into is rare: the second
iteration of the loop does nothing useful unless the IR specifically has
an i1 phi node in a block whose predecessors were erased.  And the
default optimization pipeline runs SimplifyCFG at the very end, which is
close to CodeGenPrepare, so the CFG simplification will usually be a
no-op anyway.

We really shouldn't be doing this sort of folding in CodeGenPrepare in
the first place.  It looks like it was added to work around the fact
that we we lower llvm.objectsize later than we should.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Cleaning up ‘br i1 false’ cases in CodeGenPrepare

Bruce Hoult via llvm-dev
> But someone should measure before we move it.

I ran numbers on a large, varied codebase with an up-to-date clang-based FORTIFY implementation. With the current forced lowering in CGP, we lowered 64,662 calls to @llvm.objectsize to non-failure values and lowered 111,224 to failure values. Making the instcombine iteration after EP_VectorizerStart require that all objectsize intrinsics are lowered, we found successful values for 64,552 llvm.objectsize intrinsics, and returned failure values for 120,616.

Taken literally, we fail to lower 63.2% of calls with "successful" values today, and this change makes us fail to lower 65.1%. However, given that the earlier lowering makes us lower a little over 9,000 additional intrinsics, I'd imagine that most of these 'new' failures got DCE'd away before hitting CGP in the past.

In any case, I'd like to note that these numbers don't include calls to __builtin_object_size that clang is able to lower itself, so from a clang user's perspective, any degradation mentioned here is likely an overstatement.

Given the above, I've no issues with forcing @llvm.objectsize lowering to earlier in the pipeline. I have a patch to do this as part of InstCombine. Happy to make a LowerBestEffortPostOptimizationIntrinsicsPass (or whatever) specifically for this, if that would be preferable. Also happy to dig into where some of those additional objectsizes appear from if people really want.

On Fri, Jun 29, 2018 at 3:59 PM Friedman, Eli <[hidden email]> wrote:
On 6/29/2018 3:25 PM, George Burgess IV wrote:
we lower llvm.objectsize later than we should

Is there a well-accepted best (or even just better) place to lower objectsize? I ask because I sorta fear that these kinds of problems will become more pronounced as llvm.is.constant, which is also lowered in CGP, gains popularity.

After the "simplification" part of the optimization pipeline (after we've finished inlining and the associated function simplification passes have run), we're unlikely to find new information that would help simplify an llvm.objectsize or llvm.is.constant call.  So roughly around EP_VectorizerStart is probably appropriate.  But someone should measure before we move it.

-Eli


(To be clear, I think it totally makes sense to lower is.constant and objectsize in the same place. I'm just saying that if the ideal piece of code to do that isn't CGP, ...)

On Fri, Jun 29, 2018 at 12:21 PM Friedman, Eli via llvm-dev <[hidden email]> wrote:
On 6/28/2018 9:44 PM, Bharathi Seshadri via llvm-dev wrote:
> Hi,
>
> I have come across a couple of cases where the code generated after
> CodeGenPrepare pass has "br i1 false .." with both true and false
> conditions preserved and this propagates further and remains the same
> in the final assembly code/executable.
>
> In CodeGenPrepare::runOnFunction, ConstantFoldTerminator (which
> handles the br i1 false condition) is called only once and if after
> the transformation of code by ConstantFoldTerminator() and
> DeleteDeadBlock() we end up with code like "br i1 false", there is no
> further opportunity to clean them up. So calling this code under
> (!DisableBranchOpts) in a loop until no more transformations are made
> fixes this issue. Is this reasonable ?

I would expect the precise case you're running into is rare: the second
iteration of the loop does nothing useful unless the IR specifically has
an i1 phi node in a block whose predecessors were erased.  And the
default optimization pipeline runs SimplifyCFG at the very end, which is
close to CodeGenPrepare, so the CFG simplification will usually be a
no-op anyway.

We really shouldn't be doing this sort of folding in CodeGenPrepare in
the first place.  It looks like it was added to work around the fact
that we we lower llvm.objectsize later than we should.

-Eli

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev