[llvm-dev] Question about canonicalizing cmp+select

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
Hi, Sanjay/all,

  I noticed in rL331486 that some compare-select optimizations are disabled in favor of providing canonicalized cmp+select to the backend.

  I am currently working on a private backend target, and the target has a small code size limit.  With this change, some of the apps went over the codesize limit.  As an example,

C code:
  b = (a > -1) ? 4 : 5;

ll code:
Before rL331486:
  %0 = lshr i16 %a.0.a.0., 15
  %1 = or i16 %0, 4

After rL331486:
  %cmp = icmp sgt i16 %a.0.a.0., -1
  %cond = select i1 %cmp, i16 4, i16 5

  With the various encoding restrictions of my particular target, the cmp/select generated slightly larger code size.  However, because the apps were very close to the code size limit, this slight change pushed them over the limit.

  If I still prefer to have this optimization performed, then is my best strategy moving forward to implement this optimization as a peephole opt in my backend?

Thanks,
--Yuan

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:

> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
[adding back llvm-dev and cc'ing Craig]

I think you are asking if we are missing a fold (or your target is missing enabling another hook) to transform the sext+add into shift+or? I think the answer is 'yes'. We probably should add that fold. This seems like a similar case as the recent: https://reviews.llvm.org/D48466

Note that on x86, the sext+add becomes zext+sub:
        t20: i8 = setcc t3, Constant:i16<-1>, setgt:ch
      t24: i16 = zero_extend t20
    t17: i16 = sub Constant:i16<5>, t24

Would that transform help your target?

On Tue, Jul 3, 2018 at 3:55 PM, Yuan Lin <[hidden email]> wrote:
Hi, Roman and Sanjay,

  Thank you for your reply!  We currently do run DAGCombiner, but didn't implement this specific transformation.  I just tried turning on convertSelectOfCosntantsToMath() in our ISelLowering, but that doesn't quite work because it generated a sign_extend op from i1 to i16, which our backend currently doesn't support.  

  Does the DAGCombiner already has this transformation implemented?

Thanks,
--Yuan

On Tue, Jul 3, 2018 at 11:22 AM Sanjay Patel <[hidden email]> wrote:
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
I linked the wrong patch review. Here's the patch that was actually committed:

On Tue, Jul 3, 2018 at 4:39 PM, Sanjay Patel <[hidden email]> wrote:
[adding back llvm-dev and cc'ing Craig]

I think you are asking if we are missing a fold (or your target is missing enabling another hook) to transform the sext+add into shift+or? I think the answer is 'yes'. We probably should add that fold. This seems like a similar case as the recent: https://reviews.llvm.org/D48466

Note that on x86, the sext+add becomes zext+sub:
        t20: i8 = setcc t3, Constant:i16<-1>, setgt:ch
      t24: i16 = zero_extend t20
    t17: i16 = sub Constant:i16<5>, t24

Would that transform help your target?

On Tue, Jul 3, 2018 at 3:55 PM, Yuan Lin <[hidden email]> wrote:
Hi, Roman and Sanjay,

  Thank you for your reply!  We currently do run DAGCombiner, but didn't implement this specific transformation.  I just tried turning on convertSelectOfCosntantsToMath() in our ISelLowering, but that doesn't quite work because it generated a sign_extend op from i1 to i16, which our backend currently doesn't support.  

  Does the DAGCombiner already has this transformation implemented?

Thanks,
--Yuan

On Tue, Jul 3, 2018 at 11:22 AM Sanjay Patel <[hidden email]> wrote:
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
While its true select convertSelectOfCosntantsToMath() creates add+sext i1, there's a transform in DAGCombiner::visitADDLike that does "add (sext i1), X -> sub X, (zext i1)". Why didn't that fire for your target?

~Craig


On Tue, Jul 3, 2018 at 3:44 PM Sanjay Patel <[hidden email]> wrote:
I linked the wrong patch review. Here's the patch that was actually committed:

On Tue, Jul 3, 2018 at 4:39 PM, Sanjay Patel <[hidden email]> wrote:
[adding back llvm-dev and cc'ing Craig]

I think you are asking if we are missing a fold (or your target is missing enabling another hook) to transform the sext+add into shift+or? I think the answer is 'yes'. We probably should add that fold. This seems like a similar case as the recent: https://reviews.llvm.org/D48466

Note that on x86, the sext+add becomes zext+sub:
        t20: i8 = setcc t3, Constant:i16<-1>, setgt:ch
      t24: i16 = zero_extend t20
    t17: i16 = sub Constant:i16<5>, t24

Would that transform help your target?

On Tue, Jul 3, 2018 at 3:55 PM, Yuan Lin <[hidden email]> wrote:
Hi, Roman and Sanjay,

  Thank you for your reply!  We currently do run DAGCombiner, but didn't implement this specific transformation.  I just tried turning on convertSelectOfCosntantsToMath() in our ISelLowering, but that doesn't quite work because it generated a sign_extend op from i1 to i16, which our backend currently doesn't support.  

  Does the DAGCombiner already has this transformation implemented?

Thanks,
--Yuan

On Tue, Jul 3, 2018 at 11:22 AM Sanjay Patel <[hidden email]> wrote:
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
Hi,

  "While its true select convertSelectOfCosntantsToMath() creates add+sext i1, there's a transform in DAGCombiner::visitADDLike that does "add (sext i1), X -> sub X, (zext i1)". Why didn't that fire for your target?"

  That's a good question! We never explicitly marked sext-on-i1 as a custom op.  And I think the default behavior is "legal" (Am I right on that?)  So the following transform never gets executed because it thinks sext-on-i1 as legal op.

  // add (sext i1), X -> sub X, (zext i1)
  if (N0.getOpcode() == ISD::SIGN_EXTEND &&
      N0.getOperand(0).getValueType() == MVT::i1 &&
      !TLI.isOperationLegal(ISD::SIGN_EXTEND, MVT::i1)) {
    SDValue ZExt = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0));
    return DAG.getNode(ISD::SUB, DL, VT, N1, ZExt);
  }

  I will try to change the isel behavior sext-on-i1, and see if that fixes the issue.


On Tue, Jul 3, 2018 at 3:47 PM Craig Topper <[hidden email]> wrote:
While its true select convertSelectOfCosntantsToMath() creates add+sext i1, there's a transform in DAGCombiner::visitADDLike that does "add (sext i1), X -> sub X, (zext i1)". Why didn't that fire for your target?

~Craig


On Tue, Jul 3, 2018 at 3:44 PM Sanjay Patel <[hidden email]> wrote:
I linked the wrong patch review. Here's the patch that was actually committed:

On Tue, Jul 3, 2018 at 4:39 PM, Sanjay Patel <[hidden email]> wrote:
[adding back llvm-dev and cc'ing Craig]

I think you are asking if we are missing a fold (or your target is missing enabling another hook) to transform the sext+add into shift+or? I think the answer is 'yes'. We probably should add that fold. This seems like a similar case as the recent: https://reviews.llvm.org/D48466

Note that on x86, the sext+add becomes zext+sub:
        t20: i8 = setcc t3, Constant:i16<-1>, setgt:ch
      t24: i16 = zero_extend t20
    t17: i16 = sub Constant:i16<5>, t24

Would that transform help your target?

On Tue, Jul 3, 2018 at 3:55 PM, Yuan Lin <[hidden email]> wrote:
Hi, Roman and Sanjay,

  Thank you for your reply!  We currently do run DAGCombiner, but didn't implement this specific transformation.  I just tried turning on convertSelectOfCosntantsToMath() in our ISelLowering, but that doesn't quite work because it generated a sign_extend op from i1 to i16, which our backend currently doesn't support.  

  Does the DAGCombiner already has this transformation implemented?

Thanks,
--Yuan

On Tue, Jul 3, 2018 at 11:22 AM Sanjay Patel <[hidden email]> wrote:
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Question about canonicalizing cmp+select

Tom Stellard via llvm-dev
Yuan -

I've added the following DAG transforms that try to improve codegen for cmp+select patterns:

Please let me know if that helped your target and/or if there are other patterns that we need to improve.


On Tue, Jul 3, 2018 at 7:25 PM, Yuan Lin <[hidden email]> wrote:
Hi,

  "While its true select convertSelectOfCosntantsToMath() creates add+sext i1, there's a transform in DAGCombiner::visitADDLike that does "add (sext i1), X -> sub X, (zext i1)". Why didn't that fire for your target?"

  That's a good question! We never explicitly marked sext-on-i1 as a custom op.  And I think the default behavior is "legal" (Am I right on that?)  So the following transform never gets executed because it thinks sext-on-i1 as legal op.

  // add (sext i1), X -> sub X, (zext i1)
  if (N0.getOpcode() == ISD::SIGN_EXTEND &&
      N0.getOperand(0).getValueType() == MVT::i1 &&
      !TLI.isOperationLegal(ISD::SIGN_EXTEND, MVT::i1)) {
    SDValue ZExt = DAG.getNode(ISD::ZERO_EXTEND, DL, VT, N0.getOperand(0));
    return DAG.getNode(ISD::SUB, DL, VT, N1, ZExt);
  }

  I will try to change the isel behavior sext-on-i1, and see if that fixes the issue.


On Tue, Jul 3, 2018 at 3:47 PM Craig Topper <[hidden email]> wrote:
While its true select convertSelectOfCosntantsToMath() creates add+sext i1, there's a transform in DAGCombiner::visitADDLike that does "add (sext i1), X -> sub X, (zext i1)". Why didn't that fire for your target?

~Craig


On Tue, Jul 3, 2018 at 3:44 PM Sanjay Patel <[hidden email]> wrote:
I linked the wrong patch review. Here's the patch that was actually committed:

On Tue, Jul 3, 2018 at 4:39 PM, Sanjay Patel <[hidden email]> wrote:
[adding back llvm-dev and cc'ing Craig]

I think you are asking if we are missing a fold (or your target is missing enabling another hook) to transform the sext+add into shift+or? I think the answer is 'yes'. We probably should add that fold. This seems like a similar case as the recent: https://reviews.llvm.org/D48466

Note that on x86, the sext+add becomes zext+sub:
        t20: i8 = setcc t3, Constant:i16<-1>, setgt:ch
      t24: i16 = zero_extend t20
    t17: i16 = sub Constant:i16<5>, t24

Would that transform help your target?

On Tue, Jul 3, 2018 at 3:55 PM, Yuan Lin <[hidden email]> wrote:
Hi, Roman and Sanjay,

  Thank you for your reply!  We currently do run DAGCombiner, but didn't implement this specific transformation.  I just tried turning on convertSelectOfCosntantsToMath() in our ISelLowering, but that doesn't quite work because it generated a sign_extend op from i1 to i16, which our backend currently doesn't support.  

  Does the DAGCombiner already has this transformation implemented?

Thanks,
--Yuan

On Tue, Jul 3, 2018 at 11:22 AM Sanjay Patel <[hidden email]> wrote:
Do you run DAGCombiner? And are you overriding TLI.convertSelectOfConstantsToMath(VT) for your target?

For the stated example (true val and false val constants in the select differ by 1), that should already be converted.


On Tue, Jul 3, 2018 at 12:13 PM, Roman Lebedev <[hidden email]> wrote:
On Tue, Jul 3, 2018 at 9:06 PM, Yuan Lin via llvm-dev
<[hidden email]> wrote:
> Hi, Sanjay/all,
>
>   I noticed in rL331486 that some compare-select optimizations are disabled
> in favor of providing canonicalized cmp+select to the backend.
>
>   I am currently working on a private backend target, and the target has a
> small code size limit.  With this change, some of the apps went over the
> codesize limit.  As an example,
>
> C code:
>   b = (a > -1) ? 4 : 5;
>
> ll code:
> Before rL331486:
>   %0 = lshr i16 %a.0.a.0., 15
>   %1 = or i16 %0, 4
>
> After rL331486:
>   %cmp = icmp sgt i16 %a.0.a.0., -1
>   %cond = select i1 %cmp, i16 4, i16 5
>
>   With the various encoding restrictions of my particular target, the
> cmp/select generated slightly larger code size.  However, because the apps
> were very close to the code size limit, this slight change pushed them over
> the limit.
>
>   If I still prefer to have this optimization performed, then is my best
> strategy moving forward to implement this optimization as a peephole opt in
> my backend?
I personally think it should probably be a DAGCombine transform,
driven by a Target Transform Info hook (e.g. like hasAndNot()).

> Thanks,
> --Yuan
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>





_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev