[llvm-dev] InstCombine: Narrow switch instructions using known bits

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev

Hello,

 

I found one case where narrowing switch instructions transformation in InstCombine produces worse code.

Let's suppose that I have such code:

              

$ cat a.c

void foo();

void bar();

void zoo();

 

void my_func(unsigned int a) {

    unsigned char b = a & 0xF;

    switch (b) {

        case 0:  foo(); break;

        case 1:  bar(); break;

        case 2:  foo(); break;

        case 3:  foo(); break;

        case 4:  foo(); break;

        case 5:  bar(); break;

        case 6:  foo(); break;

        case 7:  foo(); break;

        case 8:  bar(); break;

        case 9:  foo(); break;

        case 10: foo(); break;

 

        default: zoo();

    }

}

 

Using recent clang:

 

$ clang -O3 -S -c a.c -o a.s

I have the following assembly in the beginning of my_func:

# bad case

        movl    %edi, %eax

        andb    $15, %al

        cmpb    $10, %al

        ja      .LBB0_9                           # jump to the default case

 

        andl    $15, %edi

        jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table

                             

I found that if I disable switch shrinking like shown below:

 

$ git diff

diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp

index 5d5a9b2..3682b88 100644

--- a/lib/Transforms/InstCombine/InstructionCombining.cpp

+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {

     return &SI;

   }

 

+  return nullptr;

+

   KnownBits Known = computeKnownBits(Cond, 0, &SI);

   unsigned LeadingKnownZeros = Known.countMinLeadingZeros();

   unsigned LeadingKnownOnes = Known.countMinLeadingOnes();

  

I get better assembly (there is no additional MOV and AND instructions):

# good case

        andl    $15, %edi

        cmpl    $10, %edi

        ja      .LBB0_9

 

        jmpq    *.LJTI0_0(,%rdi,8)

 

This transformation was introduced in the commit:

commit 4eb03123dfda2de88a84852834845678833c8c36

Author: Akira Hatanaka <[hidden email]>

Date:   Thu Oct 16 06:00:46 2014 +0000

    Reapply r219832 - InstCombine: Narrow switch instructions using known bits.

 

From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.

 

During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.

 

But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?

 

Best regards,
Denis Bakhvalov.

 


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <[hidden email]> wrote:

Hello,

 

I found one case where narrowing switch instructions transformation in InstCombine produces worse code.

Let's suppose that I have such code:

              

$ cat a.c

void foo();

void bar();

void zoo();

 

void my_func(unsigned int a) {

    unsigned char b = a & 0xF;

    switch (b) {

        case 0:  foo(); break;

        case 1:  bar(); break;

        case 2:  foo(); break;

        case 3:  foo(); break;

        case 4:  foo(); break;

        case 5:  bar(); break;

        case 6:  foo(); break;

        case 7:  foo(); break;

        case 8:  bar(); break;

        case 9:  foo(); break;

        case 10: foo(); break;

 

        default: zoo();

    }

}

 

Using recent clang:

 

$ clang -O3 -S -c a.c -o a.s

I have the following assembly in the beginning of my_func:

# bad case

        movl    %edi, %eax

        andb    $15, %al

        cmpb    $10, %al

        ja      .LBB0_9                           # jump to the default case

 

        andl    $15, %edi

        jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table

                             

I found that if I disable switch shrinking like shown below:

 

$ git diff

diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp

index 5d5a9b2..3682b88 100644

--- a/lib/Transforms/InstCombine/InstructionCombining.cpp

+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {

     return &SI;

   }

 

+  return nullptr;

+

   KnownBits Known = computeKnownBits(Cond, 0, &SI);

   unsigned LeadingKnownZeros = Known.countMinLeadingZeros();

   unsigned LeadingKnownOnes = Known.countMinLeadingOnes();

  

I get better assembly (there is no additional MOV and AND instructions):

# good case

        andl    $15, %edi

        cmpl    $10, %edi

        ja      .LBB0_9

 

        jmpq    *.LJTI0_0(,%rdi,8)

 

This transformation was introduced in the commit:

commit 4eb03123dfda2de88a84852834845678833c8c36

Author: Akira Hatanaka <[hidden email]>

Date:   Thu Oct 16 06:00:46 2014 +0000

    Reapply r219832 - InstCombine: Narrow switch instructions using known bits.

 

From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.

 

During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.

 

But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?

 


I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.
 

Best regards,
Denis Bakhvalov.

 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev

Hi Akira,

Can you maybe remember or come up with any example where this transformation still helps today?

If no such case and no objections on disabling/removing it, I can start working on that.

 

Best regards,
Denis Bakhvalov.

 

From: Akira Hatanaka [mailto:[hidden email]]
Sent: Wednesday, October 10, 2018 12:46 PM
To: Bakhvalov, Denis <[hidden email]>
Cc: llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

 

On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <[hidden email]> wrote:

Hello,

 

I found one case where narrowing switch instructions transformation in InstCombine produces worse code.

Let's suppose that I have such code:

              

$ cat a.c

void foo();

void bar();

void zoo();

 

void my_func(unsigned int a) {

    unsigned char b = a & 0xF;

    switch (b) {

        case 0:  foo(); break;

        case 1:  bar(); break;

        case 2:  foo(); break;

        case 3:  foo(); break;

        case 4:  foo(); break;

        case 5:  bar(); break;

        case 6:  foo(); break;

        case 7:  foo(); break;

        case 8:  bar(); break;

        case 9:  foo(); break;

        case 10: foo(); break;

 

        default: zoo();

    }

}

 

Using recent clang:

 

$ clang -O3 -S -c a.c -o a.s

I have the following assembly in the beginning of my_func:

# bad case

        movl    %edi, %eax

        andb    $15, %al

        cmpb    $10, %al

        ja      .LBB0_9                           # jump to the default case

 

        andl    $15, %edi

        jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table

                             

I found that if I disable switch shrinking like shown below:

 

$ git diff

diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp

index 5d5a9b2..3682b88 100644

--- a/lib/Transforms/InstCombine/InstructionCombining.cpp

+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {

     return &SI;

   }

 

+  return nullptr;

+

   KnownBits Known = computeKnownBits(Cond, 0, &SI);

   unsigned LeadingKnownZeros = Known.countMinLeadingZeros();

   unsigned LeadingKnownOnes = Known.countMinLeadingOnes();

  

I get better assembly (there is no additional MOV and AND instructions):

# good case

        andl    $15, %edi

        cmpl    $10, %edi

        ja      .LBB0_9

 

        jmpq    *.LJTI0_0(,%rdi,8)

 

This transformation was introduced in the commit:

commit 4eb03123dfda2de88a84852834845678833c8c36

Author: Akira Hatanaka <[hidden email]>

Date:   Thu Oct 16 06:00:46 2014 +0000

    Reapply r219832 - InstCombine: Narrow switch instructions using known bits.

 

From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.

 

During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.

 

But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?

 

 

I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.

 

Best regards,
Denis Bakhvalov.

 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
On Tue, Oct 23, 2018 at 8:02 PM Bakhvalov, Denis via llvm-dev
<[hidden email]> wrote:

>
> Hi Akira,
>
> Can you maybe remember or come up with any example where this transformation still helps today?
>
> If no such case and no objections on disabling/removing it, I can start working on that.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> From: Akira Hatanaka [mailto:[hidden email]]
> Sent: Wednesday, October 10, 2018 12:46 PM
> To: Bakhvalov, Denis <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
> Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits
>
>
>
> On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <[hidden email]> wrote:
>
> Hello,
>
>
>
> I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
>
> Let's suppose that I have such code:
>
>
>
> $ cat a.c
>
> void foo();
>
> void bar();
>
> void zoo();
>
>
>
> void my_func(unsigned int a) {
>
>     unsigned char b = a & 0xF;
>
>     switch (b) {
>
>         case 0:  foo(); break;
>
>         case 1:  bar(); break;
>
>         case 2:  foo(); break;
>
>         case 3:  foo(); break;
>
>         case 4:  foo(); break;
>
>         case 5:  bar(); break;
>
>         case 6:  foo(); break;
>
>         case 7:  foo(); break;
>
>         case 8:  bar(); break;
>
>         case 9:  foo(); break;
>
>         case 10: foo(); break;
>
>
>
>         default: zoo();
>
>     }
>
> }
>
>
>
> Using recent clang:
>
>
>
> $ clang -O3 -S -c a.c -o a.s
>
> I have the following assembly in the beginning of my_func:
>
> # bad case
>
>         movl    %edi, %eax
>
>         andb    $15, %al
>
>         cmpb    $10, %al
>
>         ja      .LBB0_9                           # jump to the default case
>
>
>
>         andl    $15, %edi
>
>         jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table
>
>
>
> I found that if I disable switch shrinking like shown below:
>
>
>
> $ git diff
>
> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> index 5d5a9b2..3682b88 100644
>
> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> @@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {
>
>      return &SI;
>
>    }
>
>
>
> +  return nullptr;
>
> +
>
>    KnownBits Known = computeKnownBits(Cond, 0, &SI);
>
>    unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
>
>    unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
>
>
>
> I get better assembly (there is no additional MOV and AND instructions):
>
> # good case
>
>         andl    $15, %edi
>
>         cmpl    $10, %edi
>
>         ja      .LBB0_9
>
>
>
>         jmpq    *.LJTI0_0(,%rdi,8)
It's not always good to compare assembly when talking about middle-end
transforms.
If something is good for middle-end (which is *likely* the case here),
and is bad for back-ends,
then it is usually a back-end problem to deal with it.
In other words, when talking about middle-end, it might be best to
look at the LLVM IR.

> This transformation was introduced in the commit:
>
> commit 4eb03123dfda2de88a84852834845678833c8c36
>
> Author: Akira Hatanaka <[hidden email]>
>
> Date:   Thu Oct 16 06:00:46 2014 +0000
>
>     Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
>
>
>
> From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.
>
>
>
> During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.
>
>
>
> But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?
>
>
>
>
>
> I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.
>
>
>
> Best regards,
> Denis Bakhvalov.
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
Sorry for not seeing this sooner. I made this transform more aggressive here:

But the backend still doesn't deal with strange types optimally:

So at the least, I think we should limit the transform as it was in an earlier draft of D12965 to not shrink to weird illegal types (the code should use "shouldChangeType()").

I agree with Roman that we should look at the IR diffs if we remove this completely. But this is a potentially unusual instcombine IIRC because it can create an extra instruction (the truncate) that didn't exist in the original IR.

Looking at some of the linked patches/bugs should tell us if we are regressing any of the motivating cases. My guess is that improvements to SimplifyCFG since then have made this instcombine transform less likely/necessary.

On Tue, Oct 23, 2018 at 11:12 AM Roman Lebedev via llvm-dev <[hidden email]> wrote:
On Tue, Oct 23, 2018 at 8:02 PM Bakhvalov, Denis via llvm-dev
<[hidden email]> wrote:
>
> Hi Akira,
>
> Can you maybe remember or come up with any example where this transformation still helps today?
>
> If no such case and no objections on disabling/removing it, I can start working on that.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> From: Akira Hatanaka [mailto:[hidden email]]
> Sent: Wednesday, October 10, 2018 12:46 PM
> To: Bakhvalov, Denis <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
> Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits
>
>
>
> On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <[hidden email]> wrote:
>
> Hello,
>
>
>
> I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
>
> Let's suppose that I have such code:
>
>
>
> $ cat a.c
>
> void foo();
>
> void bar();
>
> void zoo();
>
>
>
> void my_func(unsigned int a) {
>
>     unsigned char b = a & 0xF;
>
>     switch (b) {
>
>         case 0:  foo(); break;
>
>         case 1:  bar(); break;
>
>         case 2:  foo(); break;
>
>         case 3:  foo(); break;
>
>         case 4:  foo(); break;
>
>         case 5:  bar(); break;
>
>         case 6:  foo(); break;
>
>         case 7:  foo(); break;
>
>         case 8:  bar(); break;
>
>         case 9:  foo(); break;
>
>         case 10: foo(); break;
>
>
>
>         default: zoo();
>
>     }
>
> }
>
>
>
> Using recent clang:
>
>
>
> $ clang -O3 -S -c a.c -o a.s
>
> I have the following assembly in the beginning of my_func:
>
> # bad case
>
>         movl    %edi, %eax
>
>         andb    $15, %al
>
>         cmpb    $10, %al
>
>         ja      .LBB0_9                           # jump to the default case
>
>
>
>         andl    $15, %edi
>
>         jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table
>
>
>
> I found that if I disable switch shrinking like shown below:
>
>
>
> $ git diff
>
> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> index 5d5a9b2..3682b88 100644
>
> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> @@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {
>
>      return &SI;
>
>    }
>
>
>
> +  return nullptr;
>
> +
>
>    KnownBits Known = computeKnownBits(Cond, 0, &SI);
>
>    unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
>
>    unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
>
>
>
> I get better assembly (there is no additional MOV and AND instructions):
>
> # good case
>
>         andl    $15, %edi
>
>         cmpl    $10, %edi
>
>         ja      .LBB0_9
>
>
>
>         jmpq    *.LJTI0_0(,%rdi,8)
It's not always good to compare assembly when talking about middle-end
transforms.
If something is good for middle-end (which is *likely* the case here),
and is bad for back-ends,
then it is usually a back-end problem to deal with it.
In other words, when talking about middle-end, it might be best to
look at the LLVM IR.

> This transformation was introduced in the commit:
>
> commit 4eb03123dfda2de88a84852834845678833c8c36
>
> Author: Akira Hatanaka <[hidden email]>
>
> Date:   Thu Oct 16 06:00:46 2014 +0000
>
>     Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
>
>
>
> From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.
>
>
>
> During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.
>
>
>
> But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?
>
>
>
>
>
> I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.
>
>
>
> Best regards,
> Denis Bakhvalov.
Roman.

> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
In reply to this post by 韩玉 via llvm-dev
I don't have an IR that you can compile but the original IR we saw was something like this:

  %1 = trunc i128 %0 to i64
  %2 = zext i64 %1 to i128
  switch i128 %2, label %3 [
    i128 0, label %15
    i128 2, label %15
    i128 4, label %15
  ]

When I tested the patch four years ago, the optimization did reduce code size on x86 in some cases, but it's possible that this optimization isn't needed anymore today.


On Tue, Oct 23, 2018 at 10:01 AM Bakhvalov, Denis <[hidden email]> wrote:

Hi Akira,

Can you maybe remember or come up with any example where this transformation still helps today?

If no such case and no objections on disabling/removing it, I can start working on that.

 

Best regards,
Denis Bakhvalov.

 

From: Akira Hatanaka [mailto:[hidden email]]
Sent: Wednesday, October 10, 2018 12:46 PM
To: Bakhvalov, Denis <[hidden email]>
Cc: llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

 

On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <[hidden email]> wrote:

Hello,

 

I found one case where narrowing switch instructions transformation in InstCombine produces worse code.

Let's suppose that I have such code:

              

$ cat a.c

void foo();

void bar();

void zoo();

 

void my_func(unsigned int a) {

    unsigned char b = a & 0xF;

    switch (b) {

        case 0:  foo(); break;

        case 1:  bar(); break;

        case 2:  foo(); break;

        case 3:  foo(); break;

        case 4:  foo(); break;

        case 5:  bar(); break;

        case 6:  foo(); break;

        case 7:  foo(); break;

        case 8:  bar(); break;

        case 9:  foo(); break;

        case 10: foo(); break;

 

        default: zoo();

    }

}

 

Using recent clang:

 

$ clang -O3 -S -c a.c -o a.s

I have the following assembly in the beginning of my_func:

# bad case

        movl    %edi, %eax

        andb    $15, %al

        cmpb    $10, %al

        ja      .LBB0_9                           # jump to the default case

 

        andl    $15, %edi

        jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table

                             

I found that if I disable switch shrinking like shown below:

 

$ git diff

diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp

index 5d5a9b2..3682b88 100644

--- a/lib/Transforms/InstCombine/InstructionCombining.cpp

+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp

@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {

     return &SI;

   }

 

+  return nullptr;

+

   KnownBits Known = computeKnownBits(Cond, 0, &SI);

   unsigned LeadingKnownZeros = Known.countMinLeadingZeros();

   unsigned LeadingKnownOnes = Known.countMinLeadingOnes();

  

I get better assembly (there is no additional MOV and AND instructions):

# good case

        andl    $15, %edi

        cmpl    $10, %edi

        ja      .LBB0_9

 

        jmpq    *.LJTI0_0(,%rdi,8)

 

This transformation was introduced in the commit:

commit 4eb03123dfda2de88a84852834845678833c8c36

Author: Akira Hatanaka <[hidden email]>

Date:   Thu Oct 16 06:00:46 2014 +0000

    Reapply r219832 - InstCombine: Narrow switch instructions using known bits.

 

From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.

 

During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.

 

But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?

 

 

I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.

 

Best regards,
Denis Bakhvalov.

 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
In reply to this post by 韩玉 via llvm-dev

Hello Sanjay, Akira,

From IR pov this transformation creates additional truncate instruction that didn’t exist in the original IR (as Sanjay wrote).

I went through the links that Sanjay provided and now I’m stuck. :)
Should we now focus on backend (as described in Bug 29009)?

Or we say that this instcombine transform makes no sense today and we do something about it?

 

Best regards,
Denis Bakhvalov.

 

From: Sanjay Patel [mailto:[hidden email]]
Sent: Tuesday, October 23, 2018 11:20 AM
To: Roman Lebedev <[hidden email]>
Cc: Bakhvalov, Denis <[hidden email]>; llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

 

Sorry for not seeing this sooner. I made this transform more aggressive here:

 

But the backend still doesn't deal with strange types optimally:

 

So at the least, I think we should limit the transform as it was in an earlier draft of D12965 to not shrink to weird illegal types (the code should use "shouldChangeType()").

 

I agree with Roman that we should look at the IR diffs if we remove this completely. But this is a potentially unusual instcombine IIRC because it can create an extra instruction (the truncate) that didn't exist in the original IR.

 

Looking at some of the linked patches/bugs should tell us if we are regressing any of the motivating cases. My guess is that improvements to SimplifyCFG since then have made this instcombine transform less likely/necessary.

 

On Tue, Oct 23, 2018 at 11:12 AM Roman Lebedev via llvm-dev <[hidden email]> wrote:

On Tue, Oct 23, 2018 at 8:02 PM Bakhvalov, Denis via llvm-dev
<
[hidden email]> wrote:
>
> Hi Akira,
>
> Can you maybe remember or come up with any example where this transformation still helps today?
>
> If no such case and no objections on disabling/removing it, I can start working on that.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> From: Akira Hatanaka [mailto:
[hidden email]]
> Sent: Wednesday, October 10, 2018 12:46 PM
> To: Bakhvalov, Denis <
[hidden email]>
> Cc: llvm-dev <
[hidden email]>; Akira Hatanaka <[hidden email]>
> Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits
>
>
>
> On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <
[hidden email]> wrote:
>
> Hello,
>
>
>
> I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
>
> Let's suppose that I have such code:
>
>
>
> $ cat a.c
>
> void foo();
>
> void bar();
>
> void zoo();
>
>
>
> void my_func(unsigned int a) {
>
>     unsigned char b = a & 0xF;
>
>     switch (b) {
>
>         case 0:  foo(); break;
>
>         case 1:  bar(); break;
>
>         case 2:  foo(); break;
>
>         case 3:  foo(); break;
>
>         case 4:  foo(); break;
>
>         case 5:  bar(); break;
>
>         case 6:  foo(); break;
>
>         case 7:  foo(); break;
>
>         case 8:  bar(); break;
>
>         case 9:  foo(); break;
>
>         case 10: foo(); break;
>
>
>
>         default: zoo();
>
>     }
>
> }
>
>
>
> Using recent clang:
>
>
>
> $ clang -O3 -S -c a.c -o a.s
>
> I have the following assembly in the beginning of my_func:
>
> # bad case
>
>         movl    %edi, %eax
>
>         andb    $15, %al
>
>         cmpb    $10, %al
>
>         ja      .LBB0_9                           # jump to the default case
>
>
>
>         andl    $15, %edi
>
>         jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table
>
>
>
> I found that if I disable switch shrinking like shown below:
>
>
>
> $ git diff
>
> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> index 5d5a9b2..3682b88 100644
>
> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> @@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {
>
>      return &SI;
>
>    }
>
>
>
> +  return nullptr;
>
> +
>
>    KnownBits Known = computeKnownBits(Cond, 0, &SI);
>
>    unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
>
>    unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
>
>
>
> I get better assembly (there is no additional MOV and AND instructions):
>
> # good case
>
>         andl    $15, %edi
>
>         cmpl    $10, %edi
>
>         ja      .LBB0_9
>
>
>
>         jmpq    *.LJTI0_0(,%rdi,8)
It's not always good to compare assembly when talking about middle-end
transforms.
If something is good for middle-end (which is *likely* the case here),
and is bad for back-ends,
then it is usually a back-end problem to deal with it.
In other words, when talking about middle-end, it might be best to
look at the LLVM IR.

> This transformation was introduced in the commit:
>
> commit 4eb03123dfda2de88a84852834845678833c8c36
>
> Author: Akira Hatanaka <
[hidden email]>
>
> Date:   Thu Oct 16 06:00:46 2014 +0000
>
>     Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
>
>
>
> From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.
>
>
>
> During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.
>
>
>
> But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?
>
>
>
>
>
> I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.
>
>
>
> Best regards,
> Denis Bakhvalov.
Roman.

> _______________________________________________
> LLVM Developers mailing list
>
[hidden email]
>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
>
[hidden email]
>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

韩玉 via llvm-dev
I'd try to change this in 2-3 steps (if necessary):
1. Change instcombine to use shouldChangeType() - narrowing to a crazy type was too ambitious. The backend isn't likely to produce good code for that anytime soon. That should be enough to solve bug 29009, and maybe the example you've shown here? I don't think there's anything controversial about this change.
2. If #1 is not enough, remove the instcombine transform completely and see if SimplifyCFG or some other pass now picks up those cases.
3. If we show codegen regressions from #2, then try to solve those in the backend.

On Wed, Oct 24, 2018 at 12:20 PM Bakhvalov, Denis <[hidden email]> wrote:

Hello Sanjay, Akira,

From IR pov this transformation creates additional truncate instruction that didn’t exist in the original IR (as Sanjay wrote).

I went through the links that Sanjay provided and now I’m stuck. :)
Should we now focus on backend (as described in Bug 29009)?

Or we say that this instcombine transform makes no sense today and we do something about it?

 

Best regards,
Denis Bakhvalov.

 

From: Sanjay Patel [mailto:[hidden email]]
Sent: Tuesday, October 23, 2018 11:20 AM
To: Roman Lebedev <[hidden email]>
Cc: Bakhvalov, Denis <[hidden email]>; llvm-dev <[hidden email]>; Akira Hatanaka <[hidden email]>
Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits

 

Sorry for not seeing this sooner. I made this transform more aggressive here:

 

But the backend still doesn't deal with strange types optimally:

 

So at the least, I think we should limit the transform as it was in an earlier draft of D12965 to not shrink to weird illegal types (the code should use "shouldChangeType()").

 

I agree with Roman that we should look at the IR diffs if we remove this completely. But this is a potentially unusual instcombine IIRC because it can create an extra instruction (the truncate) that didn't exist in the original IR.

 

Looking at some of the linked patches/bugs should tell us if we are regressing any of the motivating cases. My guess is that improvements to SimplifyCFG since then have made this instcombine transform less likely/necessary.

 

On Tue, Oct 23, 2018 at 11:12 AM Roman Lebedev via llvm-dev <[hidden email]> wrote:

On Tue, Oct 23, 2018 at 8:02 PM Bakhvalov, Denis via llvm-dev
<
[hidden email]> wrote:
>
> Hi Akira,
>
> Can you maybe remember or come up with any example where this transformation still helps today?
>
> If no such case and no objections on disabling/removing it, I can start working on that.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> From: Akira Hatanaka [mailto:
[hidden email]]
> Sent: Wednesday, October 10, 2018 12:46 PM
> To: Bakhvalov, Denis <
[hidden email]>
> Cc: llvm-dev <
[hidden email]>; Akira Hatanaka <[hidden email]>
> Subject: Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits
>
>
>
> On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <
[hidden email]> wrote:
>
> Hello,
>
>
>
> I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
>
> Let's suppose that I have such code:
>
>
>
> $ cat a.c
>
> void foo();
>
> void bar();
>
> void zoo();
>
>
>
> void my_func(unsigned int a) {
>
>     unsigned char b = a & 0xF;
>
>     switch (b) {
>
>         case 0:  foo(); break;
>
>         case 1:  bar(); break;
>
>         case 2:  foo(); break;
>
>         case 3:  foo(); break;
>
>         case 4:  foo(); break;
>
>         case 5:  bar(); break;
>
>         case 6:  foo(); break;
>
>         case 7:  foo(); break;
>
>         case 8:  bar(); break;
>
>         case 9:  foo(); break;
>
>         case 10: foo(); break;
>
>
>
>         default: zoo();
>
>     }
>
> }
>
>
>
> Using recent clang:
>
>
>
> $ clang -O3 -S -c a.c -o a.s
>
> I have the following assembly in the beginning of my_func:
>
> # bad case
>
>         movl    %edi, %eax
>
>         andb    $15, %al
>
>         cmpb    $10, %al
>
>         ja      .LBB0_9                           # jump to the default case
>
>
>
>         andl    $15, %edi
>
>         jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table
>
>
>
> I found that if I disable switch shrinking like shown below:
>
>
>
> $ git diff
>
> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> index 5d5a9b2..3682b88 100644
>
> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> @@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {
>
>      return &SI;
>
>    }
>
>
>
> +  return nullptr;
>
> +
>
>    KnownBits Known = computeKnownBits(Cond, 0, &SI);
>
>    unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
>
>    unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
>
>
>
> I get better assembly (there is no additional MOV and AND instructions):
>
> # good case
>
>         andl    $15, %edi
>
>         cmpl    $10, %edi
>
>         ja      .LBB0_9
>
>
>
>         jmpq    *.LJTI0_0(,%rdi,8)
It's not always good to compare assembly when talking about middle-end
transforms.
If something is good for middle-end (which is *likely* the case here),
and is bad for back-ends,
then it is usually a back-end problem to deal with it.
In other words, when talking about middle-end, it might be best to
look at the LLVM IR.

> This transformation was introduced in the commit:
>
> commit 4eb03123dfda2de88a84852834845678833c8c36
>
> Author: Akira Hatanaka <
[hidden email]>
>
> Date:   Thu Oct 16 06:00:46 2014 +0000
>
>     Reapply r219832 - InstCombine: Narrow switch instructions using known bits.
>
>
>
> From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.
>
>
>
> During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64, which we use for calculating address in the jump table. In the good case they were combined.
>
>
>
> But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?
>
>
>
>
>
> I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.
>
>
>
> Best regards,
> Denis Bakhvalov.
Roman.

> _______________________________________________
> LLVM Developers mailing list
>
[hidden email]
>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
>
[hidden email]
>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev