[llvm-dev] [RFC] Optimizing Comparisons Chains

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

[llvm-dev] [RFC] Optimizing Comparisons Chains

Gerolf Hoflehner via llvm-dev

Hi all,


I'm working on a new pass to optimize comparison chains.


Motivation

 

Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. Consider:

 

struct A {

 bool operator==(const A& o) const { return i == o.i && j == o.j; }

 uint32 i;

 uint32 j;

};

 

This generates:

 

 mov     eax, dword ptr [rdi]

 cmp     eax, dword ptr [rsi]

 jne     .LBB0_1

 mov     eax, dword ptr [rdi + 4]

 cmp     eax, dword ptr [rsi + 4]

 sete    al

 ret

.LBB0_1:

 xor     eax, eax

 ret

 

I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code:

 

 mov     rax, qword ptr [rdi]

 cmp     rax, qword ptr [rsi]

 sete    al

 ret

 

And thanks to recent improvements in the memcmp codegen, this can be made to work for all sizes.

 

Impact of the change

 

I’ve measured the change on std:pair/std::tuple. The pass typically makes the code 2-3 times faster with code that’s typically 2-3x times smaller.

 

A more detailed description can be found here and a proof of concept can be seen here.

 

Do you see any aspect of this that I may have missed?

For now I’ve implemented this as a separate pass. Would there be a better way to integrate it?


Thanks !



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] [RFC] Optimizing Comparisons Chains

Gerolf Hoflehner via llvm-dev
Hi Clement -

I started looking at CGP memcmp expansion for x86 more closely yesterday with:
https://reviews.llvm.org/D33963

And just made another change here:
https://reviews.llvm.org/rL304923
So we want to enable the CGP expansion without regressing the optimal x86 memcmp codegen for the power-of-2 cases that are currently handled by SDAG builder. If this works out, we'll abandon the memcmp SDAG transforms for x86 (and hopefully other targets too) because we'll take care of all memcmp expansion in CGP.

I didn't look closely at your new pass proposal, but I think you'll see bigger improvements once we have the optimal x86 memcmp expansion in place for all sizes.


On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <[hidden email]> wrote:

Hi all,


I'm working on a new pass to optimize comparison chains.


Motivation

 

Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. Consider:

 

struct A {

 bool operator==(const A& o) const { return i == o.i && j == o.j; }

 uint32 i;

 uint32 j;

};

 

This generates:

 

 mov     eax, dword ptr [rdi]

 cmp     eax, dword ptr [rsi]

 jne     .LBB0_1

 mov     eax, dword ptr [rdi + 4]

 cmp     eax, dword ptr [rsi + 4]

 sete    al

 ret

.LBB0_1:

 xor     eax, eax

 ret

 

I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code:

 

 mov     rax, qword ptr [rdi]

 cmp     rax, qword ptr [rsi]

 sete    al

 ret

 

And thanks to recent improvements in the memcmp codegen, this can be made to work for all sizes.

 

Impact of the change

 

I’ve measured the change on std:pair/std::tuple. The pass typically makes the code 2-3 times faster with code that’s typically 2-3x times smaller.

 

A more detailed description can be found here and a proof of concept can be seen here.

 

Do you see any aspect of this that I may have missed?

For now I’ve implemented this as a separate pass. Would there be a better way to integrate it?


Thanks !



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] [RFC] Optimizing Comparisons Chains

Gerolf Hoflehner via llvm-dev
That sounds perfect, thanks. Indeed my pass currently improves performance only for small powers of two, and I'm waiting for the CGP approach to be enabled to make it work for all sizes !

On Wed, Jun 7, 2017 at 7:03 PM, Sanjay Patel <[hidden email]> wrote:
Hi Clement -

I started looking at CGP memcmp expansion for x86 more closely yesterday with:
https://reviews.llvm.org/D33963

And just made another change here:
https://reviews.llvm.org/rL304923
So we want to enable the CGP expansion without regressing the optimal x86 memcmp codegen for the power-of-2 cases that are currently handled by SDAG builder. If this works out, we'll abandon the memcmp SDAG transforms for x86 (and hopefully other targets too) because we'll take care of all memcmp expansion in CGP.

I didn't look closely at your new pass proposal, but I think you'll see bigger improvements once we have the optimal x86 memcmp expansion in place for all sizes.


On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <[hidden email]> wrote:

Hi all,


I'm working on a new pass to optimize comparison chains.


Motivation

 

Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. Consider:

 

struct A {

 bool operator==(const A& o) const { return i == o.i && j == o.j; }

 uint32 i;

 uint32 j;

};

 

This generates:

 

 mov     eax, dword ptr [rdi]

 cmp     eax, dword ptr [rsi]

 jne     .LBB0_1

 mov     eax, dword ptr [rdi + 4]

 cmp     eax, dword ptr [rsi + 4]

 sete    al

 ret

.LBB0_1:

 xor     eax, eax

 ret

 

I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code:

 

 mov     rax, qword ptr [rdi]

 cmp     rax, qword ptr [rsi]

 sete    al

 ret

 

And thanks to recent improvements in the memcmp codegen, this can be made to work for all sizes.

 

Impact of the change

 

I’ve measured the change on std:pair/std::tuple. The pass typically makes the code 2-3 times faster with code that’s typically 2-3x times smaller.

 

A more detailed description can be found here and a proof of concept can be seen here.

 

Do you see any aspect of this that I may have missed?

For now I’ve implemented this as a separate pass. Would there be a better way to integrate it?


Thanks !



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: [llvm-dev] [RFC] Optimizing Comparisons Chains

Gerolf Hoflehner via llvm-dev
This sounds useful and I've subscribed to the patch to watch it as I'd like to try it out on PPC as well. We'd certainly get better code gen on PPC with memcmp than we currently do. Thanks for working on this.

On Thu, Jun 8, 2017 at 8:42 AM, Clement Courbet via llvm-dev <[hidden email]> wrote:
That sounds perfect, thanks. Indeed my pass currently improves performance only for small powers of two, and I'm waiting for the CGP approach to be enabled to make it work for all sizes !

On Wed, Jun 7, 2017 at 7:03 PM, Sanjay Patel <[hidden email]> wrote:
Hi Clement -

I started looking at CGP memcmp expansion for x86 more closely yesterday with:
https://reviews.llvm.org/D33963

And just made another change here:
https://reviews.llvm.org/rL304923
So we want to enable the CGP expansion without regressing the optimal x86 memcmp codegen for the power-of-2 cases that are currently handled by SDAG builder. If this works out, we'll abandon the memcmp SDAG transforms for x86 (and hopefully other targets too) because we'll take care of all memcmp expansion in CGP.

I didn't look closely at your new pass proposal, but I think you'll see bigger improvements once we have the optimal x86 memcmp expansion in place for all sizes.


On Wed, Jun 7, 2017 at 10:00 AM, Clement Courbet via llvm-dev <[hidden email]> wrote:

Hi all,


I'm working on a new pass to optimize comparison chains.


Motivation

 

Clang currently generates inefficient code when dealing with contiguous member-by-member structural equality. Consider:

 

struct A {

 bool operator==(const A& o) const { return i == o.i && j == o.j; }

 uint32 i;

 uint32 j;

};

 

This generates:

 

 mov     eax, dword ptr [rdi]

 cmp     eax, dword ptr [rsi]

 jne     .LBB0_1

 mov     eax, dword ptr [rdi + 4]

 cmp     eax, dword ptr [rsi + 4]

 sete    al

 ret

.LBB0_1:

 xor     eax, eax

 ret

 

I’ve been working on an LLVM pass that detects this pattern at IR level and turns it into a memcmp() call. This generates more efficient code:

 

 mov     rax, qword ptr [rdi]

 cmp     rax, qword ptr [rsi]

 sete    al

 ret

 

And thanks to recent improvements in the memcmp codegen, this can be made to work for all sizes.

 

Impact of the change

 

I’ve measured the change on std:pair/std::tuple. The pass typically makes the code 2-3 times faster with code that’s typically 2-3x times smaller.

 

A more detailed description can be found here and a proof of concept can be seen here.

 

Do you see any aspect of this that I may have missed?

For now I’ve implemented this as a separate pass. Would there be a better way to integrate it?


Thanks !



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Loading...