[llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...


Stefan Kanthak via llvm-dev
Compile the following functions with "-O3 -target i386"
(see <https://godbolt.org/z/VmKlXL>):

long long div(long long foo, long long bar)
{
    return foo / bar;
}

On the left the generated code; on the right the expected,
properly optimised code:

div: # @div
    push  ebp                     |
    mov   ebp, esp                |
    push  dword ptr [ebp + 20]    |
    push  dword ptr [ebp + 16]    |
    push  dword ptr [ebp + 12]    |
    push  dword ptr [ebp + 8]     |
    call  __divdi3                |    jmp   __divdi3
    add   esp, 16                 |
    pop   ebp                     |
    ret                           |


long long mod(long long foo, long long bar)
{
    return foo % bar;
}

mod: # @mod
    push  ebp                     |
    mov   ebp, esp                |
    push  dword ptr [ebp + 20]    |
    push  dword ptr [ebp + 16]    |
    push  dword ptr [ebp + 12]    |
    push  dword ptr [ebp + 8]     |
    call  __moddi3                |    jmp   __moddi3
    add   esp, 16                 |
    pop   ebp                     |
    ret                           |


long long mul(long long foo, long long bar)
{
    return foo * bar;
}

mul: # @mul
    push  ebp
    mov   ebp, esp
    push  esi
    mov   ecx, dword ptr [ebp + 16]
    mov   esi, dword ptr [ebp + 8]
    mov   eax, ecx
    imul  ecx, dword ptr [ebp + 12]
    mul   esi
    imul  esi, dword ptr [ebp + 20]
    add   edx, ecx
    add   edx, esi
    pop   esi
    pop   ebp
    ret
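
For readers following the mul listing above, here is a minimal C sketch of the arithmetic it performs: one widening 32x32 multiply of the low halves plus two truncated cross products added into the high half. The function name `mul64_via_32` is hypothetical and not from the post; unsigned types are used because the low 64 bits of the product are the same for signed and unsigned operands.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the original post): low 64 bits of a * b
   computed from 32-bit halves, mirroring the mul/imul/add sequence above. */
uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* mul: full 32x32 -> 64 product of the low halves (edx:eax). */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Two imul: cross products, each truncated to 32 bits, then summed. */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;

    /* add edx, ...: the cross terms only affect the high 32 bits. */
    return low + ((uint64_t)cross << 32);
}
```

The a_hi * b_hi term is dropped because it only contributes to bits 64 and above.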
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...

Craig Topper via llvm-dev
Clang's -target option is supposed to take a CPU type and an operating system, so "-target i386" gives it no operating system. This prevents frame pointer elimination, which is why ebp is being updated. If you pass "-target i386-linux" you get slightly better code.

The division/remainder operations are turned into library calls as part of instruction selection. This code is somewhat independent of how other calls are handled. We probably don't support tail calls in it. Is it really realistic that a user would have a non-inlined function that contains just a division? Why should we optimize for that case?
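
As a rough source-level picture of the lowering described above (a sketch only, not LLVM's actual code path): on a 32-bit x86 target the 64-bit division behaves as if the function had been written as a call to a runtime helper with the same two arguments. The name `divdi3_standin` is hypothetical and stands in for `__divdi3` so the snippet builds on any target; `noinline` is a GCC/Clang attribute assumed here only to keep the call visible.

```c
/* Hypothetical stand-in for the runtime helper __divdi3; defined locally so
   the sketch is self-contained on any target. */
__attribute__((noinline))
long long divdi3_standin(long long a, long long b)
{
    return a / b;
}

/* Roughly what div() amounts to after instruction selection on i386:
   a single call that forwards both arguments unchanged. */
long long div_lowered(long long foo, long long bar)
{
    return divdi3_standin(foo, bar);
}
```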

~Craig


Re: [llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...

Stefan Kanthak via llvm-dev
"Craig Topper" <[hidden email]> wrote:


> Clang's -target option is supposed to take a cpu type and an operating
> system. So "-target i386" is giving it no operating system. This is
> preventing frame pointer elimination which is why ebp is being updated. If
> you pass "-target i386-linux" you get slightly better code.

But the frame pointer is not the point here.

> The division/remainder operations are turned into library calls as part of
> instruction selection. This code is somewhat independent of how other calls
> are handled. We probably don't support tail calls in it. Is it really
> realistic that a user would have a non-inlined function that contains just
> a division? Why should we optimize for that case?

I've seen quite a few libraries that implement such functions as
target-independent wrappers which just call another function with the
same prototype.
So the question is not whether it's just a division, but, in general,
the call of a function having the same prototype.

regards
Stefan

Re: [llvm-dev] Where's the optimiser gone? (part 5.b): missed tail calls, and more...

Craig Topper via llvm-dev
On Sat, Dec 1, 2018 at 12:05 PM Stefan Kanthak <[hidden email]> wrote:
> "Craig Topper" <[hidden email]> wrote:
>
>> Clang's -target option is supposed to take a cpu type and an operating
>> system. So "-target i386" is giving it no operating system. This is
>> preventing frame pointer elimination which is why ebp is being updated. If
>> you pass "-target i386-linux" you get slightly better code.
>
> But the frame pointer is not the point here.

You didn't provide what you think the improved code would be for the multiply. So I wasn't sure.
 

>> The division/remainder operations are turned into library calls as part of
>> instruction selection. This code is somewhat independent of how other calls
>> are handled. We probably don't support tail calls in it. Is it really
>> realistic that a user would have a non-inlined function that contains just
>> a division? Why should we optimize for that case?
>
> I've seen quite a few libraries that implement such functions as
> target-independent wrappers which just call another function with the
> same prototype.
> So the question is not whether it's just a division, but, in general,
> the call of a function having the same prototype.

We do support that when there is a call in the original source code. The division/remainder case is special because we're turning an arithmetic operation into a call. This, for example, works:

long long bar(long long x, long long y);

long long foo(long long x, long long y) {
  return bar(x, y);
}
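
A self-contained variant of the example above, as a sketch: the body of `bar` is made up purely so the snippet links and runs, and `noinline` (a GCC/Clang attribute) is assumed only to keep the call from being inlined. Because `foo` forwards its arguments unchanged to a function with the same prototype and returns the result directly, the incoming stack arguments are already laid out as `bar` expects, which is what makes a plain `jmp bar` possible. Whether a given compiler actually emits it depends on the target and flags.

```c
/* Hypothetical callee with the same prototype as the wrapper; its body is
   arbitrary and exists only to make the example complete. */
__attribute__((noinline))
long long bar(long long x, long long y)
{
    return x - y;
}

/* Wrapper whose only action is an explicit source-level call in tail
   position, forwarding both arguments unchanged. */
long long foo(long long x, long long y)
{
    return bar(x, y);
}
```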
 
