[llvm-dev] understanding llvm's codegen for function forwarding

[llvm-dev] understanding llvm's codegen for function forwarding

Alberto Barbaro via llvm-dev
When compiling this LLVM IR with -O0 (no optimizations)

define internal fastcc void @bar2(%Bar* nonnull sret) unnamed_addr #2 !dbg !74 {
Entry:
  call fastcc void @bar(%Bar* sret %0), !dbg !79
  ret void, !dbg !81
}

why does LLVM generate this?

0000000000000090 <bar2>:
  90:    55                       push   %rbp
  91:    48 89 e5                 mov    %rsp,%rbp
  94:    48 83 ec 10              sub    $0x10,%rsp
  98:    48 89 f8                 mov    %rdi,%rax
  9b:    48 89 45 f8              mov    %rax,-0x8(%rbp)
  9f:    e8 0c 00 00 00           callq  b0 <bar>
  a4:    48 8b 45 f8              mov    -0x8(%rbp),%rax
  a8:    48 83 c4 10              add    $0x10,%rsp
  ac:    5d                       pop    %rbp
  ad:    c3                       retq
  ae:    66 90                    xchg   %ax,%ax


instead of something like this?

0000000000000090 <bar2>:
  9f:    e8 0c 00 00 00           callq  b0 <bar>
  ad:    c3                       retq

When I add `musttail` to the IR it gives me this assembly:

00000000000000a0 <bar2>:
  a0:    55                       push   %rbp
  a1:    48 89 e5                 mov    %rsp,%rbp
  a4:    48 83 ec 10              sub    $0x10,%rsp
  a8:    48 89 f8                 mov    %rdi,%rax
  ab:    48 89 45 f8              mov    %rax,-0x8(%rbp)
  af:    48 83 c4 10              add    $0x10,%rsp
  b3:    5d                       pop    %rbp
  b4:    e9 07 00 00 00           jmpq   c0 <bar>
  b9:    0f 1f 80 00 00 00 00     nopl   0x0(%rax)

which no longer has a call instruction, but still has a prologue that I
would not expect.

What's going on here? Is this something that cannot really be
improved without optimization passes?
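
For readers more at home in C, here is a hypothetical source-level equivalent of the IR above. The names `Bar`, `bar` and `bar2` mirror the IR; the struct's fields are an assumption, chosen so the struct is larger than 16 bytes. On x86-64 System V such a struct is returned in memory: the caller passes a hidden pointer to the result slot in RDI (the `sret` parameter in the IR), and the callee returns that same address in RAX.

```c
/* Hypothetical struct: four 8-byte fields (32 bytes on LP64 x86-64 Linux),
   too large for registers, so the System V ABI returns it through a hidden
   caller-provided pointer, which is the sret parameter in the IR. */
struct Bar {
    long a, b, c, d;
};

/* Callee: fills in the result slot the caller provided. */
struct Bar bar(void) {
    struct Bar r = {1, 2, 3, 4};
    return r;
}

/* Forwarder: receives the result-slot address in RDI, passes it on to bar
   unchanged, and must also hand it back in RAX when it returns. */
struct Bar bar2(void) {
    return bar();
}
```

Compiling this with `clang -O0 -S -emit-llvm` should give IR in which both functions return void and take an `sret` pointer parameter, broadly like the IR above; the exact attributes and pointer syntax vary between Clang versions.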
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] understanding llvm's codegen for function forwarding

Alberto Barbaro via llvm-dev
I'd assume -fomit-frame-pointer would make a difference.

Cheers,
Nicolai

On 23.11.18 20:49, Andrew Kelley via llvm-dev wrote:

--
Learn what the world is really like,
but never forget what it ought to be like.

Re: [llvm-dev] understanding llvm's codegen for function forwarding

Alberto Barbaro via llvm-dev
Re-adding llvm-dev -- silly phones not defaulting to reply-all...

There are several things going on here. The first is that -fno-omit-frame-pointer causes the generation of "push %rbp ; mov %rsp, %rbp". That prologue is required for accurate stack traces, so we can't simplify the code to just "call / ret" as you suggest without changing the option.
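
Why the frame pointer matters for stack traces can be shown with a simplified model. This is a sketch, not real unwinder code: `frame_record` and `walk_frames` are hypothetical names, the layout assumes x86-64 with frame pointers enabled, and the walker follows a hand-built chain rather than the live stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the record built by "push %rbp ; mov %rsp,%rbp" on x86-64:
   [rbp] holds the caller's saved rbp, [rbp+8] holds the return address. */
struct frame_record {
    const struct frame_record *saved_rbp; /* caller's record; NULL at the outermost frame */
    uintptr_t return_address;             /* where the caller resumes */
};

/* Follow the chain starting at the current rbp and copy out up to max
   return addresses. Returns the number of frames found. */
size_t walk_frames(const struct frame_record *rbp, uintptr_t *out, size_t max) {
    size_t n = 0;
    while (rbp != NULL && n < max) {
        out[n++] = rbp->return_address;
        rbp = rbp->saved_rbp;
    }
    return n;
}
```

Each frame-pointer prologue links one more record into this chain. A function that skips the prologue, like the bare call/ret version suggested in the original question, adds no record, so a walker following the chain loses track of one frame in the trace.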

The less obvious one is the spilling of RDI to stack memory and reloading it into RAX, which is what I was raising. The Sys V ABI requires that the address of a struct returned by pointer be returned in RAX, and LLVM complies. It looks like I misremembered: we've always returned RDI in RAX for sret functions, since 2008 / r50075. However, we never did the right thing for 32-bit x86; I fixed that in https://bugs.llvm.org/show_bug.cgi?id=23491 / r237639. We don't yet implement the general optimization of avoiding such spills by reusing the value returned in RAX, which is why we don't get the simple "call / ret" code you suggest.
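
A hand-lowered C model can make the spill concrete. It assumes the x86-64 System V convention described above: the hidden result pointer arrives in RDI and must come back in RAX. The functions `bar_lowered`, `bar2_spill` and `bar2_reuse` are hypothetical illustrations, not LLVM output; the comments map each line only roughly onto the -O0 listing from the original question.

```c
/* Hypothetical 32-byte struct, returned via a hidden pointer on x86-64 SysV. */
struct Bar { long a, b, c, d; };

/* Hand-lowered form of "struct Bar bar(void)": the hidden result pointer
   becomes an explicit parameter (RDI) and is returned again (RAX). */
struct Bar *bar_lowered(struct Bar *sret) {
    sret->a = 1; sret->b = 2; sret->c = 3; sret->d = 4;
    return sret;
}

/* What the -O0 code effectively does: keep a private copy of the incoming
   pointer across the call and return that, ignoring the callee's RAX. */
struct Bar *bar2_spill(struct Bar *sret) {
    struct Bar *saved = sret; /* ~ mov %rdi,%rax ; mov %rax,-0x8(%rbp) */
    bar_lowered(sret);        /* ~ callq bar */
    return saved;             /* ~ mov -0x8(%rbp),%rax */
}

/* The optimization not yet implemented: trust the pointer the callee
   returns, so nothing has to be saved across the call. */
struct Bar *bar2_reuse(struct Bar *sret) {
    return bar_lowered(sret); /* ~ callq bar ; retq */
}
```

Both versions return the same address; the second simply relies on the callee honoring the ABI rule, which is what removes the need for the stack slot.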

Finally, we miss the tail call opportunity because today we just give up if sret is present on either the caller or the callee. I think we could refine that check to ask whether the two agree, i.e. whether the sret argument passed to the callee matches the caller's own sret parameter.
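
The suggested refinement can be phrased as a small predicate. This is a hypothetical, simplified sketch of the rule as stated in this message, not LLVM's actual tail-call eligibility check, which weighs many other conditions; the `sret_info` struct and both function names are invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical summary of one call site; not LLVM's data model. */
struct sret_info {
    bool caller_has_sret;      /* does the calling function take an sret param? */
    bool callee_has_sret;      /* does the call pass an sret argument? */
    bool forwards_caller_sret; /* is that argument the caller's own sret param? */
};

/* Behaviour described above: sret on either side blocks the tail call. */
bool tail_call_ok_today(struct sret_info s) {
    return !s.caller_has_sret && !s.callee_has_sret;
}

/* Refinement: also allow it when both sides use sret and the caller passes
   its own sret pointer straight through, because the callee then returns
   exactly the address the caller is required to return. */
bool tail_call_ok_refined(struct sret_info s) {
    if (!s.caller_has_sret && !s.callee_has_sret)
        return true;
    return s.caller_has_sret && s.callee_has_sret && s.forwards_caller_sret;
}
```

Under this rule the `bar2` example, which forwards its own sret pointer to `bar`, would qualify, while a caller passing a different buffer as the callee's sret argument would not.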

On Sat, Nov 24, 2018 at 9:20 AM Andrew Kelley <[hidden email]> wrote:
On Sat, Nov 24, 2018 at 12:11 PM Reid Kleckner <[hidden email]> wrote:
>
> LLVM is trying to return RDI in RAX. It doesn't trust the callee to do it, because that was a bug that we fixed long ago.

You're saying these extra instructions are working around a bug that
no longer exists? Can they be removed now?

What was the bug? Why can't the callee be trusted?

>
> On Fri, Nov 23, 2018, 11:49 AM Andrew Kelley via llvm-dev <[hidden email]> wrote: