[llvm-dev] Inline assembly in intel syntax mishandling i constraint


Stephen Checkoway via llvm-dev
Hi all,

I'm getting rather odd behavior from a call asm inteldialect(). TL;DR: "mov reg, $0" with an "i" constraint on $0 behaves identically to "mov reg, dword ptr [$0]" and differently from "movl $0, reg" in AT&T syntax.


I'm not sure how to get Clang to emit an inteldialect call, so for this example I'm emitting LLVM IR and then modifying the resulting .ll file. (I get similar behavior with Rust's asm!(… : "intel"), so I'm assuming that's what Rust uses, although I didn't verify this.)

Here's the example:

static int foo;
static int bar;

void _start(void) {
  asm volatile("movl %0, %%eax" : : "i"(&foo));
  asm volatile("movl %0, %%ebx" : : "i"(&bar));
}

This produces
define void @_start() #0 {
  call void asm sideeffect "movl $0, %eax", "i,~{dirflag},~{fpsr},~{flags}"(i32* @foo) #1, !srcloc !3
  call void asm sideeffect "movl $0, %ebx", "i,~{dirflag},~{fpsr},~{flags}"(i32* @bar) #1, !srcloc !4
  ret void
}

When assembled, I get the expected output
 80480a3: b8 b0 90 04 08       mov    eax,0x80490b0
 80480a8: bb b4 90 04 08       mov    ebx,0x80490b4


After modifying the second one to be
  call void asm sideeffect inteldialect "mov ebx, $0", "i,~{dirflag},~{fpsr},~{flags}"(i32* @bar) #1, !srcloc !4
and assembling, I get the unexpected output
 80480a3: b8 b0 90 04 08       mov    eax,0x80490b0
 80480a8: 8b 1d b4 90 04 08     mov    ebx,DWORD PTR ds:0x80490b4

This is identical to the output I get if I change the assembly template to "mov ebx, dword ptr [$0]"
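
To make the difference between the two encodings concrete, here is a minimal sketch in C that decodes only the two instruction forms seen in the objdump output above. The name decode_mov and its output strings are hypothetical, chosen for illustration, and the sketch assumes 32-bit code: opcode B8+r encodes "mov r32, imm32", while opcode 8B followed by a ModRM byte with mod=00 and rm=101 encodes "mov r32, [disp32]".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit register names indexed by the 3-bit x86 register number. */
static const char *const reg32[8] = {
    "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
};

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical helper: describes the two mov forms from the report.
   Returns 0 on success, -1 if the bytes are neither form. */
static int decode_mov(const uint8_t *code, size_t len,
                      char *out, size_t outsz) {
    /* B8+r id: mov r32, imm32 (register number in the low 3 opcode bits). */
    if (len == 5 && (code[0] & 0xF8) == 0xB8) {
        snprintf(out, outsz, "mov %s, 0x%x (immediate)",
                 reg32[code[0] & 7], (unsigned)le32(code + 1));
        return 0;
    }
    /* 8B /r with ModRM mod=00, rm=101: mov r32, [disp32] in 32-bit mode. */
    if (len == 6 && code[0] == 0x8B && (code[1] & 0xC7) == 0x05) {
        snprintf(out, outsz, "mov %s, dword ptr [0x%x] (load)",
                 reg32[(code[1] >> 3) & 7], (unsigned)le32(code + 2));
        return 0;
    }
    return -1;
}
```

Given the bytes bb b4 90 04 08, this reports an immediate move into ebx; given 8b 1d b4 90 04 08, it reports a load from address 0x80490b4, matching the two disassembly lines above. The extra byte in the second form is the ModRM byte (0x1d), which selects ebx as the destination and an absolute 32-bit address as the source.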

I think the underlying issue is that whichever variant of Intel syntax LLVM supports here (MASM?) treats
mov reg, symbol
as a load from memory, and it wants
mov reg, offset symbol
to move the symbol's address as an immediate.

E.g., if I ask Clang to output assembly in Intel syntax via -mllvm --x86-asm-syntax=intel, I get
        #APP
        mov eax, offset foo
        #NO_APP
        #APP

        mov ebx, dword ptr [bar]

        #NO_APP

(I have no idea where those extra newlines are coming from.)

If I change the assembly template to "mov ebx, offset $0", LLVM complains about multiple symbols being present:
<inline asm>:2:18: error: cannot use more than one symbol in memory operand
        mov ebx, offset bar

I attached my source file and my modified .ll file. I compiled the source file with

    clang -m32 a.c -ffreestanding -nostdlib -S -emit-llvm

$ clang --version
clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Is this an LLVM bug or am I misusing inteldialect?

Thank you,

Steve

--
Stephen Checkoway






_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Attachments: a.c (158 bytes), bug.ll (1K)

Re: [llvm-dev] Inline assembly in intel syntax mishandling i constraint

Craig Topper via llvm-dev
What version of llvm are you using? This looks like it may be fixed on trunk.

~Craig


On Tue, Jan 7, 2020 at 2:44 PM Stephen Checkoway via llvm-dev <[hidden email]> wrote:
Hi all,

I'm getting rather odd behavior from a call asm inteldialect(). TL;DR is "mov reg, $0" with a "i" constraint on $0 is behaving identical to "mov reg, dword ptr [$0]" and differently from "movl $0, reg" in AT&T syntax.


Re: [llvm-dev] Inline assembly in intel syntax mishandling i constraint

Stephen Checkoway via llvm-dev


> On Jan 7, 2020, at 18:41, Craig Topper <[hidden email]> wrote:
>
> What version of llvm are you using? This looks like it may be fixed on trunk.

After poking at my installation of Rust, I'm not entirely sure which version of LLVM it uses. Looking at the GitHub page, it looks like Rust maintains its own copy of LLVM and cherry-picks commits. The C example was compiled with 6.0.

If it's fixed in LLVM, then I'll file a bug with Rust.

Thanks,

Steve


--
Stephen Checkoway



Re: [llvm-dev] Inline assembly in intel syntax mishandling i constraint

Eric Astor via llvm-dev
This is in fact fixed on trunk, but after LLVM 9.0 (see https://godbolt.org/z/pKHlqr); I believe the fix was https://reviews.llvm.org/D71436.

On Tue, Jan 7, 2020 at 9:04 PM Stephen Checkoway via llvm-dev <[hidden email]> wrote:



After poking at my installation of rust, I'm not entirely sure what version of LLVM it uses. Looking at the GitHub page, it looks like Rust maintains their own copy of llvm and cherry picks commits. The C example was compiled with 6.0.

If it's fixed in LLVM, then I'll file a bug with Rust.


Re: [llvm-dev] Inline assembly in intel syntax mishandling i constraint

Stephen Checkoway via llvm-dev
Great, thanks! I had no idea I could use Compiler Explorer with LLVM IR directly.

Steve

> On Jan 7, 2020, at 22:45, Eric Astor via llvm-dev <[hidden email]> wrote:
>
> This is in fact fixed on trunk, but after LLVM 9.0 (see https://godbolt.org/z/pKHlqr); I believe the fix was https://reviews.llvm.org/D71436.

--
Stephen Checkoway




