[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
Hi @ll,

targetting i386, LLVM/clang generates wrong code for the following
functions:

unsigned long __bswapsi2 (unsigned long ul)
{
    return (((ul) & 0xff000000ul) >> 3 * 8)
         | (((ul) & 0x00ff0000ul) >>     8)
         | (((ul) & 0x0000ff00ul) <<     8)
         | (((ul) & 0x000000fful) << 3 * 8);
}

unsigned long long __bswapdi2(unsigned long long ull)
{
    return ((ull & 0xff00000000000000ull) >> 7 * 8)
         | ((ull & 0x00ff000000000000ull) >> 5 * 8)
         | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
         | ((ull & 0x000000ff00000000ull) >>     8)
         | ((ull & 0x00000000ff000000ull) <<     8)
         | ((ull & 0x0000000000ff0000ull) << 3 * 8)
         | ((ull & 0x000000000000ff00ull) << 5 * 8)
         | ((ull & 0x00000000000000ffull) << 7 * 8);
}

You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
and "compiler-rt/lib/builtins/bswapdi2.c", for example!


Compiled with "-O3 -target i386" this yields the following code
(see <https://godbolt.org/z/F4UIl4>):

__bswapsi2: # @__bswapsi2
    push  ebp
    mov   ebp, esp
    mov   eax, dword ptr [ebp + 8]
    bswap eax
    pop   ebp
    ret

__bswapdi2: # @__bswapdi2
    push  ebp
    mov   ebp, esp
    mov   edx, dword ptr [ebp + 8]
    mov   eax, dword ptr [ebp + 12]
    bswap eax
    bswap edx
    pop   ebp
    ret

__bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
halves of a "long long" is OBVIOUSLY WRONG!

From the C source, the expected result for the input value
0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
produces 0x67452301EFCDAB89


And compiled for x86-64 this yields the following code (see
<https://godbolt.org/z/uM9nvN>):

__bswapsi2: # @__bswapsi2
    mov   eax, edi
    shr   eax, 24
    mov   rcx, rdi
    shr   rcx, 8
    and   ecx, 65280
    or    rax, rcx
    mov   rcx, rdi
    shl   rcx, 8
    and   ecx, 16711680
    or    rax, rcx
    and   rdi, 255
    shl   rdi, 24
    or    rax, rdi
    ret

__bswapdi2: # @__bswapdi2
    bswap rdi
    mov   rax, rdi
    ret

Both are correct, but __bswapsi2() should of course use BSWAP too!


Stefan Kanthak

PS: for comparision with another compiler, take a look at
    <https://skanthak.homepage.t-online.de/msvc.html#example5>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.

Not sure about the bswapdi2 i386 case.


~Craig


On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <[hidden email]> wrote:
Hi @ll,

targetting i386, LLVM/clang generates wrong code for the following
functions:

unsigned long __bswapsi2 (unsigned long ul)
{
    return (((ul) & 0xff000000ul) >> 3 * 8)
         | (((ul) & 0x00ff0000ul) >>     8)
         | (((ul) & 0x0000ff00ul) <<     8)
         | (((ul) & 0x000000fful) << 3 * 8);
}

unsigned long long __bswapdi2(unsigned long long ull)
{
    return ((ull & 0xff00000000000000ull) >> 7 * 8)
         | ((ull & 0x00ff000000000000ull) >> 5 * 8)
         | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
         | ((ull & 0x000000ff00000000ull) >>     8)
         | ((ull & 0x00000000ff000000ull) <<     8)
         | ((ull & 0x0000000000ff0000ull) << 3 * 8)
         | ((ull & 0x000000000000ff00ull) << 5 * 8)
         | ((ull & 0x00000000000000ffull) << 7 * 8);
}

You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
and "compiler-rt/lib/builtins/bswapdi2.c", for example!


Compiled with "-O3 -target i386" this yields the following code
(see <https://godbolt.org/z/F4UIl4>):

__bswapsi2: # @__bswapsi2
    push  ebp
    mov   ebp, esp
    mov   eax, dword ptr [ebp + 8]
    bswap eax
    pop   ebp
    ret

__bswapdi2: # @__bswapdi2
    push  ebp
    mov   ebp, esp
    mov   edx, dword ptr [ebp + 8]
    mov   eax, dword ptr [ebp + 12]
    bswap eax
    bswap edx
    pop   ebp
    ret

__bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
halves of a "long long" is OBVIOUSLY WRONG!

From the C source, the expected result for the input value
0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
produces 0x67452301EFCDAB89


And compiled for x86-64 this yields the following code (see
<https://godbolt.org/z/uM9nvN>):

__bswapsi2: # @__bswapsi2
    mov   eax, edi
    shr   eax, 24
    mov   rcx, rdi
    shr   rcx, 8
    and   ecx, 65280
    or    rax, rcx
    mov   rcx, rdi
    shl   rcx, 8
    and   ecx, 16711680
    or    rax, rcx
    and   rdi, 255
    shl   rdi, 24
    or    rax, rdi
    ret

__bswapdi2: # @__bswapdi2
    bswap rdi
    mov   rax, rdi
    ret

Both are correct, but __bswapsi2() should of course use BSWAP too!


Stefan Kanthak

PS: for comparision with another compiler, take a look at
    <https://skanthak.homepage.t-online.de/msvc.html#example5>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
bswapdi2 for i386 is correct

Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into eax. Those are each bswapped. The ABI for the return is edx contains bits [63:32] and eax contains [31:0]. This is opposite of how the register were loaded.

~Craig


On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <[hidden email]> wrote:
bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.

Not sure about the bswapdi2 i386 case.


~Craig


On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <[hidden email]> wrote:
Hi @ll,

targetting i386, LLVM/clang generates wrong code for the following
functions:

unsigned long __bswapsi2 (unsigned long ul)
{
    return (((ul) & 0xff000000ul) >> 3 * 8)
         | (((ul) & 0x00ff0000ul) >>     8)
         | (((ul) & 0x0000ff00ul) <<     8)
         | (((ul) & 0x000000fful) << 3 * 8);
}

unsigned long long __bswapdi2(unsigned long long ull)
{
    return ((ull & 0xff00000000000000ull) >> 7 * 8)
         | ((ull & 0x00ff000000000000ull) >> 5 * 8)
         | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
         | ((ull & 0x000000ff00000000ull) >>     8)
         | ((ull & 0x00000000ff000000ull) <<     8)
         | ((ull & 0x0000000000ff0000ull) << 3 * 8)
         | ((ull & 0x000000000000ff00ull) << 5 * 8)
         | ((ull & 0x00000000000000ffull) << 7 * 8);
}

You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
and "compiler-rt/lib/builtins/bswapdi2.c", for example!


Compiled with "-O3 -target i386" this yields the following code
(see <https://godbolt.org/z/F4UIl4>):

__bswapsi2: # @__bswapsi2
    push  ebp
    mov   ebp, esp
    mov   eax, dword ptr [ebp + 8]
    bswap eax
    pop   ebp
    ret

__bswapdi2: # @__bswapdi2
    push  ebp
    mov   ebp, esp
    mov   edx, dword ptr [ebp + 8]
    mov   eax, dword ptr [ebp + 12]
    bswap eax
    bswap edx
    pop   ebp
    ret

__bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
halves of a "long long" is OBVIOUSLY WRONG!

From the C source, the expected result for the input value
0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
produces 0x67452301EFCDAB89


And compiled for x86-64 this yields the following code (see
<https://godbolt.org/z/uM9nvN>):

__bswapsi2: # @__bswapsi2
    mov   eax, edi
    shr   eax, 24
    mov   rcx, rdi
    shr   rcx, 8
    and   ecx, 65280
    or    rax, rcx
    mov   rcx, rdi
    shl   rcx, 8
    and   ecx, 16711680
    or    rax, rcx
    and   rdi, 255
    shl   rdi, 24
    or    rax, rdi
    ret

__bswapdi2: # @__bswapdi2
    bswap rdi
    mov   rax, rdi
    ret

Both are correct, but __bswapsi2() should of course use BSWAP too!


Stefan Kanthak

PS: for comparision with another compiler, take a look at
    <https://skanthak.homepage.t-online.de/msvc.html#example5>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
"Craig Topper" <[hidden email]> wrote:

> bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned
> long" is 64-bits on x86-64 linux.

Believe me: I KNOW THIS!
Now take one more THOROUGH look at the C source of _bswapsi2():
1. ALL 4  & operations clear the higher 32 bits;
2. the 4 >> operations BSWAP the lower 32 bits.

Assuming that the argument was loaded into RDI,  this is a
    BSWAP EDI
in Intel x86-64 machine code!
Remember that in "long mode", writing to a 32-bit register
clears the higher 32 bits of the respective 64-bit register.

AGAIN: why doesn't LLVM generate the BSWAP operation here?

The expected code is

__bswapsi2: # @__bswapsi2
     bswap edi
     mov   rax, rdi
     ret

not amused
Stefan Kanthak

> But its 32-bits on x86-64 msvc.
>
> Not sure about the bswapdi2 i386 case.
>
>
> ~Craig
>
>
> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
> [hidden email]> wrote:
>
>> Hi @ll,
>>
>> targetting i386, LLVM/clang generates wrong code for the following
>> functions:
>>
>> unsigned long __bswapsi2 (unsigned long ul)
>> {
>>     return (((ul) & 0xff000000ul) >> 3 * 8)
>>          | (((ul) & 0x00ff0000ul) >>     8)
>>          | (((ul) & 0x0000ff00ul) <<     8)
>>          | (((ul) & 0x000000fful) << 3 * 8);
>> }
>>
>> unsigned long long __bswapdi2(unsigned long long ull)
>> {
>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
>>          | ((ull & 0x000000ff00000000ull) >>     8)
>>          | ((ull & 0x00000000ff000000ull) <<     8)
>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
>> }
>>
>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!
>>
>>
>> Compiled with "-O3 -target i386" this yields the following code
>> (see <https://godbolt.org/z/F4UIl4>):
>>
>> __bswapsi2: # @__bswapsi2
>>     push  ebp
>>     mov   ebp, esp
>>     mov   eax, dword ptr [ebp + 8]
>>     bswap eax
>>     pop   ebp
>>     ret
>>
>> __bswapdi2: # @__bswapdi2
>>     push  ebp
>>     mov   ebp, esp
>>     mov   edx, dword ptr [ebp + 8]
>>     mov   eax, dword ptr [ebp + 12]
>>     bswap eax
>>     bswap edx
>>     pop   ebp
>>     ret
>>
>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
>> halves of a "long long" is OBVIOUSLY WRONG!
>>
>> From the C source, the expected result for the input value
>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
>> produces 0x67452301EFCDAB89
>>
>>
>> And compiled for x86-64 this yields the following code (see
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>> <https://godbolt.org/z/uM9nvN>):
>>
>> __bswapsi2: # @__bswapsi2
>>     mov   eax, edi
>>     shr   eax, 24
>>     mov   rcx, rdi
>>     shr   rcx, 8
>>     and   ecx, 65280
>>     or    rax, rcx
>>     mov   rcx, rdi
>>     shl   rcx, 8
>>     and   ecx, 16711680
>>     or    rax, rcx
>>     and   rdi, 255
>>     shl   rdi, 24
>>     or    rax, rdi
>>     ret
>>
>> __bswapdi2: # @__bswapdi2
>>     bswap rdi
>>     mov   rax, rdi
>>     ret
>>
>> Both are correct, but __bswapsi2() should of course use BSWAP too!
>>
>>
>> Stefan Kanthak
>>
>> PS: for comparision with another compiler, take a look at
>>     <https://skanthak.homepage.t-online.de/msvc.html#example5>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
"Craig Topper" <[hidden email]> wrote:

> bswapdi2 for i386 is correct

OUCH!

> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
> eax. Those are each bswapped.

This exchanges the high byte of each 32-bit PART with its low byte, but
NOT the high byte of the whole 64-bit operand with its low byte!

Please get a clue!

> The ABI for the return is edx contains bits [63:32] and eax contains
> [31:0]. This is opposite of how the register were loaded.

My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.

With the 64-bit argument loaded into EDX:EAX, the instruction sequence

    bswap  edx
    bswap  eax
    xchg   eax, edx

is NOT equivalent to

    bswap    rdi

with the 64-bit argument loaded into RDI.

Just run the following code on x86-64:

    mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
; split argument into high and low part
    mov    rdx, rdi
    shr    rdx, 32                   ; high part in EDX
    mov    eax, rdi                  ; low part in EAX
; perform __bswapdi2() as in 32-bit mode
    xchg   eax, edx                  ; swap parts, argument now loaded
                                     ;  like in 32-bit mode
    bswap  edx
    bswap  eax                       ; result like that in 32-bit mode
; load result into 64-bit register
    shl    rdx, 32
    or     rax, rdx
; perform _bswapdi2() in native 64-bit mode
    bswap  rdi
; compare results
    xor    rax, rdi

not amused
Stefan Kanthak

> On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <[hidden email]>
> wrote:
>
>> bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned
>> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.
>>
>> Not sure about the bswapdi2 i386 case.
>>
>>
>> ~Craig
>>
>>
>> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
>> [hidden email]> wrote:
>>
>>> Hi @ll,
>>>
>>> targetting i386, LLVM/clang generates wrong code for the following
>>> functions:
>>>
>>> unsigned long __bswapsi2 (unsigned long ul)
>>> {
>>>     return (((ul) & 0xff000000ul) >> 3 * 8)
>>>          | (((ul) & 0x00ff0000ul) >>     8)
>>>          | (((ul) & 0x0000ff00ul) <<     8)
>>>          | (((ul) & 0x000000fful) << 3 * 8);
>>> }
>>>
>>> unsigned long long __bswapdi2(unsigned long long ull)
>>> {
>>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
>>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
>>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
>>>          | ((ull & 0x000000ff00000000ull) >>     8)
>>>          | ((ull & 0x00000000ff000000ull) <<     8)
>>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
>>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
>>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
>>> }
>>>
>>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
>>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!
>>>
>>>
>>> Compiled with "-O3 -target i386" this yields the following code
>>> (see <https://godbolt.org/z/F4UIl4>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   eax, dword ptr [ebp + 8]
>>>     bswap eax
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   edx, dword ptr [ebp + 8]
>>>     mov   eax, dword ptr [ebp + 12]
>>>     bswap eax
>>>     bswap edx
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
>>> halves of a "long long" is OBVIOUSLY WRONG!
>>>
>>> From the C source, the expected result for the input value
>>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
>>> produces 0x67452301EFCDAB89
>>>
>>>
>>> And compiled for x86-64 this yields the following code (see
>>> <https://godbolt.org/z/uM9nvN>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     mov   eax, edi
>>>     shr   eax, 24
>>>     mov   rcx, rdi
>>>     shr   rcx, 8
>>>     and   ecx, 65280
>>>     or    rax, rcx
>>>     mov   rcx, rdi
>>>     shl   rcx, 8
>>>     and   ecx, 16711680
>>>     or    rax, rcx
>>>     and   rdi, 255
>>>     shl   rdi, 24
>>>     or    rax, rdi
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     bswap rdi
>>>     mov   rax, rdi
>>>     ret
>>>
>>> Both are correct, but __bswapsi2() should of course use BSWAP too!
>>>
>>>
>>> Stefan Kanthak
>>>
>>> PS: for comparision with another compiler, take a look at
>>>     <https://skanthak.homepage.t-online.de/msvc.html#example5>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
I just compiled the two attached files in 32-bit mode and ran it.

It printed efcdab8967452301.

I verified via objdump that the my_bswap function contains the follow assembly which I believe matches the assembly you linked to on godbolt.

_my_bswap:
    1f70: 55 pushl %ebp
    1f71: 89 e5 movl %esp, %ebp
    1f73: 8b 55 08 movl 8(%ebp), %edx
    1f76: 8b 45 0c movl 12(%ebp), %eax
    1f79: 0f c8 bswapl %eax
    1f7b: 0f ca bswapl %edx
    1f7d: 5d popl %ebp
    1f7e: c3 retl


~Craig


On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <[hidden email]> wrote:
"Craig Topper" <[hidden email]> wrote:

> bswapdi2 for i386 is correct

OUCH!

> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
> eax. Those are each bswapped.

This exchanges the high byte of each 32-bit PART with its low byte, but
NOT the high byte of the whole 64-bit operand with its low byte!

Please get a clue!

> The ABI for the return is edx contains bits [63:32] and eax contains
> [31:0]. This is opposite of how the register were loaded.

My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.

With the 64-bit argument loaded into EDX:EAX, the instruction sequence

    bswap  edx
    bswap  eax
    xchg   eax, edx

is NOT equivalent to

    bswap    rdi

with the 64-bit argument loaded into RDI.

Just run the following code on x86-64:

    mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
; split argument into high and low part
    mov    rdx, rdi
    shr    rdx, 32                   ; high part in EDX
    mov    eax, rdi                  ; low part in EAX
; perform __bswapdi2() as in 32-bit mode
    xchg   eax, edx                  ; swap parts, argument now loaded
                                     ;  like in 32-bit mode
    bswap  edx
    bswap  eax                       ; result like that in 32-bit mode
; load result into 64-bit register
    shl    rdx, 32
    or     rax, rdx
; perform _bswapdi2() in native 64-bit mode
    bswap  rdi
; compare results
    xor    rax, rdi

not amused
Stefan Kanthak

> On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <[hidden email]>
> wrote:
>
>> bswapsi2 on the x86-64 isn't using the bswap instruction because "unsigned
>> long" is 64-bits on x86-64 linux. But its 32-bits on x86-64 msvc.
>>
>> Not sure about the bswapdi2 i386 case.
>>
>>
>> ~Craig
>>
>>
>> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <
>> [hidden email]> wrote:
>>
>>> Hi @ll,
>>>
>>> targetting i386, LLVM/clang generates wrong code for the following
>>> functions:
>>>
>>> unsigned long __bswapsi2 (unsigned long ul)
>>> {
>>>     return (((ul) & 0xff000000ul) >> 3 * 8)
>>>          | (((ul) & 0x00ff0000ul) >>     8)
>>>          | (((ul) & 0x0000ff00ul) <<     8)
>>>          | (((ul) & 0x000000fful) << 3 * 8);
>>> }
>>>
>>> unsigned long long __bswapdi2(unsigned long long ull)
>>> {
>>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)
>>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)
>>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)
>>>          | ((ull & 0x000000ff00000000ull) >>     8)
>>>          | ((ull & 0x00000000ff000000ull) <<     8)
>>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)
>>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)
>>>          | ((ull & 0x00000000000000ffull) << 7 * 8);
>>> }
>>>
>>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"
>>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!
>>>
>>>
>>> Compiled with "-O3 -target i386" this yields the following code
>>> (see <https://godbolt.org/z/F4UIl4>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   eax, dword ptr [ebp + 8]
>>>     bswap eax
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     push  ebp
>>>     mov   ebp, esp
>>>     mov   edx, dword ptr [ebp + 8]
>>>     mov   eax, dword ptr [ebp + 12]
>>>     bswap eax
>>>     bswap edx
>>>     pop   ebp
>>>     ret
>>>
>>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the
>>> halves of a "long long" is OBVIOUSLY WRONG!
>>>
>>> From the C source, the expected result for the input value
>>> 0x0123456789ABCDEF is 0xEFCDAB8967452301; the compiled code but
>>> produces 0x67452301EFCDAB89
>>>
>>>
>>> And compiled for x86-64 this yields the following code (see
>>> <https://godbolt.org/z/uM9nvN>):
>>>
>>> __bswapsi2: # @__bswapsi2
>>>     mov   eax, edi
>>>     shr   eax, 24
>>>     mov   rcx, rdi
>>>     shr   rcx, 8
>>>     and   ecx, 65280
>>>     or    rax, rcx
>>>     mov   rcx, rdi
>>>     shl   rcx, 8
>>>     and   ecx, 16711680
>>>     or    rax, rcx
>>>     and   rdi, 255
>>>     shl   rdi, 24
>>>     or    rax, rdi
>>>     ret
>>>
>>> __bswapdi2: # @__bswapdi2
>>>     bswap rdi
>>>     mov   rax, rdi
>>>     ret
>>>
>>> Both are correct, but __bswapsi2() should of course use BSWAP too!
>>>
>>>
>>> Stefan Kanthak
>>>
>>> PS: for comparision with another compiler, take a look at
>>>     <https://skanthak.homepage.t-online.de/msvc.html#example5>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> [hidden email]
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

test.c (254 bytes) Download Attachment
my_bswap.c (648 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
On 11/25/2018 11:38 AM, Stefan Kanthak via llvm-dev wrote:

> "Craig Topper" <[hidden email]> wrote:
>
>> bswapdi2 for i386 is correct
>
> OUCH!
>
>> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into
>> eax. Those are each bswapped.
>
> This exchanges the high byte of each 32-bit PART with its low byte, but
> NOT the high byte of the whole 64-bit operand with its low byte!

Incorrect.

This reverses the bytes within the low and high 32-bit halves, and also
swaps the two halves, which _is_ equivalent to byte-reversing the entire
64-bit value.

It's a classic divide-and-conquer approach: one way to reverse a
sequence is to cut it into two contiguous pieces (doesn't matter how; in
this case, split at the halfway point), reverse the two pieces
individually (the two BSWAPs), and then combine the two pieces in
reverse order (accomplished implicitly by loading the original top half
into eax and the bottom half into edx).

> Please get a clue!

Throwing insults around will not help you get your problem resolved any
quicker, it will just get you ignored.

> Just run the following code on x86-64:
>
>      mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI
> ; split argument into high and low part
>      mov    rdx, rdi
>      shr    rdx, 32                   ; high part in EDX
>      mov    eax, rdi                  ; low part in EAX
> ; perform __bswapdi2() as in 32-bit mode
>      xchg   eax, edx                  ; swap parts, argument now loaded
>                                       ;  like in 32-bit mode
>      bswap  edx
>      bswap  eax                       ; result like that in 32-bit mode
> ; load result into 64-bit register
>      shl    rdx, 32
>      or     rax, rdx
> ; perform _bswapdi2() in native 64-bit mode
>      bswap  rdi
> ; compare results
>      xor    rax, rdi

Syntax error aside (that should be "mov eax, edi" not "mov eax, rdi" up
there), I get:

(gdb) set disassembly-flavor intel
(gdb) x/20i main
    0x4004a0 <main>:     movabs rdi,0x123456789abcdef
    0x4004aa <main+10>:  mov    rdx,rdi
    0x4004ad <main+13>:  shr    rdx,0x20
    0x4004b1 <main+17>:  mov    eax,edi
    0x4004b3 <main+19>:  xchg   edx,eax
    0x4004b4 <main+20>:  bswap  edx
    0x4004b6 <main+22>:  bswap  eax
    0x4004b8 <main+24>:  shl    rdx,0x20
    0x4004bc <main+28>:  or     rax,rdx
    0x4004bf <main+31>:  bswap  rdi
    0x4004c2 <main+34>:  xor    rax,rdi
    0x4004c5 <main+37>:  int3
    0x4004c6 <main+38>:  nop    WORD PTR cs:[rax+rax*1+0x0]
    0x4004d0 <__libc_csu_init>:  push   r15
    0x4004d2 <__libc_csu_init+2>:        mov    r15,rdx
    0x4004d5 <__libc_csu_init+5>:        push   r14
    0x4004d7 <__libc_csu_init+7>:        mov    r14,rsi
    0x4004da <__libc_csu_init+10>:       push   r13
    0x4004dc <__libc_csu_init+12>:       mov    r13d,edi
    0x4004df <__libc_csu_init+15>:       push   r12
(gdb) r
Starting program: /home/fabiang/code/bswap/bswap

Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004004c6 in main ()
(gdb) x/i $pc
=> 0x4004c6 <main+38>:  nop    WORD PTR cs:[rax+rax*1+0x0]
(gdb) p $rax
$1 = 0

exactly as it should be. So your point is?

-Fabian
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
"Craig Topper" <[hidden email]> wrote:

>I just compiled the two attached files in 32-bit mode and ran it.
>
> It printed efcdab8967452301.
>
> I verified via objdump that the my_bswap function contains the follow
> assembly which I believe matches the assembly you linked to on godbolt.
>
> _my_bswap:
>    1f70: 55 pushl %ebp
>    1f71: 89 e5 movl %esp, %ebp
>    1f73: 8b 55 08 movl 8(%ebp), %edx
>    1f76: 8b 45 0c movl 12(%ebp), %eax
>    1f79: 0f c8 bswapl %eax
>    1f7b: 0f ca bswapl %edx
>    1f7d: 5d popl %ebp
>    1f7e: c3 retl

Contrary to my initial WRONG claim this code is CORRECT: two 32-bit BSWAP
instructions on the swapped halves of a 64-bit register are equivalent to
a 64-bit BSWAP instruction.
Sorry for the confusion, I should have slept another night before responding.

The other part of my post but holds: the following function, compiled for
x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>):

unsigned long __bswapsi2 (unsigned long ul)
{
    return (ul >> 3 * 8) & 0xff000000ul
         | (ul >>     8) & 0x00ff0000ul
         | (ul <<     8) & 0x0000ff00ul
         | (ul << 3 * 8) & 0x000000fful;
}

The expected optimized code is

__bswapsi2: # @__bswapsi2
    bswap edi
    mov   eax, edi
    ret

regards
Stefan Kanthak

[ fullquote removed ]
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
On 11/26/2018 4:11 AM, Stefan Kanthak via llvm-dev wrote:

> The other part of my post but holds: the following function, compiled for
> x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>):
>
> unsigned long __bswapsi2 (unsigned long ul)
> {
>      return (ul >> 3 * 8) & 0xff000000ul
>           | (ul >>     8) & 0x00ff0000ul
>           | (ul <<     8) & 0x0000ff00ul
>           | (ul << 3 * 8) & 0x000000fful;
> }

That looks like another instance of truncation in the wrong places
causing the pattern matches to fail. There have been a couple of those
in the past with other things like rotates.

This is indeed a bug, but if you need a quick workaround, the pattern
does work when you use a 32-bit (e.g. unsigned int or uint32_t) type.

-Fabian
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS n code generated for target i386 compiling __bswapdi3, and for target x86-64 compiling __bswapsi2()

Alberto Barbaro via llvm-dev
In this case there are no truncates. Everything is a 64-bit type. The bswap pattern is matched by the middle end of the compiler with no target specific knowledge. The middle end sees the type is 64 bits and so doesn't match it as bswap because only the lower half is swapped. The X86 specific backend doesn't do any BSWAP matching so we don't apply the x86 specific knowledge that zeroing the upper bits is free.


~Craig


On Mon, Nov 26, 2018 at 11:11 AM Fabian Giesen via llvm-dev <[hidden email]> wrote:
On 11/26/2018 4:11 AM, Stefan Kanthak via llvm-dev wrote:
> The other part of my post but holds: the following function, compiled for
> x86-64, is NOT properly optimized (see <https://godbolt.org/z/uM9nvN>):
>
> unsigned long __bswapsi2 (unsigned long ul)
> {
>      return (ul >> 3 * 8) & 0xff000000ul
>           | (ul >>     8) & 0x00ff0000ul
>           | (ul <<     8) & 0x0000ff00ul
>           | (ul << 3 * 8) & 0x000000fful;
> }

That looks like another instance of truncation in the wrong places
causing the pattern matches to fail. There have been a couple of those
in the past with other things like rotates.

This is indeed a bug, but if you need a quick workaround, the pattern
does work when you use a 32-bit (e.g. unsigned int or uint32_t) type.

-Fabian
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev