[llvm-dev] BUGS in code generated for target i386-win32

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
Hi @ll,

LLVM/clang generates wrong code for the following program
(see <https://godbolt.org/z/UZrrkG>):

--- sample.c ---
unsigned __fastcall lfsr32(unsigned argument, unsigned polynomial)
{
    __asm
    {
        add ecx, ecx ; ecx = argument << 1
        sbb eax, eax ; eax = CF ? -1 : 0
        and eax, edx ; eax = CF ? polynomial : 0
        xor eax, ecx ; eax = (argument << 1) ^ (CF ? polynomial : 0)
    }
}

int main()
{
    unsigned lfsr = 123456789;
    unsigned period = 0;

    do
    {
        period++;
#ifdef MITIGATE
        lfsr = lfsr & 0x80000000 ? 0x04C11DB7 ^ (lfsr << 1) : lfsr << 1;
#else
        lfsr = lfsr32(lfsr, 0x04C11DB7);
#endif
    } while (lfsr != 123456789);

    return period;
}
--- EOF ---

Compiled with -O2 -target i386-win32 this yields the following code:

_main: # @main

    xor edx, edx

LBB1_1: # =>This Inner Loop Header: Depth=1

    add ecx, ecx
    sbb eax, eax
    and eax, edx
    xor eax, ecx

    inc edx
    cmp eax, 123456789
    jne LBB1_1
    mov eax, edx
    ret

BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
~~~~~~~
It fails to load the result of the inlined function call, held in EAX,
back into ECX, which holds the first argument with the __fastcall
calling convention, inside the loop starting with LBB1_1:

FIX #1: emit a "mov ecx, eax" immediately after LBB1_1:
~~~~~~~

BUG #2: the variable "lfsr" is NOT initialized!
~~~~~~~
The constant 123456789 is NOT loaded into whatever register holds "lfsr".

FIX #2: emit a "mov eax, 123456789" immediately before LBB1_1:
~~~~~~~

BUG #3: the compiler allocates EDX for the variable "period"!
~~~~~~~
EDX is a volatile register, it is not preserved through function calls;
additionally it holds the second argument with the __fastcall calling
convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
label LBB1_1:

The compiler MUST use another register (of course except ECX, which
holds the first argument with __fastcall, and except EAX, which holds
the return value), for example EBX instead of EDX.

Stefan Kanthak

PS: for comparision with another (likewise buggy) compiler, take a look
    at <https://skanthak.homepage.t-online.de/msvc.html#example9>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
Hi Stefan,

On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev
<[hidden email]> wrote:
> LLVM/clang generates wrong code for the following program
> (see <https://godbolt.org/z/UZrrkG>):

It looks like all of these issues come down to mismatched expectations
on what goes into and out of an __asm block. If you look at the MSVC
documentation (which I think Clang is trying to be compatible with),
it warns against assuming known values are in particular registers at
any point, and even using __fastcall at all for a function with an
__asm block for that reason:
https://docs.microsoft.com/en-gb/cpp/assembler/inline/using-and-preserving-registers-in-inline-assembly?view=vs-2017

I'll try to explain a little below how that one mismatch causes the
issues you're seeing.

> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
> BUG #2: the variable "lfsr" is NOT initialized!

Since the __asm isn't linked (as far as Clang is concerned) to either
input for lfsr32, they're both unused. The compiler has actually
inlined that function as you said, then noticed this lack of use and
decided to completely eliminate the lfsr variable.

You can see this in LLVM IR if you add "-emit-llvm -g0" to the godbolt
command-line. The loop *only* contains a counter and the lonely asm
block

> BUG #3: the compiler allocates EDX for the variable "period"!
> ~~~~~~~
> EDX is a volatile register, it is not preserved through function calls;

There is no function call after inlining, of course, so it doesn't
need to be preserved in this case. It looks like there might be an
issue with an __asm block that actually *makes* a call though:

    __asm {
      call Wibble
    }

doesn't seem to mark registers used by the call as clobbered from my
testing. It's slightly unclear what MSVC's behaviour is in this
situation from the documentation, but some vague poking suggests it
does know that a call instruction might clobber edx.

> additionally it holds the second argument with the __fastcall calling
> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
> label LBB1_1:

This is the same unused-as-far-as-the-compiler-knows thing as the first two.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
"Tim Northover" <[hidden email]> wrote:

> Hi Stefan,
>
> On Mon, 26 Nov 2018 at 12:37, Stefan Kanthak via llvm-dev
> <[hidden email]> wrote:
>> LLVM/clang generates wrong code for the following program
>> (see <https://godbolt.org/z/UZrrkG>):
>
> It looks like all of these issues come down to mismatched expectations
> on what goes into and out of an __asm block. If you look at the MSVC
> documentation (which I think Clang is trying to be compatible with),
> it warns against assuming known values are in particular registers at
> any point, and even using __fastcall at all for a function with an
> __asm block for that reason:
> https://docs.microsoft.com/en-gb/cpp/assembler/inline/using-and-preserving-registers-in-inline-assembly?view=vs-2017

Trust me: I KNOW THIS DOCUMENTATION!

> I'll try to explain a little below how that one mismatch causes the
> issues you're seeing.
>
>> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
>> BUG #2: the variable "lfsr" is NOT initialized!
>
> Since the __asm isn't linked (as far as Clang is concerned) to
> either input for lfsr32, they're both unused.

REALLY? Or better: OUCH!
Is Clang NOT aware of the __fastcall calling convention and its
register usage?

Clang does NOT create prolog/epilog here, so it CLEARLY knows that
"argument" is held in ECX and "polynomial" in EDX ... as documented
for __fastcall

JFTR: Clang (like MSVC too) knows VERY well which registers are used
      (clobbered) in an __asm block. See <https://godbolt.org/z/blDIzK>,
      which proves your assumption WRONG!

> The compiler has actually inlined that function as you said, then
> noticed this lack of use and decided to completely eliminate the
> lfsr variable.

1. see above!

2. add "__declspec(noinline)" to the function definition ... and notice
   that the compiler STILL does NOT setup ANY of EAX, ECX and EDX before
   the function call!
   See <https://godbolt.org/z/X9R9w7>, which again proves your assumption
   WRONG.

> You can see this in LLVM IR if you add "-emit-llvm -g0" to the godbolt
> command-line. The loop *only* contains a counter and the lonely asm
> block
>
>> BUG #3: the compiler allocates EDX for the variable "period"!
>> ~~~~~~~
>> EDX is a volatile register, it is not preserved through function calls;
>
> There is no function call after inlining, of course, so it doesn't
> need to be preserved in this case. It looks like there might be an
> issue with an __asm block that actually *makes* a call though:
>
>    __asm {
>      call Wibble
>    }
>
> doesn't seem to mark registers used by the call as clobbered from my
> testing. It's slightly unclear what MSVC's behaviour is in this
> situation from the documentation, but some vague poking suggests it
> does know that a call instruction might clobber edx.

The x86 calling conventions used on Windows (and some more OS) are
VERY WELL KNOWN: EAX, ECX and EDX are NOT preserved through function
calls. [EDX:]EAX is used for the [long] long return value, and ECX
can be clobbered.

Again the addition of "__declspec(noinline)" proves your assumptions
wrong: the function call is NOT inlined any more, but its arguments
are STILL not set up at all ... and the variable lfsr STILL not
initialized.

If the function is declared only, but not defined, then the compiler
sets the arguments up before the call: see <https://godbolt.org/z/4pmOlC>

>> additionally it holds the second argument with the __fastcall calling
>> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
>> label LBB1_1:
>
> This is the same unused-as-far-as-the-compiler-knows thing as the first two.

OUCH!
<https://godbolt.org/z/X9R9w7> proves this WRONG.

LLVM is apparently REALLY confused by __asm, at least when used inside
a __fastcall function.
My expectation is that AT LEAST a warning message should be printed by
the compiler.

regards
Stefan  Kanthak

PS: I can help myself when a compiler emits wrong code.
    Other people might just rely on the compiler...
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
Hi Stefan,

On Mon, 26 Nov 2018 at 18:01, Stefan Kanthak <[hidden email]> wrote:

> > I'll try to explain a little below how that one mismatch causes the
> > issues you're seeing.
> >
> >> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
> >> BUG #2: the variable "lfsr" is NOT initialized!
> >
> > Since the __asm isn't linked (as far as Clang is concerned) to
> > either input for lfsr32, they're both unused.
>
> REALLY? Or better: OUCH!
> Is Clang NOT aware of the __fastcall calling convention and its
> register usage?

It is, but it intentionally does not tie that knowledge together with
any __asm blocks, even if they are the only statement in a function
body.

> Clang does NOT create prolog/epilog here, so it CLEARLY knows that
> "argument" is held in ECX and "polynomial" in EDX ... as documented
> for __fastcall

That doesn't follow from the lack of a prologue, though it happens to
be true. There is still no link with the register usage in the __asm
however.

> JFTR: Clang (like MSVC too) knows VERY well which registers are used
>       (clobbered) in an __asm block. See <https://godbolt.org/z/blDIzK>,
>       which proves your assumption WRONG!

I don't believe so. Clobbering a register is very different from
guessing some input value to put into that register prior to the
block. For MSVC you're apparently supposed to write inline asm that
materializes values into registers from elsewhere (variables of one
kind or another) before using them.

I could see the behaviour you're expecting being a reasonable default
for a __declspec(naked) function restricted to a single __asm block,
but it interacts very poorly with C language semantics for functions
that can contain both C and assembly.

> 2. add "__declspec(noinline)" to the function definition ... and notice
>    that the compiler STILL does NOT setup ANY of EAX, ECX and EDX before
>    the function call!
>    See <https://godbolt.org/z/X9R9w7>, which again proves your assumption
>    WRONG.

The noinline prevents actual inlining, but the same inferences can be
(and are) made through inter-procedural analysis. Add "-emit-llvm -g0"
and you'll see your function being called with "undef" arguments and
the complete elimination of lfsr.

> If the function is declared only, but not defined, then the compiler
> sets the arguments up before the call: see <https://godbolt.org/z/4pmOlC>

Indeed.

> >> additionally it holds the second argument with the __fastcall calling
> >> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
> >> label LBB1_1:
> >
> > This is the same unused-as-far-as-the-compiler-knows thing as the first two.
>
> OUCH!
> <https://godbolt.org/z/X9R9w7> proves this WRONG.

I'm afraid I don't see how. It's an unused variable (as far as Clang
is concerned) so it doesn't need to be materialized anywhere.

> My expectation is that AT LEAST a warning message should be printed by
> the compiler.

For using a register before defining it, perhaps? That's within the
implementable realm I think (at least to cover some cases), though
whether anyone is interested enough to actually do it is another
question.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
It's worth pointing out that we *do* add EAX as an implicit output of every asm block, since some MSVC inline asm does stuff like this in SDK headers and we have to support it:
  int foo(int x) {
    __asm mov eax, x
  }

There's a bug for this somewhere in bugzilla with more details, but I'm late to work and wasn't able to find it in my first search.

We could similarly detect __asm blocks at the start of a function, and add implicit inputs for all parameters, and that would get the desired behavior, with inlining to boot, which MSVC doesn't get. It's worth filing a bug about it in any case.

As Tim says, if you just want clang to leave this code alone, add an explicit return and put __declspec(naked) on the function. That'll turn off all the parameter IPO as well.

Adding warnings for this is out of scope. If we took the time to analyze the user's usage of registers and correlate it with the target calling convention, we might as well take the time to implement this wacky behavior.

On Mon, Nov 26, 2018 at 10:24 AM Tim Northover via llvm-dev <[hidden email]> wrote:
Hi Stefan,

On Mon, 26 Nov 2018 at 18:01, Stefan Kanthak <[hidden email]> wrote:
> > I'll try to explain a little below how that one mismatch causes the
> > issues you're seeing.
> >
> >> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
> >> BUG #2: the variable "lfsr" is NOT initialized!
> >
> > Since the __asm isn't linked (as far as Clang is concerned) to
> > either input for lfsr32, they're both unused.
>
> REALLY? Or better: OUCH!
> Is Clang NOT aware of the __fastcall calling convention and its
> register usage?

It is, but it intentionally does not tie that knowledge together with
any __asm blocks, even if they are the only statement in a function
body.

> Clang does NOT create prolog/epilog here, so it CLEARLY knows that
> "argument" is held in ECX and "polynomial" in EDX ... as documented
> for __fastcall

That doesn't follow from the lack of a prologue, though it happens to
be true. There is still no link with the register usage in the __asm
however.

> JFTR: Clang (like MSVC too) knows VERY well which registers are used
>       (clobbered) in an __asm block. See <https://godbolt.org/z/blDIzK>,
>       which proves your assumption WRONG!

I don't believe so. Clobbering a register is very different from
guessing some input value to put into that register prior to the
block. For MSVC you're apparently supposed to write inline asm that
materializes values into registers from elsewhere (variables of one
kind or another) before using them.

I could see the behaviour you're expecting being a reasonable default
for a __declspec(naked) function restricted to a single __asm block,
but it interacts very poorly with C language semantics for functions
that can contain both C and assembly.

> 2. add "__declspec(noinline)" to the function definition ... and notice
>    that the compiler STILL does NOT setup ANY of EAX, ECX and EDX before
>    the function call!
>    See <https://godbolt.org/z/X9R9w7>, which again proves your assumption
>    WRONG.

The noinline prevents actual inlining, but the same inferences can be
(and are) made through inter-procedural analysis. Add "-emit-llvm -g0"
and you'll see your function being called with "undef" arguments and
the complete elimination of lfsr.

> If the function is declared only, but not defined, then the compiler
> sets the arguments up before the call: see <https://godbolt.org/z/4pmOlC>

Indeed.

> >> additionally it holds the second argument with the __fastcall calling
> >> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
> >> label LBB1_1:
> >
> > This is the same unused-as-far-as-the-compiler-knows thing as the first two.
>
> OUCH!
> <https://godbolt.org/z/X9R9w7> proves this WRONG.

I'm afraid I don't see how. It's an unused variable (as far as Clang
is concerned) so it doesn't need to be materialized anywhere.

> My expectation is that AT LEAST a warning message should be printed by
> the compiler.

For using a register before defining it, perhaps? That's within the
implementable realm I think (at least to cover some cases), though
whether anyone is interested enough to actually do it is another
question.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
"Reid Kleckner" <[hidden email]> wrote:

> It's worth pointing out that we *do* add EAX as an implicit output of every
> asm block, since some MSVC inline asm does stuff like this in SDK headers
> and we have to support it:
>  int foo(int x) {
>    __asm mov eax, x
>  }

For functions like this, where the body is only a single __asm block and no
C code around, I'd expect that ... but not for EVERY __asm block.

> There's a bug for this somewhere in bugzilla with more details, but I'm
> late to work and wasn't able to find it in my first search.
>
> We could similarly detect __asm blocks at the start of a function, and add
> implicit inputs for all parameters, and that would get the desired
> behavior, with inlining to boot, which MSVC doesn't get. It's worth filing
> a bug about it in any case.

OK.
But: I don't have an account at LLVM's bugzilla, and I'm not so very inclined
to create one. I don't use clang at all (except with "Compiler Explorer"):
I was just curious how it handles some cases where MSVC and/or ICC failed,
either completely or only their optimiser, and fed it with some of my samples.

> As Tim says, if you just want clang to leave this code alone, add an
> explicit return and put __declspec(naked) on the function. That'll turn off
> all the parameter IPO as well.

And it will turn off inlining as well, just like MSVC!
See <https://godbolt.org/z/LebUqC>

> Adding warnings for this is out of scope. If we took the time to analyze
> the user's usage of registers and correlate it with the target calling
> convention, we might as well take the time to implement this wacky behavior.

I didn't mean SPECIFIC warnings, but just a general one in case of an __asm
block inside a __fastcall function, which says that clang considers the
arguments as NOT referenced and doesn't tie them to ECX and EDX eventually
used in the __asm block.

regards
Stefan

> On Mon, Nov 26, 2018 at 10:24 AM Tim Northover via llvm-dev <
> [hidden email]> wrote:
>
>> Hi Stefan,
>>
>> On Mon, 26 Nov 2018 at 18:01, Stefan Kanthak <[hidden email]>
>> wrote:
>> > > I'll try to explain a little below how that one mismatch causes the
>> > > issues you're seeing.
>> > >
>> > >> BUG #1: the compiler fails to allocate (EAX for) the variable "lfsr"!
>> > >> BUG #2: the variable "lfsr" is NOT initialized!
>> > >
>> > > Since the __asm isn't linked (as far as Clang is concerned) to
>> > > either input for lfsr32, they're both unused.
>> >
>> > REALLY? Or better: OUCH!
>> > Is Clang NOT aware of the __fastcall calling convention and its
>> > register usage?
>>
>> It is, but it intentionally does not tie that knowledge together with
>> any __asm blocks, even if they are the only statement in a function
>> body.
>>
>> > Clang does NOT create prolog/epilog here, so it CLEARLY knows that
>> > "argument" is held in ECX and "polynomial" in EDX ... as documented
>> > for __fastcall
>>
>> That doesn't follow from the lack of a prologue, though it happens to
>> be true. There is still no link with the register usage in the __asm
>> however.
>>
>> > JFTR: Clang (like MSVC too) knows VERY well which registers are used
>> >       (clobbered) in an __asm block. See <https://godbolt.org/z/blDIzK>,
>> >       which proves your assumption WRONG!
>>
>> I don't believe so. Clobbering a register is very different from
>> guessing some input value to put into that register prior to the
>> block. For MSVC you're apparently supposed to write inline asm that
>> materializes values into registers from elsewhere (variables of one
>> kind or another) before using them.
>>
>> I could see the behaviour you're expecting being a reasonable default
>> for a __declspec(naked) function restricted to a single __asm block,
>> but it interacts very poorly with C language semantics for functions
>> that can contain both C and assembly.
>>
>> > 2. add "__declspec(noinline)" to the function definition ... and notice
>> >    that the compiler STILL does NOT setup ANY of EAX, ECX and EDX before
>> >    the function call!
>> >    See <https://godbolt.org/z/X9R9w7>, which again proves your
>> assumption
>> >    WRONG.
>>
>> The noinline prevents actual inlining, but the same inferences can be
>> (and are) made through inter-procedural analysis. Add "-emit-llvm -g0"
>> and you'll see your function being called with "undef" arguments and
>> the complete elimination of lfsr.
>>
>> > If the function is declared only, but not defined, then the compiler
>> > sets the arguments up before the call: see <https://godbolt.org/z/4pmOlC
>> >
>>
>> Indeed.
>>
>> > >> additionally it holds the second argument with the __fastcall calling
>> > >> convention, so the constant 0x04C11DB7 MUST be loaded into EDX before
>> > >> label LBB1_1:
>> > >
>> > > This is the same unused-as-far-as-the-compiler-knows thing as the
>> first two.
>> >
>> > OUCH!
>> > <https://godbolt.org/z/X9R9w7> proves this WRONG.
>>
>> I'm afraid I don't see how. It's an unused variable (as far as Clang
>> is concerned) so it doesn't need to be materialized anywhere.
>>
>> > My expectation is that AT LEAST a warning message should be printed by
>> > the compiler.
>>
>> For using a register before defining it, perhaps? That's within the
>> implementable realm I think (at least to cover some cases), though
>> whether anyone is interested enough to actually do it is another
>> question.
>>
>> Cheers.
>>
>> Tim.
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] BUGS in code generated for target i386-win32

Alberto Barbaro via llvm-dev
In reply to this post by Alberto Barbaro via llvm-dev
Please stop randomly SCREAMING in your emails. It is quite annoying to
follow.

Joerg
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev