interesting possible compiler bug

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

interesting possible compiler bug

Reed Kotler
This code is essentially from an LTP test ( http://ltp.sourceforge.net/ ).


#include <stdlib.h>

int main() {
  void  *curr;

  do {
    curr = malloc(1);
  } while (curr);

  return 0;

}

If you compile it with no optimization, it will keep the malloc calls.

If you compile it with -O2, it will create an infinite loop, i.e. assuming that malloc always returns a non zero result and the result is not used.

~/llvmpb/install/bin/clang loop.c -O2 -S

    .file    "loop.c"
    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    .align    16, 0x90
.LBB0_1:                                # %do.body
                                        # =>This Inner Loop Header: Depth=1
    jmp    .LBB0_1
.Ltmp0:
    .size    main, .Ltmp0-main
    .cfi_endproc


    .section    ".note.GNU-stack","",@progbits


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Nick Lewycky-2
On 1 October 2012 18:34, reed kotler <[hidden email]> wrote:
This code is essentially from an LTP test ( http://ltp.sourceforge.net/ ).


#include <stdlib.h>

int main() {
  void  *curr;

  do {
    curr = malloc(1);
  } while (curr);

  return 0;

}

If you compile it with no optimization, it will keep the malloc calls.

If you compile it with -O2, it will create an infinite loop, i.e. assuming that malloc always returns a non zero result and the result is not used.

As far as I know, this optimization is legal. Fix the test with a volatile pointer:

int main() {
  volatile char *curr;

  do {
    curr = malloc(1);
    int i = *curr;
    (void)i;
  } while (curr);

  return 0;
}

which produces:

        pushq   %rax
.Ltmp1:
        movl    $1, %edi
        callq   malloc
        movb    (%rax), %cl
        testq   %rax, %rax
        jne     .LBB0_1
# BB#2:                                 # %do.end
        xorl    %eax, %eax
        popq    %rdx
        ret

Oh how people don't appreciate the luxury of having an infinite memory machine!

Nick

~/llvmpb/install/bin/clang loop.c -O2 -S

    .file    "loop.c"
    .text
    .globl    main
    .align    16, 0x90
    .type    main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    .align    16, 0x90
.LBB0_1:                                # %do.body
                                        # =>This Inner Loop Header: Depth=1
    jmp    .LBB0_1
.Ltmp0:
    .size    main, .Ltmp0-main
    .cfi_endproc


    .section    ".note.GNU-stack","",@progbits


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

David Chisnall-5
On 2 Oct 2012, at 03:40, Nick Lewycky wrote:

> As far as I know, this optimization is legal. Fix the test with a volatile pointer:

Why would that be required?  malloc() is defined by the standard to return a pointer that is distinct from any other valid pointer, or NULL.  Any optimisation that makes any assumptions about its next value is invalid.

> int main() {
>   volatile char *curr;
>
>   do {
>     curr = malloc(1);
>     int i = *curr;

This, in particular, looks very wrong.  If curr is void, then you are dereferencing an invalid pointer, and so you are doing something undefined.  In fact, this version of the code is completely free to elide the conditional loop, because by dereferencing the pointer you are asserting that it is not NULL (or, at least, that if it is then after this point the program is in an undefined state and so any behaviour is legal) and so it is completely free to generate the code that it in fact does generate without this test.  So here we have another bug, because the testq in your output is redundant after the movb.

David
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Nick Lewycky
David Chisnall wrote:
> On 2 Oct 2012, at 03:40, Nick Lewycky wrote:
>
>> As far as I know, this optimization is legal. Fix the test with a volatile pointer:
>
> Why would that be required?

It isn't. My suggestion for fixing the test is to make use of the
returned pointer in a fashion that the compiler is forbidden to elide
it: make it an observable side-effect using volatile. Another way would
be to take the pointer and print it to file or stdout. Perhaps you can
think of others.

   malloc() is defined by the standard to return a pointer that is
distinct from any other valid pointer, or NULL.  Any optimisation that
makes any assumptions about its next value is invalid.

Nowhere do we make assumptions about malloc's next value.

This is a straight-forward application of the as-if rule. The malloc
call behaves as-if it allocated memory. Because we prove that the code
doesn't use that memory, we can get away with allocating no memory at
all and not change the behaviour of the program.

But we did change the behaviour of the program, didn't we? Well, we
haven't changed the behaviour of the program in any way that is
observable through a well-defined mechanism. Crashes, running out of
memory, or asking the kernel for your processes' memory usage isn't
behaviour you get to rely on. The first two really aren't defined in the
language, and the last one goes through I/O which is permitted to do its
own thing. (For instance, we don't forbid constant-folding "1+1" because
a program may fopen and disassemble itself to look for the "add 1, 1"
instruction.)

>> int main() {
>>    volatile char *curr;
>>
>>    do {
>>      curr = malloc(1);
>>      int i = *curr;
>
> This, in particular, looks very wrong.  If curr is void, then you are dereferencing an invalid pointer, and so you are doing something undefined.

Do you mean, if curr is NULL? It's a char*, not void*.

In fact, this version of the code is completely free to elide the
conditional loop, because by dereferencing the pointer you are asserting
that it is not NULL (or, at least, that if it is then after this point
the program is in an undefined state and so any behaviour is legal) and
so it is completely free to generate the code that it in fact does
generate without this test.  So here we have another bug, because the
testq in your output is redundant after the movb.

Yes, good point, I totally missed the NULL dereference. I haven't
checked what happens if you write:

   curr = malloc(1);
   if (curr)
     int i = *curr;

but I expect that would work.

Nick
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Reed Kotler
I've sent this issue to some friends on the C++ committee.

The first response I received indicates that the person thinks the
optimizer is within it's rights to optimize away the call to malloc.

Here are some points, extracted from the response:

"There is no observable behavior (a technical term in the standard) that
violates requirements of the standard. In particular, malloc in the
abstract machine defined by the standard need not ever fail."

"On linux in particular, malloc (almost) never fails, because Linux does
not actually make malloc'd memory available until it is used. Here the
allocated memory is never used, so if the compiler recognizes malloc as
a standard library function with well-defined semantics, it can
eliminate the actual call and pretend it succeeded.

Since variable curr is not visible outside main and is not declared
volatile, it can be optimized away."



On 10/02/2012 02:12 AM, Nick Lewycky wrote:

> David Chisnall wrote:
>> On 2 Oct 2012, at 03:40, Nick Lewycky wrote:
>>
>>> As far as I know, this optimization is legal. Fix the test with a
>>> volatile pointer:
>>
>> Why would that be required?
>
> It isn't. My suggestion for fixing the test is to make use of the
> returned pointer in a fashion that the compiler is forbidden to elide
> it: make it an observable side-effect using volatile. Another way would
> be to take the pointer and print it to file or stdout. Perhaps you can
> think of others.
>
>    malloc() is defined by the standard to return a pointer that is
> distinct from any other valid pointer, or NULL.  Any optimisation that
> makes any assumptions about its next value is invalid.
>
> Nowhere do we make assumptions about malloc's next value.
>
> This is a straight-forward application of the as-if rule. The malloc
> call behaves as-if it allocated memory. Because we prove that the code
> doesn't use that memory, we can get away with allocating no memory at
> all and not change the behaviour of the program.
>
> But we did change the behaviour of the program, didn't we? Well, we
> haven't changed the behaviour of the program in any way that is
> observable through a well-defined mechanism. Crashes, running out of
> memory, or asking the kernel for your processes' memory usage isn't
> behaviour you get to rely on. The first two really aren't defined in the
> language, and the last one goes through I/O which is permitted to do its
> own thing. (For instance, we don't forbid constant-folding "1+1" because
> a program may fopen and disassemble itself to look for the "add 1, 1"
> instruction.)
>
>>> int main() {
>>>    volatile char *curr;
>>>
>>>    do {
>>>      curr = malloc(1);
>>>      int i = *curr;
>>
>> This, in particular, looks very wrong.  If curr is void, then you are
>> dereferencing an invalid pointer, and so you are doing something
>> undefined.
>
> Do you mean, if curr is NULL? It's a char*, not void*.
>
> In fact, this version of the code is completely free to elide the
> conditional loop, because by dereferencing the pointer you are asserting
> that it is not NULL (or, at least, that if it is then after this point
> the program is in an undefined state and so any behaviour is legal) and
> so it is completely free to generate the code that it in fact does
> generate without this test.  So here we have another bug, because the
> testq in your output is redundant after the movb.
>
> Yes, good point, I totally missed the NULL dereference. I haven't
> checked what happens if you write:
>
>    curr = malloc(1);
>    if (curr)
>      int i = *curr;
>
> but I expect that would work.
>
> Nick

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Richard Smith-33
See also this proposal for the next C++ committee meeting, which aims to clarify this case and others like it:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3433.html

On Tue, Oct 2, 2012 at 9:49 AM, Reed Kotler <[hidden email]> wrote:
I've sent this issue to some friends on the C++ committee.

The first response I received indicates that the person thinks the optimizer is within it's rights to optimize away the call to malloc.

Here are some points, extracted from the response:

"There is no observable behavior (a technical term in the standard) that violates requirements of the standard. In particular, malloc in the abstract machine defined by the standard need not ever fail."

"On linux in particular, malloc (almost) never fails, because Linux does not actually make malloc'd memory available until it is used. Here the allocated memory is never used, so if the compiler recognizes malloc as a standard library function with well-defined semantics, it can eliminate the actual call and pretend it succeeded.

Since variable curr is not visible outside main and is not declared volatile, it can be optimized away."




On 10/02/2012 02:12 AM, Nick Lewycky wrote:
David Chisnall wrote:
On 2 Oct 2012, at 03:40, Nick Lewycky wrote:

As far as I know, this optimization is legal. Fix the test with a
volatile pointer:

Why would that be required?

It isn't. My suggestion for fixing the test is to make use of the
returned pointer in a fashion that the compiler is forbidden to elide
it: make it an observable side-effect using volatile. Another way would
be to take the pointer and print it to file or stdout. Perhaps you can
think of others.

   malloc() is defined by the standard to return a pointer that is
distinct from any other valid pointer, or NULL.  Any optimisation that
makes any assumptions about its next value is invalid.

Nowhere do we make assumptions about malloc's next value.

This is a straight-forward application of the as-if rule. The malloc
call behaves as-if it allocated memory. Because we prove that the code
doesn't use that memory, we can get away with allocating no memory at
all and not change the behaviour of the program.

But we did change the behaviour of the program, didn't we? Well, we
haven't changed the behaviour of the program in any way that is
observable through a well-defined mechanism. Crashes, running out of
memory, or asking the kernel for your processes' memory usage isn't
behaviour you get to rely on. The first two really aren't defined in the
language, and the last one goes through I/O which is permitted to do its
own thing. (For instance, we don't forbid constant-folding "1+1" because
a program may fopen and disassemble itself to look for the "add 1, 1"
instruction.)

int main() {
   volatile char *curr;

   do {
     curr = malloc(1);
     int i = *curr;

This, in particular, looks very wrong.  If curr is void, then you are
dereferencing an invalid pointer, and so you are doing something
undefined.

Do you mean, if curr is NULL? It's a char*, not void*.

In fact, this version of the code is completely free to elide the
conditional loop, because by dereferencing the pointer you are asserting
that it is not NULL (or, at least, that if it is then after this point
the program is in an undefined state and so any behaviour is legal) and
so it is completely free to generate the code that it in fact does
generate without this test.  So here we have another bug, because the
testq in your output is redundant after the movb.

Yes, good point, I totally missed the NULL dereference. I haven't
checked what happens if you write:

   curr = malloc(1);
   if (curr)
     int i = *curr;

but I expect that would work.

Nick

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Reed Kotler
My friend sent me the first days worth of discussion on this topic from the C committee discussion list and it was very long with many contradictory opinions and citing evidence from a variety of sources.

I don't feel comfortable posting the discussion because it was not from a public forum.

But  I can say there was far from any consensus regarding this issue I originally raised.


On 10/02/2012 03:01 PM, Richard Smith wrote:
See also this proposal for the next C++ committee meeting, which aims to clarify this case and others like it:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3433.html

On Tue, Oct 2, 2012 at 9:49 AM, Reed Kotler <[hidden email]> wrote:
I've sent this issue to some friends on the C++ committee.

The first response I received indicates that the person thinks the optimizer is within it's rights to optimize away the call to malloc.

Here are some points, extracted from the response:

"There is no observable behavior (a technical term in the standard) that violates requirements of the standard. In particular, malloc in the abstract machine defined by the standard need not ever fail."

"On linux in particular, malloc (almost) never fails, because Linux does not actually make malloc'd memory available until it is used. Here the allocated memory is never used, so if the compiler recognizes malloc as a standard library function with well-defined semantics, it can eliminate the actual call and pretend it succeeded.

Since variable curr is not visible outside main and is not declared volatile, it can be optimized away."




On 10/02/2012 02:12 AM, Nick Lewycky wrote:
David Chisnall wrote:
On 2 Oct 2012, at 03:40, Nick Lewycky wrote:

As far as I know, this optimization is legal. Fix the test with a
volatile pointer:

Why would that be required?

It isn't. My suggestion for fixing the test is to make use of the
returned pointer in a fashion that the compiler is forbidden to elide
it: make it an observable side-effect using volatile. Another way would
be to take the pointer and print it to file or stdout. Perhaps you can
think of others.

   malloc() is defined by the standard to return a pointer that is
distinct from any other valid pointer, or NULL.  Any optimisation that
makes any assumptions about its next value is invalid.

Nowhere do we make assumptions about malloc's next value.

This is a straight-forward application of the as-if rule. The malloc
call behaves as-if it allocated memory. Because we prove that the code
doesn't use that memory, we can get away with allocating no memory at
all and not change the behaviour of the program.

But we did change the behaviour of the program, didn't we? Well, we
haven't changed the behaviour of the program in any way that is
observable through a well-defined mechanism. Crashes, running out of
memory, or asking the kernel for your processes' memory usage isn't
behaviour you get to rely on. The first two really aren't defined in the
language, and the last one goes through I/O which is permitted to do its
own thing. (For instance, we don't forbid constant-folding "1+1" because
a program may fopen and disassemble itself to look for the "add 1, 1"
instruction.)

int main() {
   volatile char *curr;

   do {
     curr = malloc(1);
     int i = *curr;

This, in particular, looks very wrong.  If curr is void, then you are
dereferencing an invalid pointer, and so you are doing something
undefined.

Do you mean, if curr is NULL? It's a char*, not void*.

In fact, this version of the code is completely free to elide the
conditional loop, because by dereferencing the pointer you are asserting
that it is not NULL (or, at least, that if it is then after this point
the program is in an undefined state and so any behaviour is legal) and
so it is completely free to generate the code that it in fact does
generate without this test.  So here we have another bug, because the
testq in your output is redundant after the movb.

Yes, good point, I totally missed the NULL dereference. I haven't
checked what happens if you write:

   curr = malloc(1);
   if (curr)
     int i = *curr;

but I expect that would work.

Nick

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Ronan Keryell
In reply to this post by Nick Lewycky-2
>>>>> On Mon, 1 Oct 2012 19:40:58 -0700, Nick Lewycky <[hidden email]> said:

    > On 1 October 2012 18:34, reed kotler <[hidden email]> wrote:
    >     This code is essentially from an LTP test (
    > http://ltp.sourceforge.net/ ).

    >     #include <stdlib.h>
    >     int main() {   void  *curr;
    >       do {     curr = malloc(1);   } while (curr);
    >       return 0;
    >     }

    >     If you compile it with no optimization, it will keep the
    > malloc calls.

    >     If you compile it with -O2, it will create an infinite
    > loop, i.e.  assuming that malloc always returns a non zero
    > result and the result is not used.

    Nick> As far as I know, this optimization is legal.

Well, indeed the result is used by the while and we have a control
dependence on the future execution. The termination or not of the
program is a side effect from the user perspective.

   [...]

    Nick> Oh how people don't appreciate the luxury of having an
    Nick> infinite memory machine!

You have also a potentially infinite loop, which is another kind of
luxury. More boring, especially on the end. :-)

Even if you have a lazy allocation as with the default libC on Linux
that allocates the memory pages only on demand, the memory will blow
up just because the malloc() library itself has to track the allocations
for a potential future free(). Of course, if you have an infinite
memory, you can avoid implementing free() or your interprocedural
analysis may prove you do not call free() in your program. :-)

This makes me feel uncomfortable to have this kind of optimizations with
too many hypothesis on the run-time, even if it addresses domains not
well captured by the norm...

How to know about the real target? It may be another allocation library,
some LD_PRELOAD hack to insert memory verification code or at the end
the code is to be run inside a tool like valgrind to track this very
malloc() bug that would be hidden by this optimization... :-(

Just to give an example from another compiler, in the PIPS
source-to-source compiler the side effect of malloc() is taken into
account by registering a write effect on an abstract location which is
tested later by some optimization phases to avoid too much liberal
optimizations, like the one above.

Anyway, this is an interesting limit case to discuss tonight at the LLVM
Bay-area Social. :-)
--
  Ronan KERYELL                            |\/  Phone:  +1 408 658 9453
  SILKAN Wild Systems                      |/)
  4962 El Camino Real #201                 K    [hidden email]
  Los Altos, CA 94022                      |\   skype:keryell
  USA                                      | \  http://silkan.com

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Reed Kotler
They have been discussing this for a solid two days on the C committee
list with many pro and con opinions, all supported by various parts of
the standard or other sources.

My personal opinion is that it's only a bug if some real program can be
made to fail because of this.

If I were writing this test case for LPT, I would call an external
function and pass the value curr to it.

If I was really paranoid about compiling it with an -O4 compiler, I
would write an assembly language program.

Back in the day, in the ACVS (Ada compiler validation suite), they used
to call functions that were noops but that had some complex recursion
that no compiler could unwind and realise that it was a noop. So that is
probably the safest. I can't remember anymore the exact form of these
recursive functions in the suite as it was 28 years ago that I worked
with that suite.

On 10/04/2012 04:21 PM, Ronan Keryell wrote:

>>>>>> On Mon, 1 Oct 2012 19:40:58 -0700, Nick Lewycky <[hidden email]> said:
>      > On 1 October 2012 18:34, reed kotler <[hidden email]> wrote:
>      >     This code is essentially from an LTP test (
>      > http://ltp.sourceforge.net/ ).
>
>      >     #include <stdlib.h>
>      >     int main() {   void  *curr;
>      >       do {     curr = malloc(1);   } while (curr);
>      >       return 0;
>      >     }
>
>      >     If you compile it with no optimization, it will keep the
>      > malloc calls.
>
>      >     If you compile it with -O2, it will create an infinite
>      > loop, i.e.  assuming that malloc always returns a non zero
>      > result and the result is not used.
>
>      Nick> As far as I know, this optimization is legal.
>
> Well, indeed the result is used by the while and we have a control
> dependence on the future execution. The termination or not of the
> program is a side effect from the user perspective.
>
>     [...]
>
>      Nick> Oh how people don't appreciate the luxury of having an
>      Nick> infinite memory machine!
>
> You have also a potentially infinite loop, which is another kind of
> luxury. More boring, especially on the end. :-)
>
> Even if you have a lazy allocation as with the default libC on Linux
> that allocates the memory pages only on demand, the memory will blow
> up just because the malloc() library itself has to track the allocations
> for a potential future free(). Of course, if you have an infinite
> memory, you can avoid implementing free() or your interprocedural
> analysis may prove you do not call free() in your program. :-)
>
> This makes me feel uncomfortable to have this kind of optimizations with
> too many hypothesis on the run-time, even if it addresses domains not
> well captured by the norm...
>
> How to know about the real target? It may be another allocation library,
> some LD_PRELOAD hack to insert memory verification code or at the end
> the code is to be run inside a tool like valgrind to track this very
> malloc() bug that would be hidden by this optimization... :-(
>
> Just to give an example from another compiler, in the PIPS
> source-to-source compiler the side effect of malloc() is taken into
> account by registering a write effect on an abstract location which is
> tested later by some optimization phases to avoid too much liberal
> optimizations, like the one above.
>
> Anyway, this is an interesting limit case to discuss tonight at the LLVM
> Bay-area Social. :-)

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Chandler Carruth-2
For what it's worth, I will be at the social tonight, and am happy to discuss this paper. I'm a co-author and the instigator the work here to clarify the spec and allow a large number of exciting optimizations on code that have previously not be employed merely because of the use of malloc rather than (for example) alloca.


On Thu, Oct 4, 2012 at 4:40 PM, reed kotler <[hidden email]> wrote:
They have been discussing this for a solid two days on the C committee list with many pro and con opinions, all supported by various parts of the standard or other sources.

My personal opinion is that it's only a bug if some real program can be made to fail because of this.

If I were writing this test case for LPT, I would call an external function and pass the value curr to it.

If I was really paranoid about compiling it with an -O4 compiler, I would write an assembly language program.

Back in the day, in the ACVS (Ada compiler validation suite), they used to call functions that were noops but that had some complex recursion that no compiler could unwind and realise that it was a noop. So that is probably the safest. I can't remember anymore the exact form of these recursive functions in the suite as it was 28 years ago that I worked with that suite.

On 10/04/2012 04:21 PM, Ronan Keryell wrote:
On Mon, 1 Oct 2012 19:40:58 -0700, Nick Lewycky <[hidden email]> said:
     > On 1 October 2012 18:34, reed kotler <[hidden email]> wrote:
     >     This code is essentially from an LTP test (
     > http://ltp.sourceforge.net/ ).

     >     #include <stdlib.h>
     >     int main() {   void  *curr;
     >       do {     curr = malloc(1);   } while (curr);
     >       return 0;
     >     }

     >     If you compile it with no optimization, it will keep the
     > malloc calls.

     >     If you compile it with -O2, it will create an infinite
     > loop, i.e.  assuming that malloc always returns a non zero
     > result and the result is not used.

     Nick> As far as I know, this optimization is legal.

Well, indeed the result is used by the while and we have a control
dependence on the future execution. The termination or not of the
program is a side effect from the user perspective.

    [...]

     Nick> Oh how people don't appreciate the luxury of having an
     Nick> infinite memory machine!

You have also a potentially infinite loop, which is another kind of
luxury. More boring, especially on the end. :-)

Even if you have a lazy allocation as with the default libC on Linux
that allocates the memory pages only on demand, the memory will blow
up just because the malloc() library itself has to track the allocations
for a potential future free(). Of course, if you have an infinite
memory, you can avoid implementing free() or your interprocedural
analysis may prove you do not call free() in your program. :-)

This makes me feel uncomfortable to have this kind of optimizations with
too many hypothesis on the run-time, even if it addresses domains not
well captured by the norm...

How to know about the real target? It may be another allocation library,
some LD_PRELOAD hack to insert memory verification code or at the end
the code is to be run inside a tool like valgrind to track this very
malloc() bug that would be hidden by this optimization... :-(

Just to give an example from another compiler, in the PIPS
source-to-source compiler the side effect of malloc() is taken into
account by registering a write effect on an abstract location which is
tested later by some optimization phases to avoid too much liberal
optimizations, like the one above.

Anyway, this is an interesting limit case to discuss tonight at the LLVM
Bay-area Social. :-)

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Reed Kotler
I'll be at the social tonight wearing my black Institute of Martial Arts t-shirt.

On 10/04/2012 05:11 PM, Chandler Carruth wrote:
For what it's worth, I will be at the social tonight, and am happy to discuss this paper. I'm a co-author and the instigator the work here to clarify the spec and allow a large number of exciting optimizations on code that have previously not be employed merely because of the use of malloc rather than (for example) alloca.


On Thu, Oct 4, 2012 at 4:40 PM, reed kotler <[hidden email]> wrote:
They have been discussing this for a solid two days on the C committee list with many pro and con opinions, all supported by various parts of the standard or other sources.

My personal opinion is that it's only a bug if some real program can be made to fail because of this.

If I were writing this test case for LPT, I would call an external function and pass the value curr to it.

If I was really paranoid about compiling it with an -O4 compiler, I would write an assembly language program.

Back in the day, in the ACVS (Ada compiler validation suite), they used to call functions that were noops but that had some complex recursion that no compiler could unwind and realise that it was a noop. So that is probably the safest. I can't remember anymore the exact form of these recursive functions in the suite as it was 28 years ago that I worked with that suite.

On 10/04/2012 04:21 PM, Ronan Keryell wrote:
On Mon, 1 Oct 2012 19:40:58 -0700, Nick Lewycky <[hidden email]> said:
     > On 1 October 2012 18:34, reed kotler <[hidden email]> wrote:
     >     This code is essentially from an LTP test (
     > http://ltp.sourceforge.net/ ).

     >     #include <stdlib.h>
     >     int main() {   void  *curr;
     >       do {     curr = malloc(1);   } while (curr);
     >       return 0;
     >     }

     >     If you compile it with no optimization, it will keep the
     > malloc calls.

     >     If you compile it with -O2, it will create an infinite
     > loop, i.e.  assuming that malloc always returns a non zero
     > result and the result is not used.

     Nick> As far as I know, this optimization is legal.

Well, indeed the result is used by the while and we have a control
dependence on the future execution. The termination or not of the
program is a side effect from the user perspective.

    [...]

     Nick> Oh how people don't appreciate the luxury of having an
     Nick> infinite memory machine!

You have also a potentially infinite loop, which is another kind of
luxury. More boring, especially on the end. :-)

Even if you have a lazy allocation as with the default libC on Linux
that allocates the memory pages only on demand, the memory will blow
up just because the malloc() library itself has to track the allocations
for a potential future free(). Of course, if you have an infinite
memory, you can avoid implementing free() or your interprocedural
analysis may prove you do not call free() in your program. :-)

This makes me feel uncomfortable to have this kind of optimizations with
too many hypothesis on the run-time, even if it addresses domains not
well captured by the norm...

How to know about the real target? It may be another allocation library,
some LD_PRELOAD hack to insert memory verification code or at the end
the code is to be run inside a tool like valgrind to track this very
malloc() bug that would be hidden by this optimization... :-(

Just to give an example from another compiler, in the PIPS
source-to-source compiler the side effect of malloc() is taken into
account by registering a write effect on an abstract location which is
tested later by some optimization phases to avoid too much liberal
optimizations, like the one above.

Anyway, this is an interesting limit case to discuss tonight at the LLVM
Bay-area Social. :-)

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Krzysztof Parzyszek
In reply to this post by Nick Lewycky-2
On 10/1/2012 9:40 PM, Nick Lewycky wrote:
>
> As far as I know, this optimization is legal.

It's not legal.
1. malloc is not known to always return a non-null pointer.
2. malloc has side-effects and hence cannot be eliminated.

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Krzysztof Parzyszek
In reply to this post by Nick Lewycky
On 10/2/2012 4:12 AM, Nick Lewycky wrote:
>
> This is a straight-forward application of the as-if rule. The malloc
> call behaves as-if it allocated memory. Because we prove that the code
> doesn't use that memory, we can get away with allocating no memory at
> all and not change the behaviour of the program.

Are you saying that we do this kind of an analysis at -O2?  This could
only work if the entire program was contained in a single source file,
and even then it would require interprocedural analysis to get it right.

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Eli Friedman-2
In reply to this post by Krzysztof Parzyszek
On Mon, Oct 15, 2012 at 2:28 PM, Krzysztof Parzyszek
<[hidden email]> wrote:
> On 10/1/2012 9:40 PM, Nick Lewycky wrote:
>>
>>
>> As far as I know, this optimization is legal.
>
>
> It's not legal.
> 1. malloc is not known to always return a non-null pointer.

The system malloc isn't, but the compiler can change any given call to
malloc to call an implementation which is known to return a non-null
pointer.

-Eli
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Krzysztof Parzyszek
On 10/15/2012 4:39 PM, Eli Friedman wrote:
>
> The system malloc isn't, but the compiler can change any given call to
> malloc to call an implementation which is known to return a non-null
> pointer.

Do we do that in LLVM?  That would be surprising...  Optimizing calls to
malloc (like memory pooling for example) is not a trivial thing to do,
and it requires a fairly strong interprocedural analysis.  The fact that
this seems to happen with LLVM at -O2 looks more like a bug than a
clever optimization.

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

David Blaikie
On Mon, Oct 15, 2012 at 2:43 PM, Krzysztof Parzyszek
<[hidden email]> wrote:

> On 10/15/2012 4:39 PM, Eli Friedman wrote:
>>
>>
>> The system malloc isn't, but the compiler can change any given call to
>> malloc to call an implementation which is known to return a non-null
>> pointer.
>
>
> Do we do that in LLVM?  That would be surprising...  Optimizing calls to
> malloc (like memory pooling for example) is not a trivial thing to do, and
> it requires a fairly strong interprocedural analysis.

Why would it require that? If you can see that the malloc doesn't
escape you can, in such simple cases, simply move the data from malloc
memory into an alloca instead.

> The fact that this
> seems to happen with LLVM at -O2 looks more like a bug than a clever
> optimization.
>
>
> -K
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> The Linux Foundation
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Krzysztof Parzyszek
On 10/15/2012 4:49 PM, David Blaikie wrote:
> On Mon, Oct 15, 2012 at 2:43 PM, Krzysztof Parzyszek
>>
>> Do we do that in LLVM?  That would be surprising...  Optimizing calls to
>> malloc (like memory pooling for example) is not a trivial thing to do, and
>> it requires a fairly strong interprocedural analysis.
>
> Why would it require that? If you can see that the malloc doesn't
> escape you can, in such simple cases, simply move the data from malloc
> memory into an alloca instead.

You're right about moving data from heap to stack, but I think it would
be somewhat limited in its scope.  The compiler should be careful doing
such things anyways, since the user can set stack and heap memory limits
independently, and have expectations as to where the memory will be
allocated at run time.

Another thing is that if malloc is called twice, and neither call
returns null, then the two pointers returned are guaranteed to be
different.  If you replace the call to malloc with alloca, the alloca
(i.e. stack allocation) still needs to run in a loop.  Of course, this
doesn't have to happen in this particular example, but an optimization
targeting this case is quite pointless.

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Eli Friedman-2
On Mon, Oct 15, 2012 at 3:12 PM, Krzysztof Parzyszek
<[hidden email]> wrote:

> On 10/15/2012 4:49 PM, David Blaikie wrote:
>>
>> On Mon, Oct 15, 2012 at 2:43 PM, Krzysztof Parzyszek
>>>
>>>
>>> Do we do that in LLVM?  That would be surprising...  Optimizing calls to
>>> malloc (like memory pooling for example) is not a trivial thing to do,
>>> and
>>> it requires a fairly strong interprocedural analysis.
>>
>>
>> Why would it require that? If you can see that the malloc doesn't
>> escape you can, in such simple cases, simply move the data from malloc
>> memory into an alloca instead.
>
>
> You're right about moving data from heap to stack, but I think it would be
> somewhat limited in its scope.  The compiler should be careful doing such
> things anyways, since the user can set stack and heap memory limits
> independently, and have expectations as to where the memory will be
> allocated at run time.
>
> Another thing is that if malloc is called twice, and neither call returns
> null, then the two pointers returned are guaranteed to be different.  If you
> replace the call to malloc with alloca, the alloca (i.e. stack allocation)
> still needs to run in a loop.  Of course, this doesn't have to happen in
> this particular example, but an optimization targeting this case is quite
> pointless.

The optimization which triggers here is essentially dead malloc
elimination; we can end up with IR which looks a lot like the original
example in non-trivial cases because sometimes we can eliminate all
accesses to a block of malloc'ed memory (via inlining/GVN/etc.).

-Eli
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Chris Lattner-2
In reply to this post by Krzysztof Parzyszek
On Oct 15, 2012, at 2:28 PM, Krzysztof Parzyszek <[hidden email]> wrote:
> On 10/1/2012 9:40 PM, Nick Lewycky wrote:
>>
>> As far as I know, this optimization is legal.
>
> It's not legal.
> 1. malloc is not known to always return a non-null pointer.
> 2. malloc has side-effects and hence cannot be eliminated.

You've made a number of claims on this thread, but I'm not sure what you're basing these claims on - certainly not the C standard.

If you don't want the compiler to touch well known functions like malloc, build with -fno-builtin and you'll get the behavior you want.

-Chris
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: interesting possible compiler bug

Krzysztof Parzyszek
On 10/15/2012 7:20 PM, Chris Lattner wrote:
>
> You've made a number of claims on this thread, but I'm not sure what you're basing these claims on - certainly not the C standard.
>
> If you don't want the compiler to touch well known functions like malloc, build with -fno-builtin and you'll get the behavior you want.

The only claim you can object to is about side-effects.  It was more of
a general remark, but yes, the compiler can delete calls where the
result is not used.  This is indeed a case of a "dead malloc", which Eli
pointed out.

Is there anything else you disagree with?

-K

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
12