For alias analysis, It's gcc too aggressive or LLVM need to improve?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

For alias analysis, It's gcc too aggressive or LLVM need to improve?

Kevin Qin
Hi all,

Recently, I found gcc can generate faster codes by localizing global variable inside loop and only write back once before function return. Gcc can do this because its alias analysis think it's safe. But for below case, gcc gives different result from -O0 and -O2.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    ptr[i]->index = 0;
  }
  return 0;
}

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

The version of gcc I tried this is 4.8.2 (Ubuntu 4.8.2-19ubuntu1).

On Clang side, it always give the expected result(as the result of gcc -O0) no matter with the optimization level. But it also lose the opportunity to localize variable aa and introduced extra load instruction inside loop because LLVM alias analysis think aa are not independent with the value ptr point to.

Then my question is,

1. Is this C program legal or not? Especially the type conversion between int pointer and struct pointer. But at least there's no warning or error posted during compiling time on both Clang and gcc side.

2. What we should do in LLVM side? LLVM gives correct result on this corner case no matter it's legal or not, but sacrifices performance on most generic cases. Did this need a improvement?




--
Best Regards,

Kevin Qin

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: For alias analysis, It's gcc too aggressive or LLVM need to improve?

Jonas Wagner
Hi Kevin,

your C program invokes undefined behavior when it dereferences pointers that have been converted to other types. See for example http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior

Because of this, the program could do anything... in particular, you can expect the output of different compilers to be different.

Cheers,
Jonas


On Thu, Jul 31, 2014 at 8:38 AM, Kevin Qin <[hidden email]> wrote:
Hi all,

Recently, I found gcc can generate faster codes by localizing global variable inside loop and only write back once before function return. Gcc can do this because its alias analysis think it's safe. But for below case, gcc gives different result from -O0 and -O2.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    ptr[i]->index = 0;
  }
  return 0;
}

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

The version of gcc I tried this is 4.8.2 (Ubuntu 4.8.2-19ubuntu1).

On Clang side, it always give the expected result(as the result of gcc -O0) no matter with the optimization level. But it also lose the opportunity to localize variable aa and introduced extra load instruction inside loop because LLVM alias analysis think aa are not independent with the value ptr point to.

Then my question is,

1. Is this C program legal or not? Especially the type conversion between int pointer and struct pointer. But at least there's no warning or error posted during compiling time on both Clang and gcc side.

2. What we should do in LLVM side? LLVM gives correct result on this corner case no matter it's legal or not, but sacrifices performance on most generic cases. Did this need a improvement?




--
Best Regards,

Kevin Qin

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: For alias analysis, It's gcc too aggressive or LLVM need to improve?

Tim Northover-2
> your C program invokes undefined behavior when it dereferences pointers that
> have been converted to other types. See for example
> http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior

I don't think it's quite that simple.The type-based aliasing rules
come from 6.5p7 of C11, I think. That says:

"An object shall have its stored value accessed only by an lvalue
expression that has one of
the following types:
  + a type compatible with the effective type of the object,
  [...]
  + an aggregate or union type that includes one of the aforementioned
types among its members [...]"

That would seem to allow this usage: aa (effective type "int") is
being accessed via an lvalue "ptr[i]->index" of type "int".

The second point would even seem to allow something like "ptr[i] =
..." if aa was declared "int aa[2];", though that seems to be going
too far. It also seems to be very difficult to pin down a meaning
(from the standard) for "a->b" if a is not a pointer to an object with
the correct effective type. So the entire area is probably one that's
open to interpretation.

I've added cfe-dev to the list; they're the *professional* language lawyers.

Cheers.

Tim.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Hal Finkel
----- Original Message -----

> From: "Tim Northover" <[hidden email]>
> To: "Jonas Wagner" <[hidden email]>
> Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers Mailing List" <[hidden email]>
> Sent: Friday, August 8, 2014 6:54:50 AM
> Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
>
> > your C program invokes undefined behavior when it dereferences
> > pointers that
> > have been converted to other types. See for example
> > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>
> I don't think it's quite that simple.The type-based aliasing rules
> come from 6.5p7 of C11, I think. That says:
>
> "An object shall have its stored value accessed only by an lvalue
> expression that has one of
> the following types:
>   + a type compatible with the effective type of the object,
>   [...]
>   + an aggregate or union type that includes one of the
>   aforementioned
> types among its members [...]"
>
> That would seem to allow this usage: aa (effective type "int") is
> being accessed via an lvalue "ptr[i]->index" of type "int".
>
> The second point would even seem to allow something like "ptr[i] =
> ..." if aa was declared "int aa[2];", though that seems to be going
> too far. It also seems to be very difficult to pin down a meaning
> (from the standard) for "a->b" if a is not a pointer to an object
> with
> the correct effective type. So the entire area is probably one that's
> open to interpretation.
>
> I've added cfe-dev to the list; they're the *professional* language
> lawyers.

Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the reporter of that bug, to this thread too).

 -Hal

>
> Cheers.
>
> Tim.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Reid Kleckner-2
+aliasing people

I *think* this is valid, because the rules have always been described to me in terms of underlying storage type, and not access path. These are all ints, so all loads and stores can alias.


On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
----- Original Message -----
> From: "Tim Northover" <[hidden email]>
> To: "Jonas Wagner" <[hidden email]>
> Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers Mailing List" <[hidden email]>
> Sent: Friday, August 8, 2014 6:54:50 AM
> Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
>
> > your C program invokes undefined behavior when it dereferences
> > pointers that
> > have been converted to other types. See for example
> > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>
> I don't think it's quite that simple.The type-based aliasing rules
> come from 6.5p7 of C11, I think. That says:
>
> "An object shall have its stored value accessed only by an lvalue
> expression that has one of
> the following types:
>   + a type compatible with the effective type of the object,
>   [...]
>   + an aggregate or union type that includes one of the
>   aforementioned
> types among its members [...]"
>
> That would seem to allow this usage: aa (effective type "int") is
> being accessed via an lvalue "ptr[i]->index" of type "int".
>
> The second point would even seem to allow something like "ptr[i] =
> ..." if aa was declared "int aa[2];", though that seems to be going
> too far. It also seems to be very difficult to pin down a meaning
> (from the standard) for "a->b" if a is not a pointer to an object
> with
> the correct effective type. So the entire area is probably one that's
> open to interpretation.
>
> I've added cfe-dev to the list; they're the *professional* language
> lawyers.

Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the reporter of that bug, to this thread too).

 -Hal

>
> Cheers.
>
> Tim.
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin
The access path matters (in some sense), but this is, AFIAK, valid no
matter how you look at it.

Let's take a look line by line

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa; <- Okay so far.
  array[1] = &element; <- Clearly okay
  ptr = array; <- still okay so far
  aa = 1; <- not pointer related.
  int i; <- not pointer related
  for (i =0; i< 2; i++) { <- not pointer related
    printf("i is %d, aa is %d\n", i, aa); <- not pointer related
    ptr[i]->index = 0; <- Here is where it gets wonky.

<rest of codeis irrelevan>

First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
footnote 59).
(I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
because it doesn't end up making a difference)

Let's assume, for the sake of argument,  the actual access to aa
occurs through an lvalue of type "struct heap" rather than "int"
In C++03 and C++11, it says:

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the
effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
(C++11 adds something about dynamic type here)

struct heap is "an aggregate or union type that includes one of the
aforementioned types among it's members".

Thus, this is legal to access this int through an lvalue expression
that has a type of struct heap.
Whether the actual store is legal for other reasons, i don't know.
There are all kinds of rules about object alignment and value
representation that aren't my baliwick.  I leave it to another
language lawyer to say whether it's okay to do a store to something
that is essentially, a partial object.

Note that GCC actually knows this is legal to alias, at least at the
tree level. I debugged it there, and it definitely isn't eliminating
it at a high level. It also completely understands the call to ->index
= 0 affects "aa", and has a reload for aa before the printf call.

I don't know what is eliminating this at the RTL level, but i can't
see why it's illegal from *aliasing rules*. Maybe this is invalid for
some other reason.







  }
  return 0;
}

ptr[i]->index = 0;

On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <[hidden email]> wrote:

> +aliasing people
>
> I *think* this is valid, because the rules have always been described to me
> in terms of underlying storage type, and not access path. These are all
> ints, so all loads and stores can alias.
>
>
> On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
>>
>> ----- Original Message -----
>> > From: "Tim Northover" <[hidden email]>
>> > To: "Jonas Wagner" <[hidden email]>
>> > Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers Mailing
>> > List" <[hidden email]>
>> > Sent: Friday, August 8, 2014 6:54:50 AM
>> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> > aggressive or LLVM need to improve?
>> >
>> > > your C program invokes undefined behavior when it dereferences
>> > > pointers that
>> > > have been converted to other types. See for example
>> > >
>> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >
>> > I don't think it's quite that simple.The type-based aliasing rules
>> > come from 6.5p7 of C11, I think. That says:
>> >
>> > "An object shall have its stored value accessed only by an lvalue
>> > expression that has one of
>> > the following types:
>> >   + a type compatible with the effective type of the object,
>> >   [...]
>> >   + an aggregate or union type that includes one of the
>> >   aforementioned
>> > types among its members [...]"
>> >
>> > That would seem to allow this usage: aa (effective type "int") is
>> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >
>> > The second point would even seem to allow something like "ptr[i] =
>> > ..." if aa was declared "int aa[2];", though that seems to be going
>> > too far. It also seems to be very difficult to pin down a meaning
>> > (from the standard) for "a->b" if a is not a pointer to an object
>> > with
>> > the correct effective type. So the entire area is probably one that's
>> > open to interpretation.
>> >
>> > I've added cfe-dev to the list; they're the *professional* language
>> > lawyers.
>>
>> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the
>> reporter of that bug, to this thread too).
>>
>>  -Hal
>>
>> >
>> > Cheers.
>> >
>> > Tim.
>> > _______________________________________________
>> > cfe-dev mailing list
>> > [hidden email]
>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Richard Smith-33
I'll take this from the C++ angle; the C rules are not the same, and I'm not confident they give the same answer.

On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <[hidden email]> wrote:
The access path matters (in some sense), but this is, AFIAK, valid no
matter how you look at it.

Let's take a look line by line

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa; <- Okay so far.
  array[1] = &element; <- Clearly okay
  ptr = array; <- still okay so far
  aa = 1; <- not pointer related.
  int i; <- not pointer related
  for (i =0; i< 2; i++) { <- not pointer related
    printf("i is %d, aa is %d\n", i, aa); <- not pointer related
    ptr[i]->index = 0; <- Here is where it gets wonky.

<rest of codeis irrelevan>

First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
footnote 59).

This is where we get the undefined behavior.

3.8/6: "[If you have a glvalue referring to storage but where there is no corresponding object, the] program has undefined behavior if:
[...] the glvalue is used to access a non-static data member".

There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only create objects through definitions, new-expressions, and by creating temporary objects). So the behavior is undefined when we evaluate ptr[0]->index.

(I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
because it doesn't end up making a difference)

Let's assume, for the sake of argument,  the actual access to aa
occurs through an lvalue of type "struct heap" rather than "int"
In C++03 and C++11, it says:

An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:

a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the
effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned
types among its members (including, recursively, a member of a
subaggregate or contained union), or
a character type.
(C++11 adds something about dynamic type here)

struct heap is "an aggregate or union type that includes one of the
aforementioned types among it's members".

Thus, this is legal to access this int through an lvalue expression
that has a type of struct heap.
Whether the actual store is legal for other reasons, i don't know.
There are all kinds of rules about object alignment and value
representation that aren't my baliwick.  I leave it to another
language lawyer to say whether it's okay to do a store to something
that is essentially, a partial object.

Note that GCC actually knows this is legal to alias, at least at the
tree level. I debugged it there, and it definitely isn't eliminating
it at a high level. It also completely understands the call to ->index
= 0 affects "aa", and has a reload for aa before the printf call.

I don't know what is eliminating this at the RTL level, but i can't
see why it's illegal from *aliasing rules*. Maybe this is invalid for
some other reason.







  }
  return 0;
}

ptr[i]->index = 0;

On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <[hidden email]> wrote:
> +aliasing people
>
> I *think* this is valid, because the rules have always been described to me
> in terms of underlying storage type, and not access path. These are all
> ints, so all loads and stores can alias.
>
>
> On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
>>
>> ----- Original Message -----
>> > From: "Tim Northover" <[hidden email]>
>> > To: "Jonas Wagner" <[hidden email]>
>> > Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers Mailing
>> > List" <[hidden email]>
>> > Sent: Friday, August 8, 2014 6:54:50 AM
>> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> > aggressive or LLVM need to improve?
>> >
>> > > your C program invokes undefined behavior when it dereferences
>> > > pointers that
>> > > have been converted to other types. See for example
>> > >
>> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >
>> > I don't think it's quite that simple.The type-based aliasing rules
>> > come from 6.5p7 of C11, I think. That says:
>> >
>> > "An object shall have its stored value accessed only by an lvalue
>> > expression that has one of
>> > the following types:
>> >   + a type compatible with the effective type of the object,
>> >   [...]
>> >   + an aggregate or union type that includes one of the
>> >   aforementioned
>> > types among its members [...]"
>> >
>> > That would seem to allow this usage: aa (effective type "int") is
>> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >
>> > The second point would even seem to allow something like "ptr[i] =
>> > ..." if aa was declared "int aa[2];", though that seems to be going
>> > too far. It also seems to be very difficult to pin down a meaning
>> > (from the standard) for "a->b" if a is not a pointer to an object
>> > with
>> > the correct effective type. So the entire area is probably one that's
>> > open to interpretation.
>> >
>> > I've added cfe-dev to the list; they're the *professional* language
>> > lawyers.
>>
>> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu, the
>> reporter of that bug, to this thread too).
>>
>>  -Hal
>>
>> >
>> > Cheers.
>> >
>> > Tim.
>> > _______________________________________________
>> > cfe-dev mailing list
>> > [hidden email]
>> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin
So then there you go, a real language lawyer says it's invalid for
other reasons :)


On Mon, Aug 11, 2014 at 7:31 PM, Richard Smith <[hidden email]> wrote:

> I'll take this from the C++ angle; the C rules are not the same, and I'm not
> confident they give the same answer.
>
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <[hidden email]> wrote:
>>
>> The access path matters (in some sense), but this is, AFIAK, valid no
>> matter how you look at it.
>>
>> Let's take a look line by line
>>
>> #include <stdio.h>
>> struct heap {int index; int b;};
>> struct heap **ptr;
>> int aa;
>>
>> int main() {
>>   struct heap element;
>>   struct heap *array[2];
>>   array[0] = (struct heap *)&aa; <- Okay so far.
>>   array[1] = &element; <- Clearly okay
>>   ptr = array; <- still okay so far
>>   aa = 1; <- not pointer related.
>>   int i; <- not pointer related
>>   for (i =0; i< 2; i++) { <- not pointer related
>>     printf("i is %d, aa is %d\n", i, aa); <- not pointer related
>>     ptr[i]->index = 0; <- Here is where it gets wonky.
>>
>> <rest of codeis irrelevan>
>>
>> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
>> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
>> footnote 59).
>
>
> This is where we get the undefined behavior.
>
> 3.8/6: "[If you have a glvalue referring to storage but where there is no
> corresponding object, the] program has undefined behavior if:
> [...] the glvalue is used to access a non-static data member".
>
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only
> create objects through definitions, new-expressions, and by creating
> temporary objects). So the behavior is undefined when we evaluate
> ptr[0]->index.
>
>> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
>> because it doesn't end up making a difference)
>>
>> Let's assume, for the sake of argument,  the actual access to aa
>> occurs through an lvalue of type "struct heap" rather than "int"
>> In C++03 and C++11, it says:
>>
>> An object shall have its stored value accessed only by an lvalue
>> expression that has one of the following types:
>>
>> a type compatible with the effective type of the object,
>> a qualified version of a type compatible with the effective type of the
>> object,
>> a type that is the signed or unsigned type corresponding to the
>> effective type of the object,
>> a type that is the signed or unsigned type corresponding to a
>> qualified version of the effective type of the object,
>> an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union), or
>> a character type.
>> (C++11 adds something about dynamic type here)
>>
>> struct heap is "an aggregate or union type that includes one of the
>> aforementioned types among it's members".
>>
>> Thus, this is legal to access this int through an lvalue expression
>> that has a type of struct heap.
>> Whether the actual store is legal for other reasons, i don't know.
>> There are all kinds of rules about object alignment and value
>> representation that aren't my baliwick.  I leave it to another
>> language lawyer to say whether it's okay to do a store to something
>> that is essentially, a partial object.
>>
>> Note that GCC actually knows this is legal to alias, at least at the
>> tree level. I debugged it there, and it definitely isn't eliminating
>> it at a high level. It also completely understands the call to ->index
>> = 0 affects "aa", and has a reload for aa before the printf call.
>>
>> I don't know what is eliminating this at the RTL level, but i can't
>> see why it's illegal from *aliasing rules*. Maybe this is invalid for
>> some other reason.
>>
>>
>>
>>
>>
>>
>>
>>   }
>>   return 0;
>> }
>>
>> ptr[i]->index = 0;
>>
>> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <[hidden email]> wrote:
>> > +aliasing people
>> >
>> > I *think* this is valid, because the rules have always been described to
>> > me
>> > in terms of underlying storage type, and not access path. These are all
>> > ints, so all loads and stores can alias.
>> >
>> >
>> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Tim Northover" <[hidden email]>
>> >> > To: "Jonas Wagner" <[hidden email]>
>> >> > Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers
>> >> > Mailing
>> >> > List" <[hidden email]>
>> >> > Sent: Friday, August 8, 2014 6:54:50 AM
>> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> >> > aggressive or LLVM need to improve?
>> >> >
>> >> > > your C program invokes undefined behavior when it dereferences
>> >> > > pointers that
>> >> > > have been converted to other types. See for example
>> >> > >
>> >> > >
>> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >> >
>> >> > I don't think it's quite that simple.The type-based aliasing rules
>> >> > come from 6.5p7 of C11, I think. That says:
>> >> >
>> >> > "An object shall have its stored value accessed only by an lvalue
>> >> > expression that has one of
>> >> > the following types:
>> >> >   + a type compatible with the effective type of the object,
>> >> >   [...]
>> >> >   + an aggregate or union type that includes one of the
>> >> >   aforementioned
>> >> > types among its members [...]"
>> >> >
>> >> > That would seem to allow this usage: aa (effective type "int") is
>> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >> >
>> >> > The second point would even seem to allow something like "ptr[i] =
>> >> > ..." if aa was declared "int aa[2];", though that seems to be going
>> >> > too far. It also seems to be very difficult to pin down a meaning
>> >> > (from the standard) for "a->b" if a is not a pointer to an object
>> >> > with
>> >> > the correct effective type. So the entire area is probably one that's
>> >> > open to interpretation.
>> >> >
>> >> > I've added cfe-dev to the list; they're the *professional* language
>> >> > lawyers.
>> >>
>> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu,
>> >> the
>> >> reporter of that bug, to this thread too).
>> >>
>> >>  -Hal
>> >>
>> >> >
>> >> > Cheers.
>> >> >
>> >> > Tim.
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > [hidden email]
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> [hidden email]         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > [hidden email]         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>
>
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Kevin Qin
Hi all,

Thanks for you paying time to look at this issue. I'm not an expert for C/C++ language, so I can just post some experiment results from LLVM and GCC. 

If we make minor changes to the test, gcc may give different results.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result didn't get changed,

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

But if we change a bit more, like

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  //ptr = array; // remove this assignment as well.
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result changed to be the same as LLVM.

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 0

I don't know why a assignement statment to a unrelated global pointer will affect gcc's aliasing work, and I don't know from the language point of view, if we use a local pointer to replace the global pointer, then the result would be still undefined or not.

Regards,
Kevin


2014-08-12 12:02 GMT+08:00 Daniel Berlin <[hidden email]>:
So then there you go, a real language lawyer says it's invalid for
other reasons :)


On Mon, Aug 11, 2014 at 7:31 PM, Richard Smith <[hidden email]> wrote:
> I'll take this from the C++ angle; the C rules are not the same, and I'm not
> confident they give the same answer.
>
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <[hidden email]> wrote:
>>
>> The access path matters (in some sense), but this is, AFIAK, valid no
>> matter how you look at it.
>>
>> Let's take a look line by line
>>
>> #include <stdio.h>
>> struct heap {int index; int b;};
>> struct heap **ptr;
>> int aa;
>>
>> int main() {
>>   struct heap element;
>>   struct heap *array[2];
>>   array[0] = (struct heap *)&aa; <- Okay so far.
>>   array[1] = &element; <- Clearly okay
>>   ptr = array; <- still okay so far
>>   aa = 1; <- not pointer related.
>>   int i; <- not pointer related
>>   for (i =0; i< 2; i++) { <- not pointer related
>>     printf("i is %d, aa is %d\n", i, aa); <- not pointer related
>>     ptr[i]->index = 0; <- Here is where it gets wonky.
>>
>> <rest of codeis irrelevan>
>>
>> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
>> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
>> footnote 59).
>
>
> This is where we get the undefined behavior.
>
> 3.8/6: "[If you have a glvalue referring to storage but where there is no
> corresponding object, the] program has undefined behavior if:
> [...] the glvalue is used to access a non-static data member".
>
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only
> create objects through definitions, new-expressions, and by creating
> temporary objects). So the behavior is undefined when we evaluate
> ptr[0]->index.
>
>> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
>> because it doesn't end up making a difference)
>>
>> Let's assume, for the sake of argument,  the actual access to aa
>> occurs through an lvalue of type "struct heap" rather than "int"
>> In C++03 and C++11, it says:
>>
>> An object shall have its stored value accessed only by an lvalue
>> expression that has one of the following types:
>>
>> a type compatible with the effective type of the object,
>> a qualified version of a type compatible with the effective type of the
>> object,
>> a type that is the signed or unsigned type corresponding to the
>> effective type of the object,
>> a type that is the signed or unsigned type corresponding to a
>> qualified version of the effective type of the object,
>> an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union), or
>> a character type.
>> (C++11 adds something about dynamic type here)
>>
>> struct heap is "an aggregate or union type that includes one of the
>> aforementioned types among it's members".
>>
>> Thus, this is legal to access this int through an lvalue expression
>> that has a type of struct heap.
>> Whether the actual store is legal for other reasons, i don't know.
>> There are all kinds of rules about object alignment and value
>> representation that aren't my baliwick.  I leave it to another
>> language lawyer to say whether it's okay to do a store to something
>> that is essentially, a partial object.
>>
>> Note that GCC actually knows this is legal to alias, at least at the
>> tree level. I debugged it there, and it definitely isn't eliminating
>> it at a high level. It also completely understands the call to ->index
>> = 0 affects "aa", and has a reload for aa before the printf call.
>>
>> I don't know what is eliminating this at the RTL level, but i can't
>> see why it's illegal from *aliasing rules*. Maybe this is invalid for
>> some other reason.
>>
>>
>>
>>
>>
>>
>>
>>   }
>>   return 0;
>> }
>>
>> ptr[i]->index = 0;
>>
>> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <[hidden email]> wrote:
>> > +aliasing people
>> >
>> > I *think* this is valid, because the rules have always been described to
>> > me
>> > in terms of underlying storage type, and not access path. These are all
>> > ints, so all loads and stores can alias.
>> >
>> >
>> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Tim Northover" <[hidden email]>
>> >> > To: "Jonas Wagner" <[hidden email]>
>> >> > Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers
>> >> > Mailing
>> >> > List" <[hidden email]>
>> >> > Sent: Friday, August 8, 2014 6:54:50 AM
>> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> >> > aggressive or LLVM need to improve?
>> >> >
>> >> > > your C program invokes undefined behavior when it dereferences
>> >> > > pointers that
>> >> > > have been converted to other types. See for example
>> >> > >
>> >> > >
>> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >> >
>> >> > I don't think it's quite that simple.The type-based aliasing rules
>> >> > come from 6.5p7 of C11, I think. That says:
>> >> >
>> >> > "An object shall have its stored value accessed only by an lvalue
>> >> > expression that has one of
>> >> > the following types:
>> >> >   + a type compatible with the effective type of the object,
>> >> >   [...]
>> >> >   + an aggregate or union type that includes one of the
>> >> >   aforementioned
>> >> > types among its members [...]"
>> >> >
>> >> > That would seem to allow this usage: aa (effective type "int") is
>> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >> >
>> >> > The second point would even seem to allow something like "ptr[i] =
>> >> > ..." if aa was declared "int aa[2];", though that seems to be going
>> >> > too far. It also seems to be very difficult to pin down a meaning
>> >> > (from the standard) for "a->b" if a is not a pointer to an object
>> >> > with
>> >> > the correct effective type. So the entire area is probably one that's
>> >> > open to interpretation.
>> >> >
>> >> > I've added cfe-dev to the list; they're the *professional* language
>> >> > lawyers.
>> >>
>> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu,
>> >> the
>> >> reporter of that bug, to this thread too).
>> >>
>> >>  -Hal
>> >>
>> >> >
>> >> > Cheers.
>> >> >
>> >> > Tim.
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > [hidden email]
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> [hidden email]         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > [hidden email]         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



--
Best Regards,

Kevin Qin

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin-5



On Mon, Aug 11, 2014 at 11:44 PM, Kevin Qin <[hidden email]> wrote:
Hi all,

Thanks for you paying time to look at this issue. I'm not an expert for C/C++ language, so I can just post some experiment results from LLVM and GCC. 

If we make minor changes to the test, gcc may give different results.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result didn't get changed,

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

But if we change a bit more, like

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  //ptr = array; // remove this assignment as well.
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result changed to be the same as LLVM.

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 0

I don't know why a assignement statment to a unrelated global pointer will affect gcc's aliasing work,

Because it blocks the load elimination/copy propagation.  With that pointer assignment there, GCC sees it as a global aliasing the same memory location as the array, and that global escapes the function.  Because of that, it no longer believes it knows what happens to the memory once the printf call happens (since it's really a call to printf_chk, and because of the way glibc works, printf is not a readonly functiojn)



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin-5



On Tue, Aug 12, 2014 at 8:55 AM, Daniel Berlin <[hidden email]> wrote:



On Mon, Aug 11, 2014 at 11:44 PM, Kevin Qin <[hidden email]> wrote:
Hi all,

Thanks for you paying time to look at this issue. I'm not an expert for C/C++ language, so I can just post some experiment results from LLVM and GCC. 

If we make minor changes to the test, gcc may give different results.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result didn't get changed,

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

But if we change a bit more, like

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  //ptr = array; // remove this assignment as well.
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result changed to be the same as LLVM.

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 0

I don't know why a assignement statment to a unrelated global pointer will affect gcc's aliasing work,

Because it blocks the load elimination/copy propagation.  With that pointer assignment there, GCC sees it as a global aliasing the same memory location as the array, and that global escapes the function.  Because of that, it no longer believes it knows what happens to the memory once the printf call happens (since it's really a call to printf_chk, and because of the way glibc works, printf is not a readonly functiojn)




_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Hal Finkel
In reply to this post by Richard Smith-33
----- Original Message -----

> From: "Richard Smith" <[hidden email]>
> To: "Daniel Berlin" <[hidden email]>
> Cc: "Reid Kleckner" <[hidden email]>, "Hal Finkel" <[hidden email]>, "Daniel Berlin" <[hidden email]>, "LLVM
> Developers Mailing List" <[hidden email]>, "cfe-dev Developers" <[hidden email]>
> Sent: Monday, August 11, 2014 9:31:09 PM
> Subject: Re: [LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
>
>
>
>
> I'll take this from the C++ angle; the C rules are not the same, and
> I'm not confident they give the same answer.
>
>
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin < [hidden email]
> > wrote:
>
>
> The access path matters (in some sense), but this is, AFIAK, valid no
> matter how you look at it.
>
> Let's take a look line by line
>
>
> #include <stdio.h>
> struct heap {int index; int b;};
> struct heap **ptr;
> int aa;
>
> int main() {
> struct heap element;
> struct heap *array[2];
> array[0] = (struct heap *)&aa; <- Okay so far.
> array[1] = &element; <- Clearly okay
> ptr = array; <- still okay so far
> aa = 1; <- not pointer related.
> int i; <- not pointer related
> for (i =0; i< 2; i++) { <- not pointer related
> printf("i is %d, aa is %d\n", i, aa); <- not pointer related
> ptr[i]->index = 0; <- Here is where it gets wonky.
>
> <rest of codeis irrelevan>
>
> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3,
> check
> footnote 59).
>
>
>
> This is where we get the undefined behavior.
>
>
>
> 3.8/6: "[If you have a glvalue referring to storage but where there
> is no corresponding object, the] program has undefined behavior if:
>
> [...] the glvalue is used to access a non-static data member".
>
>
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we
> can only create objects through definitions, new-expressions, and by
> creating temporary objects). So the behavior is undefined when we
> evaluate ptr[0]->index.

Any thoughts on how we might communicate this information to the optimizer? In this case we might be able to infer something from the struct-path aware TBAA (because reaching the int though the structure has a different path length than reaching the top-level int), but I'm not sure that addresses the general case.

Thanks again,
Hal

>
> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
> because it doesn't end up making a difference)
>
> Let's assume, for the sake of argument, the actual access to aa
> occurs through an lvalue of type "struct heap" rather than "int"
>
> In C++03 and C++11, it says:
>
> An object shall have its stored value accessed only by an lvalue
> expression that has one of the following types:
>
>
> a type compatible with the effective type of the object,
> a qualified version of a type compatible with the effective type of
> the object,
> a type that is the signed or unsigned type corresponding to the
>
> effective type of the object,
> a type that is the signed or unsigned type corresponding to a
> qualified version of the effective type of the object,
>
> an aggregate or union type that includes one of the aforementioned
> types among its members (including, recursively, a member of a
> subaggregate or contained union), or
> a character type.
> (C++11 adds something about dynamic type here)
>
> struct heap is "an aggregate or union type that includes one of the
> aforementioned types among it's members".
>
> Thus, this is legal to access this int through an lvalue expression
> that has a type of struct heap.
> Whether the actual store is legal for other reasons, i don't know.
> There are all kinds of rules about object alignment and value
> representation that aren't my baliwick. I leave it to another
> language lawyer to say whether it's okay to do a store to something
> that is essentially, a partial object.
>
> Note that GCC actually knows this is legal to alias, at least at the
> tree level. I debugged it there, and it definitely isn't eliminating
> it at a high level. It also completely understands the call to
> ->index
> = 0 affects "aa", and has a reload for aa before the printf call.
>
> I don't know what is eliminating this at the RTL level, but i can't
> see why it's illegal from *aliasing rules*. Maybe this is invalid for
> some other reason.
>
>
>
>
>
>
>
> }
> return 0;
>
> }
>
> ptr[i]->index = 0;
>
>
>
> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner < [hidden email] >
> wrote:
> > +aliasing people
> >
> > I *think* this is valid, because the rules have always been
> > described to me
> > in terms of underlying storage type, and not access path. These are
> > all
> > ints, so all loads and stores can alias.
> >
> >
> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel < [hidden email] >
> > wrote:
> >>
> >> ----- Original Message -----
> >> > From: "Tim Northover" < [hidden email] >
> >> > To: "Jonas Wagner" < [hidden email] >
> >> > Cc: "cfe-dev Developers" < [hidden email] >, "LLVM
> >> > Developers Mailing
> >> > List" < [hidden email] >
> >> > Sent: Friday, August 8, 2014 6:54:50 AM
> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc
> >> > too
> >> > aggressive or LLVM need to improve?
> >> >
> >> > > your C program invokes undefined behavior when it dereferences
> >> > > pointers that
> >> > > have been converted to other types. See for example
> >> > >
> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
> >> >
> >> > I don't think it's quite that simple.The type-based aliasing
> >> > rules
> >> > come from 6.5p7 of C11, I think. That says:
> >> >
> >> > "An object shall have its stored value accessed only by an
> >> > lvalue
> >> > expression that has one of
> >> > the following types:
> >> > + a type compatible with the effective type of the object,
> >> > [...]
> >> > + an aggregate or union type that includes one of the
> >> > aforementioned
> >> > types among its members [...]"
> >> >
> >> > That would seem to allow this usage: aa (effective type "int")
> >> > is
> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
> >> >
> >> > The second point would even seem to allow something like "ptr[i]
> >> > =
> >> > ..." if aa was declared "int aa[2];", though that seems to be
> >> > going
> >> > too far. It also seems to be very difficult to pin down a
> >> > meaning
> >> > (from the standard) for "a->b" if a is not a pointer to an
> >> > object
> >> > with
> >> > the correct effective type. So the entire area is probably one
> >> > that's
> >> > open to interpretation.
> >> >
> >> > I've added cfe-dev to the list; they're the *professional*
> >> > language
> >> > lawyers.
> >>
> >> Coincidentally, this also seems to be PR20585 (adding Jiangning
> >> Liu, the
> >> reporter of that bug, to this thread too).
> >>
> >> -Hal
> >>
> >> >
> >> > Cheers.
> >> >
> >> > Tim.
> >> > _______________________________________________
> >> > cfe-dev mailing list
> >> > [hidden email]
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> [hidden email] http://llvm.cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > [hidden email] http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >
>
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Daniel Berlin-5
The traditional way to communicate this is by filtering some analysis
with other ones (which is different than chaining them and asking them
each the same question)

For example, GCC post-filters points-to sets with TBAA (or you could
do it during analysis prior, but it's significantly more expensive to
do repeated filtering, even though you may gain precision).

Here, points-to would come up and say "array points to aa and element"
post-filtering the set with TBAA (or whatever you like)  and asking
"can array legally point to aa by the TBAA rules" would come up with
"no" (even in LLVM, given the types, you should see that they are in
different TBAA subtrees).  It will then say "array only points to
element". The aliases query then answers it right. The optimizer is
happy.

(It's a bit more complex for C on the analysis side because you can
only do this when the pointer is dereferenced, since otherwise they
can *store* whatever they like in it, if it's never used).

I'm not aware of a more general mechanism for doing this than the
above.  The generalization is usually the "type filtering".




On Tue, Aug 19, 2014 at 4:57 PM, Hal Finkel <[hidden email]> wrote:

> ----- Original Message -----
>> From: "Richard Smith" <[hidden email]>
>> To: "Daniel Berlin" <[hidden email]>
>> Cc: "Reid Kleckner" <[hidden email]>, "Hal Finkel" <[hidden email]>, "Daniel Berlin" <[hidden email]>, "LLVM
>> Developers Mailing List" <[hidden email]>, "cfe-dev Developers" <[hidden email]>
>> Sent: Monday, August 11, 2014 9:31:09 PM
>> Subject: Re: [LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
>>
>>
>>
>>
>> I'll take this from the C++ angle; the C rules are not the same, and
>> I'm not confident they give the same answer.
>>
>>
>> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin < [hidden email]
>> > wrote:
>>
>>
>> The access path matters (in some sense), but this is, AFIAK, valid no
>> matter how you look at it.
>>
>> Let's take a look line by line
>>
>>
>> #include <stdio.h>
>> struct heap {int index; int b;};
>> struct heap **ptr;
>> int aa;
>>
>> int main() {
>> struct heap element;
>> struct heap *array[2];
>> array[0] = (struct heap *)&aa; <- Okay so far.
>> array[1] = &element; <- Clearly okay
>> ptr = array; <- still okay so far
>> aa = 1; <- not pointer related.
>> int i; <- not pointer related
>> for (i =0; i< 2; i++) { <- not pointer related
>> printf("i is %d, aa is %d\n", i, aa); <- not pointer related
>> ptr[i]->index = 0; <- Here is where it gets wonky.
>>
>> <rest of codeis irrelevan>
>>
>> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
>> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3,
>> check
>> footnote 59).
>>
>>
>>
>> This is where we get the undefined behavior.
>>
>>
>>
>> 3.8/6: "[If you have a glvalue referring to storage but where there
>> is no corresponding object, the] program has undefined behavior if:
>>
>> [...] the glvalue is used to access a non-static data member".
>>
>>
>> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we
>> can only create objects through definitions, new-expressions, and by
>> creating temporary objects). So the behavior is undefined when we
>> evaluate ptr[0]->index.
>
> Any thoughts on how we might communicate this information to the optimizer? In this case we might be able to infer something from the struct-path aware TBAA (because reaching the int though the structure has a different path length than reaching the top-level int), but I'm not sure that addresses the general case.
>
> Thanks again,
> Hal
>
>>
>> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
>> because it doesn't end up making a difference)
>>
>> Let's assume, for the sake of argument, the actual access to aa
>> occurs through an lvalue of type "struct heap" rather than "int"
>>
>> In C++03 and C++11, it says:
>>
>> An object shall have its stored value accessed only by an lvalue
>> expression that has one of the following types:
>>
>>
>> a type compatible with the effective type of the object,
>> a qualified version of a type compatible with the effective type of
>> the object,
>> a type that is the signed or unsigned type corresponding to the
>>
>> effective type of the object,
>> a type that is the signed or unsigned type corresponding to a
>> qualified version of the effective type of the object,
>>
>> an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union), or
>> a character type.
>> (C++11 adds something about dynamic type here)
>>
>> struct heap is "an aggregate or union type that includes one of the
>> aforementioned types among it's members".
>>
>> Thus, this is legal to access this int through an lvalue expression
>> that has a type of struct heap.
>> Whether the actual store is legal for other reasons, i don't know.
>> There are all kinds of rules about object alignment and value
>> representation that aren't my baliwick. I leave it to another
>> language lawyer to say whether it's okay to do a store to something
>> that is essentially, a partial object.
>>
>> Note that GCC actually knows this is legal to alias, at least at the
>> tree level. I debugged it there, and it definitely isn't eliminating
>> it at a high level. It also completely understands the call to
>> ->index
>> = 0 affects "aa", and has a reload for aa before the printf call.
>>
>> I don't know what is eliminating this at the RTL level, but i can't
>> see why it's illegal from *aliasing rules*. Maybe this is invalid for
>> some other reason.
>>
>>
>>
>>
>>
>>
>>
>> }
>> return 0;
>>
>> }
>>
>> ptr[i]->index = 0;
>>
>>
>>
>> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner < [hidden email] >
>> wrote:
>> > +aliasing people
>> >
>> > I *think* this is valid, because the rules have always been
>> > described to me
>> > in terms of underlying storage type, and not access path. These are
>> > all
>> > ints, so all loads and stores can alias.
>> >
>> >
>> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel < [hidden email] >
>> > wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Tim Northover" < [hidden email] >
>> >> > To: "Jonas Wagner" < [hidden email] >
>> >> > Cc: "cfe-dev Developers" < [hidden email] >, "LLVM
>> >> > Developers Mailing
>> >> > List" < [hidden email] >
>> >> > Sent: Friday, August 8, 2014 6:54:50 AM
>> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc
>> >> > too
>> >> > aggressive or LLVM need to improve?
>> >> >
>> >> > > your C program invokes undefined behavior when it dereferences
>> >> > > pointers that
>> >> > > have been converted to other types. See for example
>> >> > >
>> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >> >
>> >> > I don't think it's quite that simple.The type-based aliasing
>> >> > rules
>> >> > come from 6.5p7 of C11, I think. That says:
>> >> >
>> >> > "An object shall have its stored value accessed only by an
>> >> > lvalue
>> >> > expression that has one of
>> >> > the following types:
>> >> > + a type compatible with the effective type of the object,
>> >> > [...]
>> >> > + an aggregate or union type that includes one of the
>> >> > aforementioned
>> >> > types among its members [...]"
>> >> >
>> >> > That would seem to allow this usage: aa (effective type "int")
>> >> > is
>> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >> >
>> >> > The second point would even seem to allow something like "ptr[i]
>> >> > =
>> >> > ..." if aa was declared "int aa[2];", though that seems to be
>> >> > going
>> >> > too far. It also seems to be very difficult to pin down a
>> >> > meaning
>> >> > (from the standard) for "a->b" if a is not a pointer to an
>> >> > object
>> >> > with
>> >> > the correct effective type. So the entire area is probably one
>> >> > that's
>> >> > open to interpretation.
>> >> >
>> >> > I've added cfe-dev to the list; they're the *professional*
>> >> > language
>> >> > lawyers.
>> >>
>> >> Coincidentally, this also seems to be PR20585 (adding Jiangning
>> >> Liu, the
>> >> reporter of that bug, to this thread too).
>> >>
>> >> -Hal
>> >>
>> >> >
>> >> > Cheers.
>> >> >
>> >> > Tim.
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > [hidden email]
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> [hidden email] http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > [hidden email] http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>>
>>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Xinliang David Li-2
In reply to this post by Kevin Qin
Danny explains well on address escape and function side effect etc.  In particular, without the address escape, GCC fully unrolls the loop, and array[0]->index gets converted into direct access to 'aa' after fre pass.

David


On Mon, Aug 11, 2014 at 11:44 PM, Kevin Qin <[hidden email]> wrote:
Hi all,

Thanks for you paying time to look at this issue. I'm not an expert for C/C++ language, so I can just post some experiment results from LLVM and GCC. 

If we make minor changes to the test, gcc may give different results.

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  ptr = array;
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result didn't get changed,

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 1

But if we change a bit more, like

#include <stdio.h>
struct heap {int index; int b;};
struct heap **ptr;
int aa;

int main() {
  struct heap element;
  struct heap *array[2];
  array[0] = (struct heap *)&aa;
  array[1] = &element;
  //ptr = array; // remove this assignment as well.
  aa = 1;
  int i;
  for (i =0; i< 2; i++) {
    printf("i is %d, aa is %d\n", i, aa);
    array[i]->index = 0;  // we replace ptr to array here. so no global lvalue is used.
  }
  return 0;
}

Result changed to be the same as LLVM.

$gcc test.c -O0
$./a.out
i is 0, aa is 1
i is 1, aa is 0

$gcc test.c -O2
$./a.out
i is 0, aa is 1
i is 1, aa is 0

I don't know why a assignement statment to a unrelated global pointer will affect gcc's aliasing work, and I don't know from the language point of view, if we use a local pointer to replace the global pointer, then the result would be still undefined or not.

Regards,
Kevin


2014-08-12 12:02 GMT+08:00 Daniel Berlin <[hidden email]>:

So then there you go, a real language lawyer says it's invalid for
other reasons :)


On Mon, Aug 11, 2014 at 7:31 PM, Richard Smith <[hidden email]> wrote:
> I'll take this from the C++ angle; the C rules are not the same, and I'm not
> confident they give the same answer.
>
> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <[hidden email]> wrote:
>>
>> The access path matters (in some sense), but this is, AFIAK, valid no
>> matter how you look at it.
>>
>> Let's take a look line by line
>>
>> #include <stdio.h>
>> struct heap {int index; int b;};
>> struct heap **ptr;
>> int aa;
>>
>> int main() {
>>   struct heap element;
>>   struct heap *array[2];
>>   array[0] = (struct heap *)&aa; <- Okay so far.
>>   array[1] = &element; <- Clearly okay
>>   ptr = array; <- still okay so far
>>   aa = 1; <- not pointer related.
>>   int i; <- not pointer related
>>   for (i =0; i< 2; i++) { <- not pointer related
>>     printf("i is %d, aa is %d\n", i, aa); <- not pointer related
>>     ptr[i]->index = 0; <- Here is where it gets wonky.
>>
>> <rest of codeis irrelevan>
>>
>> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is an
>> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3, check
>> footnote 59).
>
>
> This is where we get the undefined behavior.
>
> 3.8/6: "[If you have a glvalue referring to storage but where there is no
> corresponding object, the] program has undefined behavior if:
> [...] the glvalue is used to access a non-static data member".
>
> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we can only
> create objects through definitions, new-expressions, and by creating
> temporary objects). So the behavior is undefined when we evaluate
> ptr[0]->index.
>
>> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
>> because it doesn't end up making a difference)
>>
>> Let's assume, for the sake of argument,  the actual access to aa
>> occurs through an lvalue of type "struct heap" rather than "int"
>> In C++03 and C++11, it says:
>>
>> An object shall have its stored value accessed only by an lvalue
>> expression that has one of the following types:
>>
>> a type compatible with the effective type of the object,
>> a qualified version of a type compatible with the effective type of the
>> object,
>> a type that is the signed or unsigned type corresponding to the
>> effective type of the object,
>> a type that is the signed or unsigned type corresponding to a
>> qualified version of the effective type of the object,
>> an aggregate or union type that includes one of the aforementioned
>> types among its members (including, recursively, a member of a
>> subaggregate or contained union), or
>> a character type.
>> (C++11 adds something about dynamic type here)
>>
>> struct heap is "an aggregate or union type that includes one of the
>> aforementioned types among it's members".
>>
>> Thus, this is legal to access this int through an lvalue expression
>> that has a type of struct heap.
>> Whether the actual store is legal for other reasons, i don't know.
>> There are all kinds of rules about object alignment and value
>> representation that aren't my baliwick.  I leave it to another
>> language lawyer to say whether it's okay to do a store to something
>> that is essentially, a partial object.
>>
>> Note that GCC actually knows this is legal to alias, at least at the
>> tree level. I debugged it there, and it definitely isn't eliminating
>> it at a high level. It also completely understands the call to ->index
>> = 0 affects "aa", and has a reload for aa before the printf call.
>>
>> I don't know what is eliminating this at the RTL level, but i can't
>> see why it's illegal from *aliasing rules*. Maybe this is invalid for
>> some other reason.
>>
>>
>>
>>
>>
>>
>>
>>   }
>>   return 0;
>> }
>>
>> ptr[i]->index = 0;
>>
>> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner <[hidden email]> wrote:
>> > +aliasing people
>> >
>> > I *think* this is valid, because the rules have always been described to
>> > me
>> > in terms of underlying storage type, and not access path. These are all
>> > ints, so all loads and stores can alias.
>> >
>> >
>> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel <[hidden email]> wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Tim Northover" <[hidden email]>
>> >> > To: "Jonas Wagner" <[hidden email]>
>> >> > Cc: "cfe-dev Developers" <[hidden email]>, "LLVM Developers
>> >> > Mailing
>> >> > List" <[hidden email]>
>> >> > Sent: Friday, August 8, 2014 6:54:50 AM
>> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc too
>> >> > aggressive or LLVM need to improve?
>> >> >
>> >> > > your C program invokes undefined behavior when it dereferences
>> >> > > pointers that
>> >> > > have been converted to other types. See for example
>> >> > >
>> >> > >
>> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
>> >> >
>> >> > I don't think it's quite that simple.The type-based aliasing rules
>> >> > come from 6.5p7 of C11, I think. That says:
>> >> >
>> >> > "An object shall have its stored value accessed only by an lvalue
>> >> > expression that has one of
>> >> > the following types:
>> >> >   + a type compatible with the effective type of the object,
>> >> >   [...]
>> >> >   + an aggregate or union type that includes one of the
>> >> >   aforementioned
>> >> > types among its members [...]"
>> >> >
>> >> > That would seem to allow this usage: aa (effective type "int") is
>> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
>> >> >
>> >> > The second point would even seem to allow something like "ptr[i] =
>> >> > ..." if aa was declared "int aa[2];", though that seems to be going
>> >> > too far. It also seems to be very difficult to pin down a meaning
>> >> > (from the standard) for "a->b" if a is not a pointer to an object
>> >> > with
>> >> > the correct effective type. So the entire area is probably one that's
>> >> > open to interpretation.
>> >> >
>> >> > I've added cfe-dev to the list; they're the *professional* language
>> >> > lawyers.
>> >>
>> >> Coincidentally, this also seems to be PR20585 (adding Jiangning Liu,
>> >> the
>> >> reporter of that bug, to this thread too).
>> >>
>> >>  -Hal
>> >>
>> >> >
>> >> > Cheers.
>> >> >
>> >> > Tim.
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > [hidden email]
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >> _______________________________________________
>> >> LLVM Developers mailing list
>> >> [hidden email]         http://llvm.cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>> >
>> >
>> > _______________________________________________
>> > LLVM Developers mailing list
>> > [hidden email]         http://llvm.cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >
>
>
_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



--
Best Regards,

Kevin Qin

_______________________________________________
cfe-dev mailing list
[hidden email]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?

Hal Finkel
In reply to this post by Daniel Berlin-5
----- Original Message -----

> From: "Daniel Berlin" <[hidden email]>
> To: "Hal Finkel" <[hidden email]>
> Cc: "Richard Smith" <[hidden email]>, "Reid Kleckner" <[hidden email]>, "LLVM Developers Mailing List"
> <[hidden email]>, "cfe-dev Developers" <[hidden email]>, "Daniel Berlin" <[hidden email]>
> Sent: Tuesday, August 19, 2014 9:25:11 PM
> Subject: Re: [LLVMdev] [cfe-dev] For alias analysis, It's gcc too aggressive or LLVM need to improve?
>
> The traditional way to communicate this is by filtering some analysis
> with other ones (which is different than chaining them and asking
> them
> each the same question)

For better or worse, for types within LLVM's current infrastructure, I'm not sure there is a distinction: we only have type information at the accesses themselves. But for accesses within structs, we have full "path" information from the base access type, through the parents with offsets included. So we might be able to recover some of the 'recursive filtering' semantics that you describe.

For this case, we have an access to ptr[i]->index and an access to some global int aa. When looking at the access to ptr[i]->index we should have path information that tells us that ptr[i]->index is an access to an int at offset zero from 'struct heap'. We also know from the metadata node describing 'struct heap' that it also has an int at offset 4, and so must be at least 8 bytes in total. We can also look at the address of aa and, because it is a global, easily determine that any object created there must be no larger than 4 bytes (and, thus, no object of type 'struct heap' could ever have existed there). From this, we could conclude NoAlias.

We could conceivably enhance this further by also tagging globals, calls to operator new, etc. with types to provide explicit object types where known. Then we could use these types, instead of just the number of known dereferenceable bytes, to determine what kinds of objects can exist at some pointer address.

Does this make sense?

 -Hal

>
> For example, GCC post-filters points-to sets with TBAA (or you could
> do it during analysis prior, but it's significantly more expensive to
> do repeated filtering, even though you may gain precision).
>
> Here, points-to would come up and say "array points to aa and
> element"
> post-filtering the set with TBAA (or whatever you like)  and asking
> "can array legally point to aa by the TBAA rules" would come up with
> "no" (even in LLVM, given the types, you should see that they are in
> different TBAA subtrees).  It will then say "array only points to
> element". The aliases query then answers it right. The optimizer is
> happy.
>
> (It's a bit more complex for C on the analysis side because you can
> only do this when the pointer is dereferenced, since otherwise they
> can *store* whatever they like in it, if it's never used).
>
> I'm not aware of a more general mechanism for doing this than the
> above.  The generalization is usually the "type filtering".
>
>
>
>
> On Tue, Aug 19, 2014 at 4:57 PM, Hal Finkel <[hidden email]> wrote:
> > ----- Original Message -----
> >> From: "Richard Smith" <[hidden email]>
> >> To: "Daniel Berlin" <[hidden email]>
> >> Cc: "Reid Kleckner" <[hidden email]>, "Hal Finkel"
> >> <[hidden email]>, "Daniel Berlin" <[hidden email]>, "LLVM
> >> Developers Mailing List" <[hidden email]>, "cfe-dev
> >> Developers" <[hidden email]>
> >> Sent: Monday, August 11, 2014 9:31:09 PM
> >> Subject: Re: [LLVMdev] [cfe-dev] For alias analysis, It's gcc too
> >> aggressive or LLVM need to improve?
> >>
> >>
> >>
> >>
> >> I'll take this from the C++ angle; the C rules are not the same,
> >> and
> >> I'm not confident they give the same answer.
> >>
> >>
> >> On Mon, Aug 11, 2014 at 2:09 PM, Daniel Berlin <
> >> [hidden email]
> >> > wrote:
> >>
> >>
> >> The access path matters (in some sense), but this is, AFIAK, valid
> >> no
> >> matter how you look at it.
> >>
> >> Let's take a look line by line
> >>
> >>
> >> #include <stdio.h>
> >> struct heap {int index; int b;};
> >> struct heap **ptr;
> >> int aa;
> >>
> >> int main() {
> >> struct heap element;
> >> struct heap *array[2];
> >> array[0] = (struct heap *)&aa; <- Okay so far.
> >> array[1] = &element; <- Clearly okay
> >> ptr = array; <- still okay so far
> >> aa = 1; <- not pointer related.
> >> int i; <- not pointer related
> >> for (i =0; i< 2; i++) { <- not pointer related
> >> printf("i is %d, aa is %d\n", i, aa); <- not pointer related
> >> ptr[i]->index = 0; <- Here is where it gets wonky.
> >>
> >> <rest of codeis irrelevan>
> >>
> >> First, ptr[i] is an lvalue, of type struct heap *, and ptr[i]-> is
> >> an
> >> lvalue of type struct heap (in C++03, this is 5.2.5 paragraph 3,
> >> check
> >> footnote 59).
> >>
> >>
> >>
> >> This is where we get the undefined behavior.
> >>
> >>
> >>
> >> 3.8/6: "[If you have a glvalue referring to storage but where
> >> there
> >> is no corresponding object, the] program has undefined behavior
> >> if:
> >>
> >> [...] the glvalue is used to access a non-static data member".
> >>
> >>
> >> There is no object of type 'heap' denoted by *ptr[0] (by 1.8/1, we
> >> can only create objects through definitions, new-expressions, and
> >> by
> >> creating temporary objects). So the behavior is undefined when we
> >> evaluate ptr[0]->index.
> >
> > Any thoughts on how we might communicate this information to the
> > optimizer? In this case we might be able to infer something from
> > the struct-path aware TBAA (because reaching the int though the
> > structure has a different path length than reaching the top-level
> > int), but I'm not sure that addresses the general case.
> >
> > Thanks again,
> > Hal
> >
> >>
> >> (I'm too lazy to parse the rules for whether E1.E2 is an lvalue,
> >> because it doesn't end up making a difference)
> >>
> >> Let's assume, for the sake of argument, the actual access to aa
> >> occurs through an lvalue of type "struct heap" rather than "int"
> >>
> >> In C++03 and C++11, it says:
> >>
> >> An object shall have its stored value accessed only by an lvalue
> >> expression that has one of the following types:
> >>
> >>
> >> a type compatible with the effective type of the object,
> >> a qualified version of a type compatible with the effective type
> >> of
> >> the object,
> >> a type that is the signed or unsigned type corresponding to the
> >>
> >> effective type of the object,
> >> a type that is the signed or unsigned type corresponding to a
> >> qualified version of the effective type of the object,
> >>
> >> an aggregate or union type that includes one of the aforementioned
> >> types among its members (including, recursively, a member of a
> >> subaggregate or contained union), or
> >> a character type.
> >> (C++11 adds something about dynamic type here)
> >>
> >> struct heap is "an aggregate or union type that includes one of
> >> the
> >> aforementioned types among it's members".
> >>
> >> Thus, this is legal to access this int through an lvalue
> >> expression
> >> that has a type of struct heap.
> >> Whether the actual store is legal for other reasons, i don't know.
> >> There are all kinds of rules about object alignment and value
> >> representation that aren't my baliwick. I leave it to another
> >> language lawyer to say whether it's okay to do a store to
> >> something
> >> that is essentially, a partial object.
> >>
> >> Note that GCC actually knows this is legal to alias, at least at
> >> the
> >> tree level. I debugged it there, and it definitely isn't
> >> eliminating
> >> it at a high level. It also completely understands the call to
> >> ->index
> >> = 0 affects "aa", and has a reload for aa before the printf call.
> >>
> >> I don't know what is eliminating this at the RTL level, but i
> >> can't
> >> see why it's illegal from *aliasing rules*. Maybe this is invalid
> >> for
> >> some other reason.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> }
> >> return 0;
> >>
> >> }
> >>
> >> ptr[i]->index = 0;
> >>
> >>
> >>
> >> On Mon, Aug 11, 2014 at 1:13 PM, Reid Kleckner < [hidden email] >
> >> wrote:
> >> > +aliasing people
> >> >
> >> > I *think* this is valid, because the rules have always been
> >> > described to me
> >> > in terms of underlying storage type, and not access path. These
> >> > are
> >> > all
> >> > ints, so all loads and stores can alias.
> >> >
> >> >
> >> > On Sat, Aug 9, 2014 at 3:07 PM, Hal Finkel < [hidden email] >
> >> > wrote:
> >> >>
> >> >> ----- Original Message -----
> >> >> > From: "Tim Northover" < [hidden email] >
> >> >> > To: "Jonas Wagner" < [hidden email] >
> >> >> > Cc: "cfe-dev Developers" < [hidden email] >, "LLVM
> >> >> > Developers Mailing
> >> >> > List" < [hidden email] >
> >> >> > Sent: Friday, August 8, 2014 6:54:50 AM
> >> >> > Subject: Re: [cfe-dev] [LLVMdev] For alias analysis, It's gcc
> >> >> > too
> >> >> > aggressive or LLVM need to improve?
> >> >> >
> >> >> > > your C program invokes undefined behavior when it
> >> >> > > dereferences
> >> >> > > pointers that
> >> >> > > have been converted to other types. See for example
> >> >> > >
> >> >> > > http://stackoverflow.com/questions/4810417/c-when-is-casting-between-pointer-types-not-undefined-behavior
> >> >> >
> >> >> > I don't think it's quite that simple.The type-based aliasing
> >> >> > rules
> >> >> > come from 6.5p7 of C11, I think. That says:
> >> >> >
> >> >> > "An object shall have its stored value accessed only by an
> >> >> > lvalue
> >> >> > expression that has one of
> >> >> > the following types:
> >> >> > + a type compatible with the effective type of the object,
> >> >> > [...]
> >> >> > + an aggregate or union type that includes one of the
> >> >> > aforementioned
> >> >> > types among its members [...]"
> >> >> >
> >> >> > That would seem to allow this usage: aa (effective type
> >> >> > "int")
> >> >> > is
> >> >> > being accessed via an lvalue "ptr[i]->index" of type "int".
> >> >> >
> >> >> > The second point would even seem to allow something like
> >> >> > "ptr[i]
> >> >> > =
> >> >> > ..." if aa was declared "int aa[2];", though that seems to be
> >> >> > going
> >> >> > too far. It also seems to be very difficult to pin down a
> >> >> > meaning
> >> >> > (from the standard) for "a->b" if a is not a pointer to an
> >> >> > object
> >> >> > with
> >> >> > the correct effective type. So the entire area is probably
> >> >> > one
> >> >> > that's
> >> >> > open to interpretation.
> >> >> >
> >> >> > I've added cfe-dev to the list; they're the *professional*
> >> >> > language
> >> >> > lawyers.
> >> >>
> >> >> Coincidentally, this also seems to be PR20585 (adding Jiangning
> >> >> Liu, the
> >> >> reporter of that bug, to this thread too).
> >> >>
> >> >> -Hal
> >> >>
> >> >> >
> >> >> > Cheers.
> >> >> >
> >> >> > Tim.
> >> >> > _______________________________________________
> >> >> > cfe-dev mailing list
> >> >> > [hidden email]
> >> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >> >> >
> >> >>
> >> >> --
> >> >> Hal Finkel
> >> >> Assistant Computational Scientist
> >> >> Leadership Computing Facility
> >> >> Argonne National Laboratory
> >> >> _______________________________________________
> >> >> LLVM Developers mailing list
> >> >> [hidden email] http://llvm.cs.uiuc.edu
> >> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > [hidden email] http://llvm.cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >>
> >>
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev