GEP vs IntToPtr/PtrToInt

GEP vs IntToPtr/PtrToInt

Arushi Aggarwal
Hi,

Is it correct to convert, 

  %196 = load i32* %195, align 8                  ; <i32> [#uses=1]
  %197 = zext i32 %196 to i64                     ; <i64> [#uses=1]
  %198 = ptrtoint i8* %193 to i64                 ; <i64> [#uses=1]
  %199 = add i64 %198, %197                       ; <i64> [#uses=1]
  %200 = inttoptr i64 %199 to i8*                 ; <i8*> [#uses=1]

into 

%200 = getelementptr %193, %196

thereby eliminating the unnecessary casts to integer and back?

Thanks,
Arushi



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal <[hidden email]> wrote:

>
>
>> Hi,
>> Is it correct to convert,
>>   %196 = load i32* %195, align 8                  ; <i32> [#uses=1]
>>   %197 = zext i32 %196 to i64                     ; <i64> [#uses=1]
>>   %198 = ptrtoint i8* %193 to i64                 ; <i64> [#uses=1]
>>   %199 = add i64 %198, %197                       ; <i64> [#uses=1]
>>   %200 = inttoptr i64 %199 to i8*                 ; <i8*> [#uses=1]
>> into
>> %200 = getelementptr %193, %196
>> Reducing the unnecessary casts of converting to integers and then back?
>> Thanks,
>> Arushi
>>

See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
correct in general.  It is correct if %196 isn't dependent on the
address of any memory object, though.

-Eli

Re: GEP vs IntToPtr/PtrToInt

John Criswell-4
On 4/4/2011 6:45 PM, Eli Friedman wrote:

> On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal<[hidden email]>  wrote:
>>
>>> Hi,
>>> Is it correct to convert,
>>>    %196 = load i32* %195, align 8                  ;<i32>  [#uses=1]
>>>    %197 = zext i32 %196 to i64                     ;<i64>  [#uses=1]
>>>    %198 = ptrtoint i8* %193 to i64                 ;<i64>  [#uses=1]
>>>    %199 = add i64 %198, %197                       ;<i64>  [#uses=1]
>>>    %200 = inttoptr i64 %199 to i8*                 ;<i8*>  [#uses=1]
>>> into
>>> %200 = getelementptr %193, %196
>>> Reducing the unnecessary casts of converting to integers and then back?
>>> Thanks,
>>> Arushi
>>>
> See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
> correct in general.  It is correct if %196 isn't dependent on the
> address of any memory object, though.

Can you clarify why the transform isn't correct?  Is it because in the
original code, %200 is based on both the originally cast pointer (%193)
and the indexed offset from it (%197) while the transformed code is only
based on %193?

Arushi, is some transform converting a GEP into this ptrtoint/inttoptr
sequence and thereby adding "based on" relationships that didn't exist
previously in the code?

-- John T.

Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
On Mon, Apr 4, 2011 at 7:10 AM, John Criswell <[hidden email]> wrote:

> On 4/4/2011 6:45 PM, Eli Friedman wrote:
>>
>> On Mon, Apr 4, 2011 at 5:02 PM, Arushi Aggarwal<[hidden email]>
>>  wrote:
>>>
>>>> Hi,
>>>> Is it correct to convert,
>>>>   %196 = load i32* %195, align 8                  ;<i32>  [#uses=1]
>>>>   %197 = zext i32 %196 to i64                     ;<i64>  [#uses=1]
>>>>   %198 = ptrtoint i8* %193 to i64                 ;<i64>  [#uses=1]
>>>>   %199 = add i64 %198, %197                       ;<i64>  [#uses=1]
>>>>   %200 = inttoptr i64 %199 to i8*                 ;<i8*>  [#uses=1]
>>>> into
>>>> %200 = getelementptr %193, %196
>>>> Reducing the unnecessary casts of converting to integers and then back?
>>>> Thanks,
>>>> Arushi
>>>>
>> See http://llvm.org/docs/LangRef.html#pointeraliasing ; it's not
>> correct in general.  It is correct if %196 isn't dependent on the
>> address of any memory object, though.
>
> Can you clarify why the transform isn't correct?  Is it because in the
> original code, %200 is based on both the originally cast pointer (%193) and
> the indexed offset from it (%197) while the transformed code is only based
> on %193?

Yes, exactly.

-Eli

Re: GEP vs IntToPtr/PtrToInt

Arushi Aggarwal
In reply to this post by John Criswell-4
This code is generated for va_arg:

  %6 = getelementptr inbounds %struct.__va_list_tag* %5, i32 0, i32 3 ; <i8**> [#uses=1]
  %7 = load i8** %6, align 8                      ; <i8*> [#uses=1]
  %8 = getelementptr inbounds [1 x %struct.__va_list_tag]* %ap, i64 0, i64 0 ; <%struct.__va_list_tag*> [#uses=1]
  %9 = getelementptr inbounds %struct.__va_list_tag* %8, i32 0, i32 0 ; <i32*> [#uses=1]
  %10 = load i32* %9, align 8                     ; <i32> [#uses=1]
  %11 = inttoptr i32 %10 to i8*                   ; <i8*> [#uses=1]
  %12 = ptrtoint i8* %7 to i64                    ; <i64> [#uses=1]
  %13 = ptrtoint i8* %11 to i64                   ; <i64> [#uses=1]
  %14 = add i64 %12, %13                          ; <i64> [#uses=1]
  %15 = inttoptr i64 %14 to i8*                   ; <i8*> [#uses=1]
  store i8* %15, i8** %addr.0, align 8

I have optimized one inttoptr into a zext.

I guess it is safe in this case?


Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
On Mon, Apr 4, 2011 at 6:38 PM, Arushi Aggarwal <[hidden email]> wrote:

> This code is generated for va_arg.
> %6 = getelementptr inbounds %struct.__va_list_tag* %5, i32 0, i32 3 ; <i8**>
> [#uses=1]
>   %7 = load i8** %6, align 8                      ; <i8*> [#uses=1]
>   %8 = getelementptr inbounds [1 x %struct.__va_list_tag]* %ap, i64 0, i64 0
> ; <%struct.__va_list_tag*> [#uses=1]
>  %9 = getelementptr inbounds %struct.__va_list_tag* %8, i32 0, i32 0 ;
> <i32*> [#uses=1]
>   %10 = load i32* %9, align 8                     ; <i32> [#uses=1]
>   %11 = inttoptr i32 %10 to i8*                   ; <i8*> [#uses=1]
>   %12 = ptrtoint i8* %7 to i64                    ; <i64> [#uses=1]
>   %13 = ptrtoint i8* %11 to i64                   ; <i64> [#uses=1]
>   %14 = add i64 %12, %13                          ; <i64> [#uses=1]
>   %15 = inttoptr i64 %14 to i8*                   ; <i8*> [#uses=1]
>   store i8* %15, i8** %addr.0, align 8
> and I have optimized one inttoptr to a zext.
> I guess it is safe in this case?

I haven't read your example carefully, but conceptually, the code for
va_arg should be able to use GEP instead of ptrtoint/inttoptr.

-Eli

Re: GEP vs IntToPtr/PtrToInt

Jianzhou Zhao
In reply to this post by Eli Friedman-2
I have a question about when we should apply these pointer aliasing
rules. Do the rules tell us when a load/store is safe?
"Any memory access must be done through a pointer value associated
with an address range of the memory access, otherwise the behavior is
undefined."

So this means the conversion discussed here is still safe in terms of
memory safety, but its meaning after conversion could be weird. Am I
correct?

This brings me to another question. The based-on relation has this rule:
"A pointer value formed by an inttoptr is based on all pointer values
that contribute (directly or indirectly) to the computation of the
pointer's value."

Suppose an int value 'i' is computed from many int variables that
were converted from pointers (p_0, p_1, ..., p_n) by ptrtoint. If we
then inttoptr i to a pointer p, how should I decide which pointer
values p is based on?

If each p_j is converted by ptrtoint to i_j, and i is computed as
i = i_0 + i_1 + ... + i_n, does it mean we can take any p_j as the
base pointer and the other int variables as its offset? Say we take
p_2 as the base pointer; then the p formed from i points to
       p_2 + (i_0 + i_1 + i_3 + ... + i_n)
  ?

So in the transformation example, the result is different depending on
whether we take %196 or %193 as the base pointer.

For alias analysis, we may say that p can point to any memory that one
of the p_j points to. But if we consider memory safety, should we say
p is safe to access if p is not out of bounds no matter which p_j is
taken as the base pointer? Could anyone explain this rule more
precisely? For example, how can we find "all pointer values that
contribute (directly or indirectly)"?

This would help in understanding
  http://llvm.org/docs/GetElementPtr.html#ptrdiff
  http://llvm.org/docs/GetElementPtr.html#null
which suggest that we can do some 'wild' pointer arithmetic with
inttoptr and ptrtoint.

For example, given a pointer p, can we safely do the following?
   i = ptrtoint p;
   j = i + null;
   q = inttoptr j;
   v = load q;

Thanks a lot.

--
Jianzhou

Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <[hidden email]> wrote:
> I have a question about when we should apply these pointer aliasing
> rules. Do the rules tell us when a load/store is safe?
> "Any memory access must be done through a pointer value associated
> with an address range of the memory access, otherwise the behavior is
> undefined."
>
> So this means the conversion discussed here is still safe in terms of
> memory safety, but its meaning after conversion could be weird. Am I
> correct?

Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
behavior, so it isn't safe in any sense.  In practice, I can't think
of a common transformation that would cause a crash, but it's best not
to depend on that.

> Then it comes to my another question. The base-on relation has this rule:
> "A pointer value formed by an inttoptr is based on all pointer values
> that contribute (directly or indirectly) to the computation of the
> pointer's value."
>
> Suppose an int value 'i'  is computed by a lot of int variables that
> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
> to a point p, how should I decide which pointer value the 'p' forms?
>
> If those p_j are ptrtoint to a i_j, and the computation for i is i =
> i_0 + i_1 + ... i_n, does it mean
>  we can take either p_j as a base pointer, and other int variables
> its offset, say we take p_2 as the base pointer, and the p from i
> points to
>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>  ?
>
> So in the transformation example, the result is different when we take
> %196 or %193 as a base pointer.
>
> For alias-analysis, we may say the p can point to a memory any of the
> p_j points to. But if we consider memory safety, should we say p is
> safe to access if p is not out-of-bound no matter which p_j is taken
> as a base pointer?

See above.

> Could anyone explain this rule more precisely? For
> example, how can we find "
> all pointer values that contribute (directly or indirectly)" ?

There isn't any straightforward way to calculate that set.  Another
way of stating the rule is that if changing the numerical value of the
address of some object might change the calculated value of the
operand of an inttoptr, that object's address is one of the "pointer
values that contribute".  It's intentionally defined a bit loosely
because there are a lot of different ways for that to be the case.
You can extract information about a pointer with a ptrtoint, a load of
part or all of the address from memory, pointer comparisons, and
possibly some other ways I'm not thinking of.

> This would be helpful to understand
>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
> http://llvm.org/docs/GetElementPtr.html#null
> which suggest that we can do some 'wild' pointer arithmetic by
> inttoptr and ptrtoint.
>
> For example, given a pointer p, can we safely do?
>   i = ptrtoint p;
>   j = i + null;
>   q = inttoptr j;
>   v = load q;
>
> Thanks a lot.

inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x) + 10) can be
safely translated to gep i8* x, 10.  Translating
inttoptr(ptrtoint(x) + y) to gep i8* x, y is not safe in general.
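
The cases above can be sketched as a small legality check. This is only an illustrative model in Python, not an LLVM API: the expression classes, `contributing_pointers`, and `can_rewrite_as_gep` are hypothetical names, and `Opaque` stands for an integer that is assumed to have been proven independent of every object address (a value loaded from memory would need that proof before it could be treated this way).

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical mini expression language for integer values (not LLVM's IR).
@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class PtrToInt:
    pointer: str  # name of the pointer being cast to an integer


@dataclass(frozen=True)
class Opaque:
    name: str  # assumed: an integer known not to derive from any address


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, PtrToInt, Opaque, Add]


def contributing_pointers(e: Expr) -> FrozenSet[str]:
    """Pointers whose address may influence the integer value of e."""
    if isinstance(e, PtrToInt):
        return frozenset({e.pointer})
    if isinstance(e, Add):
        return contributing_pointers(e.lhs) | contributing_pointers(e.rhs)
    return frozenset()  # Const and Opaque carry no address information


def can_rewrite_as_gep(base: str, offset: Expr) -> bool:
    """In this model, inttoptr(ptrtoint(base) + offset) may become
    'gep base, offset' only when offset depends on no pointer at all."""
    return not contributing_pointers(offset)
```

With this model, `can_rewrite_as_gep("x", Const(10))` is accepted, while an offset that contains `PtrToInt("y")` anywhere inside it is rejected, because the original inttoptr result would be based on both x and y.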

-Eli

Re: GEP vs IntToPtr/PtrToInt

John Criswell-4
In reply to this post by Jianzhou Zhao
On 4/20/11 10:08 AM, Jianzhou Zhao wrote:
> I have a question about when we should apply these pointer aliasing
> rules. Do the rules tell us when a load/store is safe?
> "Any memory access must be done through a pointer value associated
> with an address range of the memory access, otherwise the behavior is
> undefined."

I don't think the pointer aliasing rules indicate when a memory access
is safe.  Rather, they set down rules for what the compiler can consider
to be defined and undefined behavior.  They lay down the law for which
optimizations are considered correct and which are not.

> So this means the conversion discussed here is still safe in terms of
> memory safety, but its meaning after conversion could be weird. Am I
> correct?

I am not sure what you mean.  However, if you're asking whether casting
a pointer to an integer and then casting the integer back to a pointer
is correct, I believe the answer is yes.  We certainly treat it that way
in SAFECode although in the current implementation, it can weaken the
safety guarantees.  Our points-to analysis, DSA, doesn't track pointers
through integers, and so SAFECode uses more lenient checks on pointer
values coming from inttoptr casts; DSA can't always guarantee that it
knows everything about the memory objects feeding into it.

That is, incidentally, one of the reasons why we'd like to do Arushi's
transformation.  It will make DSA less conservative and SAFECode more
stringent.

> Then it comes to my another question. The base-on relation has this rule:
> "A pointer value formed by an inttoptr is based on all pointer values
> that contribute (directly or indirectly) to the computation of the
> pointer's value."
>
> Suppose an int value 'i'  is computed by a lot of int variables that
> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
> to a point p, how should I decide which pointer value the 'p' forms?
>
> If those p_j are ptrtoint to a i_j, and the computation for i is i =
> i_0 + i_1 + ... i_n, does it mean
>    we can take either p_j as a base pointer, and other int variables
> its offset, say we take p_2 as the base pointer, and the p from i
> points to
>         p_2 + (i_0 + i_1 + i_3 + .. i_n)
>    ?

So, in your example, if you do:

i1 = ptrtoint p1;
i2 = ptrtoint p2;
...
in = ptrtoint pn;

i = i1 + i2 ... + in;
p = inttoptr i;

..., then p can point to any of the memory objects that p1, p2, ... pn
point to.  The reasoning is that the integer add instruction obscures
which integer is the base pointer and which is the index, so the
aliasing rules conservatively assume that any operand may be the base
pointer.

> So in the transformation example, the result is different when we take
> %196 or %193 as a base pointer.

Yes, which is why the transform that Arushi suggested is not legal
unless you can prove that %196 can't be a pointer to a memory object.

> For alias-analysis, we may say the p can point to a memory any of the
> p_j points to. But if we consider memory safety, should we say p is
> safe to access if p is not out-of-bound no matter which p_j is taken
> as a base pointer?

That is how I would interpret memory safety: p is safe if it is within
the bounds of any of the p_j memory objects.
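
Under this reading, a bounds check might look like the following minimal sketch. The object table (a name mapped to a start address and a size in bytes) and the function `access_in_bounds` are assumptions made for illustration; they are not part of SAFECode or DSA.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical object table: object name -> (start address, size in bytes).
Objects = Dict[str, Tuple[int, int]]


def access_in_bounds(addr: int, width: int, based_on: Iterable[str],
                     objects: Objects) -> bool:
    """Return True if the byte range [addr, addr + width) lies entirely
    inside at least one object that the pointer is based on."""
    for name in based_on:
        start, size = objects[name]
        if start <= addr and addr + width <= start + size:
            return True
    return False
```

For example, with objects `{"a": (100, 16), "b": (200, 8)}`, a 4-byte access at address 204 passes when the pointer is based on both a and b, and fails when it is based on a alone.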

>   Could anyone explain this rule more precisely? For
> example, how can we find "
> all pointer values that contribute (directly or indirectly)" ?

I think this can be conservatively done using simple data-flow
analysis.  The only tricky part is when a pointer travels through memory
(i.e., it is stored into memory by a store instruction and loaded later
by a load instruction).  An enhanced version of DSA which tracks
pointers through integers could handle this.
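
A minimal sketch of such a data-flow pass follows. It assumes a toy instruction encoding (tuples such as `("add", dst, a, b)`) that is invented here and is not LLVM's or DSA's representation. Memory is modeled very coarsely as one abstract cell per slot name, so a pointer stored through one name and loaded through an aliasing name would be missed; a real implementation would need points-to information for that part.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, ...]  # hypothetical encoding, e.g. ("add", dst, a, b)


def based_on_sets(program: List[Instr]) -> Dict[str, Set[str]]:
    """Iterate to a fixed point, propagating the set of objects that each
    value may be based on through casts, adds, stores, and loads."""
    bases: Dict[str, Set[str]] = defaultdict(set)   # value name -> objects
    memory: Dict[str, Set[str]] = defaultdict(set)  # slot name -> objects stored
    changed = True
    while changed:
        changed = False
        for ins in program:
            op = ins[0]
            if op == "alloca":                      # ("alloca", dst)
                target, new = bases[ins[1]], {ins[1]}
            elif op in ("ptrtoint", "inttoptr"):    # (op, dst, src)
                target, new = bases[ins[1]], bases[ins[2]]
            elif op == "add":                       # ("add", dst, a, b)
                target, new = bases[ins[1]], bases[ins[2]] | bases[ins[3]]
            elif op == "store":                     # ("store", value, slot)
                target, new = memory[ins[2]], bases[ins[1]]
            elif op == "load":                      # ("load", dst, slot)
                target, new = bases[ins[1]], memory[ins[2]]
            else:
                raise ValueError(f"unknown op {op!r}")
            if not new <= target:                   # anything not yet recorded?
                target |= new
                changed = True
    return dict(bases)
```

Because the loop repeats until nothing changes, the result does not depend on the order in which the instructions are listed, which is what lets a value that travels through a store and a later load still show up in the final set.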

> This would be helpful to understand
>    http://llvm.org/docs/GetElementPtr.html#ptrdiff
> http://llvm.org/docs/GetElementPtr.html#null
> which suggest that we can do some 'wild' pointer arithmetic by
> inttoptr and ptrtoint.
>
> For example, given a pointer p, can we safely do?
>     i = ptrtoint p;
>     j = i + null;
>     q = inttoptr j;
>     v = load q;
>

That's a weird one (aside: you need to cast NULL to int first before
using it in the add).  Since NULL doesn't point to a valid memory range,
it may be that you can technically consider q to just point to p.  
However, I'm not sure about that; maybe q is technically aliased with
null and can point to some offset of NULL.

However, in practice, even if the aliasing rules say that q can point to
p or some offset of NULL, I would say that q points to just p since you
know (for most implementations) that NULL is equivalent to zero.

-- John T.

Re: GEP vs IntToPtr/PtrToInt

Jianzhou Zhao
In reply to this post by Eli Friedman-2
On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <[hidden email]> wrote:

> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <[hidden email]> wrote:
>> I have a question about when we should apply these pointer aliasing
>> rules. Do the rules tell us when a load/store is safe?
>> "Any memory access must be done through a pointer value associated
>> with an address range of the memory access, otherwise the behavior is
>> undefined."
>>
>> So this means the conversion discussed here is still safe in terms of
>> memory safety, but its meaning after conversion could be weird. Am I
>> correct?
>
> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
> behavior, so it isn't safe in any sense.  In practice, I can't think
> of a common transformation that would cause a crash, but it's best not
> to depend on that.

My confusion may be about what the rules consider undefined. They say
that a memory access is defined only if
 "Any memory access must be done through a pointer value associated
with an address range of the memory access".

Does this implicitly mean that the value of the pointer must be within
the address range it is associated with? It seems
to be true to me from the rules about global variables, alloca and
even external pointers.

For example
    %p = alloca i32
    %q = getelementptr i32* %p, i32 42
    store i32 0, i32* %q

Is this a well-defined memory access (I don't think it is)? Here, %q
is based on %p, and %p is associated with the address range from
alloca i32. But that range is definitely smaller than 42 elements.
Since the LangRef entries for load and store
   http://llvm.org/docs/LangRef.html#i_load
   http://llvm.org/docs/LangRef.html#i_store
do not state that loading from or storing to an out-of-bounds address
is undefined, I looked into the aliasing rules to find answers.

Now, back to the inttoptr and ptrtoint questions. When we say that a
memory access via a pointer formed from an int is defined, do we mean
  1) the value of the pointer happens to equal an address within the
range of an allocated object, or
  2) the pointer happens to be based on some allocated objects per
these rules, and it is fine if its value is outside their ranges
(I don't think this is true, but the rules do not explicitly tell me
whether this is legal). Here, the intuitive meaning of based-on is as
you explained below.

But I still have some questions about 'based-on'. It seems to state an
aliasing relation between pointers. Then if the result of an inttoptr
is based on some objects, why can we consider an access through it to
be a good memory access? It is quite possible that the pointer points
to some other allocated object that we don't want to be changed. So
this comes to my question: what property does a defined memory access
give us?

>
>> Then it comes to my another question. The base-on relation has this rule:
>> "A pointer value formed by an inttoptr is based on all pointer values
>> that contribute (directly or indirectly) to the computation of the
>> pointer's value."
>>
>> Suppose an int value 'i'  is computed by a lot of int variables that
>> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
>> to a point p, how should I decide which pointer value the 'p' forms?
>>
>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>> i_0 + i_1 + ... i_n, does it mean
>>  we can take either p_j as a base pointer, and other int variables
>> its offset, say we take p_2 as the base pointer, and the p from i
>> points to
>>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>  ?
>>
>> So in the transformation example, the result is different when we take
>> %196 or %193 as a base pointer.
>>
>> For alias-analysis, we may say the p can point to a memory any of the
>> p_j points to. But if we consider memory safety, should we say p is
>> safe to access if p is not out-of-bound no matter which p_j is taken
>> as a base pointer?
>
> See above.
>
>> Could anyone explain this rule more precisely? For
>> example, how can we find "
>> all pointer values that contribute (directly or indirectly)" ?
>
> There isn't any straightforward way to calculate that set.  Another
> way of stating the rule is that if changing the numerical value of the
> address of some object might change the calculated value of the
> operand of an inttoptr, it's one of the "pointer values that
> contribute".  It's intentionally defined a bit loosely because there's
> a lot of different ways for that to be the case.  You can extract
> information about a pointer by a ptrtoint, a load of part or all of
> the address from memory, pointer comparisons, and possibly some other
> ways I'm not thinking of.
>
>> This would be helpful to understand
>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>> http://llvm.org/docs/GetElementPtr.html#null
>> which suggest that we can do some 'wild' pointer arithmetic by
>> inttoptr and ptrtoint.
>>
>> For example, given a pointer p, can we safely do?
>>   i = ptrtoint p;
>>   j = i + null;
>>   q = inttoptr j;
>>   v = load q;
>>
>> Thanks a lot.
>
> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x+10)) can be
> safely translated to gep i8* x, 10.  Translating
> inttoptr(ptrtoint(x+y)) to gep i8* x, y is not safe in general.

But in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
difference between two pointers computed from GEP is a
variable, not a constant, so how could that work?

Also, given p1 and p2 from GEP, if we do
  i1 = ptrtoint p1;
  i2 = ptrtoint p2;
  i3 = i2 - i1;
  i3' = f (i3);       // suppose f is an identity function that
                      // returns i3 directly.
  i4 = i3' + i1;
  p = inttoptr i4;
  .. = load p;      // is this load defined?

http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say that we
cannot access out-of-bounds memory via GEP, but that it is safe to do
so via inttoptr and ptrtoint as long as the result points to an
allocated object. Is this the right way to understand it?

--
Jianzhou


Re: GEP vs IntToPtr/PtrToInt

Jianzhou Zhao
In reply to this post by John Criswell-4
On Wed, Apr 20, 2011 at 12:59 PM, John Criswell <[hidden email]> wrote:

> On 4/20/11 10:08 AM, Jianzhou Zhao wrote:
>>
>> I have a question about when we should apply these pointer aliasing
>> rules. Do the rules tell us when a load/store is safe?
>> "Any memory access must be done through a pointer value associated
>> with an address range of the memory access, otherwise the behavior is
>> undefined."
>
> I don't think the pointer aliasing rules indicate when a memory access is
> safe.  Rather, they set down rules for what the compiler can consider to be
> defined and undefined behavior.  It lays down the law for what optimizations
> are considered correct and which are not.

I see. The rules are the 'abstract' semantics used to check aliasing.
I looked into that section because LLVM IR does not say that an
out-of-bounds load/store is undefined. Is that because whether such an
access is defined depends on the semantics of the high-level language
from which the IR is compiled?

>
>> So this means the conversion discussed here is still safe in terms of
>> memory safety, but its meaning after conversion could be weird. Am I
>> correct?
>
> I am not sure what you mean.  However, if you're asking whether casting a
> pointer to an integer and then casting the integer back to a pointer is
> correct, I believe the answer is yes.  We certainly treat it that way in
> SAFECode although in the current implementation, it can weaken the safety
> guarantees.  Our points-to analysis, DSA, doesn't track pointers through
> integers, and so SAFECode uses more lenient checks on pointer values coming
> from inttoptr casts; DSA can't always guarantee that it knows everything
> about the memory objects feeding into it.

Yes. That is what I meant.

>
> That is, consequently, one of the reasons why we'd like to do Arushi's
> transformation.  It will make DSA less conservative and SAFECode more
> stringent.
>
>> Then it comes to my another question. The base-on relation has this rule:
>> "A pointer value formed by an inttoptr is based on all pointer values
>> that contribute (directly or indirectly) to the computation of the
>> pointer's value."
>>
>> Suppose an int value 'i'  is computed by a lot of int variables that
>> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
>> to a point p, how should I decide which pointer value the 'p' forms?
>>
>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>> i_0 + i_1 + ... i_n, does it mean
>>   we can take either p_j as a base pointer, and other int variables
>> its offset, say we take p_2 as the base pointer, and the p from i
>> points to
>>        p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>   ?
>
> So, in your example, if you do:
>
> i1 = ptrtoint p1;
> i2 = ptrtoint p2;
> ...
> in = ptrtoint pn;
>
> i = i1 + i2 ... + in;
> p = inttoptr i;
>
> ..., then p can point to any memory object p1, p2, ... pn.  The reasoning is
> that the integer add instruction obscures which integer is the base pointer
> and which is the index, so the aliasing rules conservatively assume that
> either operand is the base pointer.
>
>> So in the transformation example, the result is different when we take
>> %196 or %193 as a base pointer.
>
> Yes, which is why the transform that Arushi suggested is not legal unless
> you can prove that %196 can't be a pointer to a memory object.
>
>> For alias-analysis, we may say the p can point to a memory any of the
>> p_j points to. But if we consider memory safety, should we say p is
>> safe to access if p is not out-of-bound no matter which p_j is taken
>> as a base pointer?
>
> That is how I would interpret memory safety: p is safe if it is within the
> bounds of any of the p_j memory objects.
>
>>  Could anyone explain this rule more precisely? For
>> example, how can we find "
>> all pointer values that contribute (directly or indirectly)" ?
>
> I think this can be conservatively done using simple data-flow analysis.
>  The only tricky part is when a pointer travels through memory (i.e., it is
> stored into memory by a store instruction and loaded later by a load
> instruction).  An enhanced version of DSA which tracks pointers through
> integers could handle this.
>
>> This would be helpful to understand
>>   http://llvm.org/docs/GetElementPtr.html#ptrdiff
>> http://llvm.org/docs/GetElementPtr.html#null
>> which suggest that we can do some 'wild' pointer arithmetic by
>> inttoptr and ptrtoint.
>>
>> For example, given a pointer p, can we safely do?
>>    i = ptrtoint p;
>>    j = i + null;
>>    q = inttoptr j;
>>    v = load q;
>>
>
> That's a weird one (aside: you need to cast NULL to int first before using
> it in the add).  Since NULL doesn't point to a valid memory range, it may be
> that you can technically consider q to just point to p.  However, I'm not
> sure about that; maybe q is technically aliased with null and can point to
> some offset of NULL.
>
> However, in practice, even if the aliasing rules say that q can point to p
> or some offset of NULL, I would say that q points to just p since you know
> (for most implementations) that NULL is equivalent to zero.
>
> -- John T.
>
>> Thanks a lot.
>>



--
Jianzhou


Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
In reply to this post by Jianzhou Zhao
On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <[hidden email]> wrote:

> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <[hidden email]> wrote:
>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <[hidden email]> wrote:
>>> I have a question about when we should apply these pointer aliasing
>>> rules. Do the rules tell us when a load/store is safe?
>>> "Any memory access must be done through a pointer value associated
>>> with an address range of the memory access, otherwise the behavior is
>>> undefined."
>>>
>>> So this means the conversion discussed here is still safe in terms of
>>> memory safety, but its meaning after conversion could be weird. Am I
>>> correct?
>>
>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>> behavior, so it isn't safe in any sense.  In practice, I can't think
>> of a common transformation that would cause a crash, but it's best not
>> to depend on that.
>
> My confusion could be what is considered to be undefined from the
> rules. It says a memory access is defined if
>  "Any memory access must be done through a pointer value associated
> with an address range of the memory access".
>
> Does this implicitly mean that the value of the pointer must be within
> the address range of the memory access it is associated with? It seems
> to be true to me from the rules about global variables, alloca and
> even external pointers.
>
> For example
>    %p = alloca i32;
>    %q = getelementptr %p, i32 42;
>    store i32 0, i32* %q;
>
> Is this a fine memory access (although I don't think it is)? Here, %q
> is based on %p, and %p is associated with the address range from
> alloca i32. But the range of the result from alloca is definitely
> smaller than 42. Since the LLVM IR does not state that load/store-ing
> out-of-bound address is undefined
>   http://llvm.org/docs/LangRef.html#i_load
>   http://llvm.org/docs/LangRef.html#i_store
> I looked into the alias-rule to find answers.

That doesn't really have anything to do with aliasing, but it's
definitely undefined.  Don't know off the top of my head where that is
stated in LangRef.

> Now, back to the inttoptr and ptrtoint questions. When we
> consider a memory access via a pointer from an int to be defined, do we mean
>  1) the value of the pointer happens to equal an address within the
> range of an allocated object, or
>  2) the value of the pointer happens to be based on some allocated
> objects per these rules, but it is fine if it is out of their ranges
> (I don't think this is true, but the rules do not explicitly tell me if
> this is legal). Here, the intuitive meaning of based-on is as you
> explained below.
>
> But I still have some questions about the 'based-on' relation. It seems
> to state an aliasing relation between pointers. Then, if the
> result of an inttoptr is based on some objects, why can we consider this to
> be a good memory access? It is very likely that the pointer points to
> some other allocated object that we don't want to be changed. So
> this comes to my question: what property does a defined
> memory access give us?

A properly-defined memory access is fully within the bounds of some
defined object, and "based" (in the LangRef.html#pointeraliasing
sense) on that object.
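
The two conditions in that sentence can be put side by side in a toy check. This is only a sketch in Python, not anything taken from LLVM: `MemObject`, `is_defined_access`, and the addresses are made up for illustration, and the `based_on` set is assumed to be supplied by whatever computed the pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemObject:
    """A hypothetical allocated object occupying [start, start + size)."""
    name: str
    start: int
    size: int


def is_defined_access(addr, width, based_on, objects):
    """Return True if [addr, addr + width) lies fully inside some object
    that the pointer is also based on."""
    for obj in objects:
        in_bounds = obj.start <= addr and addr + width <= obj.start + obj.size
        if in_bounds and obj.name in based_on:
            return True
    return False


# Two adjacent 4-byte objects at made-up addresses.
objects = [MemObject("p", 1000, 4), MemObject("other", 1004, 4)]

# Store through %p itself: in bounds and based on p.
ok = is_defined_access(1000, 4, {"p"}, objects)

# The address lands inside "other", but the pointer is based only on p.
stray = is_defined_access(1004, 4, {"p"}, objects)

# gep %p, i32 42 on an i32 alloca: 42 elements of 4 bytes past the start,
# which is inside no object at all.
far = is_defined_access(1000 + 42 * 4, 4, {"p"}, objects)
```

Under these assumptions only the first access passes; the other two fail for different reasons (wrong base object, and out of bounds of everything).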

>>
>>> Then it comes to my another question. The base-on relation has this rule:
>>> "A pointer value formed by an inttoptr is based on all pointer values
>>> that contribute (directly or indirectly) to the computation of the
>>> pointer's value."
>>>
>>> Suppose an int value 'i'  is computed by a lot of int variables that
>>> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
>>> to a point p, how should I decide which pointer value the 'p' forms?
>>>
>>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>>> i_0 + i_1 + ... i_n, does it mean
>>>  we can take either p_j as a base pointer, and other int variables
>>> its offset, say we take p_2 as the base pointer, and the p from i
>>> points to
>>>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>>  ?
>>>
>>> So in the transformation example, the result is different when we take
>>> %196 or %193 as a base pointer.
>>>
>>> For alias-analysis, we may say the p can point to a memory any of the
>>> p_j points to. But if we consider memory safety, should we say p is
>>> safe to access if p is not out-of-bound no matter which p_j is taken
>>> as a base pointer?
>>
>> See above.
>>
>>> Could anyone explain this rule more precisely? For
>>> example, how can we find "
>>> all pointer values that contribute (directly or indirectly)" ?
>>
>> There isn't any straightforward way to calculate that set.  Another
>> way of stating the rule is that if changing the numerical value of the
>> address of some object might change the calculated value of the
>> operand of an inttoptr, it's one of the "pointer values that
>> contribute".  It's intentionally defined a bit loosely because there's
>> a lot of different ways for that to be the case.  You can extract
>> information about a pointer by a ptrtoint, a load of part or all of
>> the address from memory, pointer comparisons, and possibly some other
>> ways I'm not thinking of.
>>
>>> This would be helpful to understand
>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>> http://llvm.org/docs/GetElementPtr.html#null
>>> which suggest that we can do some 'wild' pointer arithmetic by
>>> inttoptr and ptrtoint.
>>>
>>> For example, given a pointer p, can we safely do?
>>>   i = ptrtoint p;
>>>   j = i + null;
>>>   q = inttoptr j;
>>>   v = load q;
>>>
>>> Thanks a lot.
>>
>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x)+10) can be
>> safely translated to gep i8* x, 10.  Translating
>> inttoptr(ptrtoint(x)+y) to gep i8* x, y is not safe in general.
>
> While in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
> difference between two pointers computed from GEP has to be a
> variable, but not a constant, how could that work?

In my example I was assuming "y" was some unknown value.  I'm not sure
what you're asking here.

> Also, given p1 and p2 from GEP, if we do
>  i1 = ptrtoint p1;
>  i2 = ptrtoint p2;
>  i3 = i2 - i1;
>  i3' = f (i3);       // suppose f is an identity function that
>                      // returns i3 directly.
>  i4 = i3' + i1;
>  p = inttoptr i4;
>  .. = load p;      // is this load defined?
>
>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say, we can
> access out-of-bound memory via GEP, but it is safe to do that from
> inttoptr or ptrtoint as long as the result points an allocated object.
> Is this the right way to understand it ?

I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".

The load in the given code is well-defined, and equivalent to a load
directly from p2.  The issue with translating i3' + i1 into gep p1,
i3' is that you end up with a load from a pointer into p2 that is not
"based" on p2.

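The difference between the two forms can be seen in a small provenance-tracking model. It is a rough sketch in Python under stated assumptions: the `Int` and `Ptr` classes, the helper functions, and the addresses 1000 and 2000 are all invented for illustration, p1 and p2 are assumed to point into different objects, and the identity function f is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    """An integer value plus the names of pointers that contributed to it."""
    value: int
    deps: frozenset = frozenset()

    def __add__(self, other):
        return Int(self.value + other.value, self.deps | other.deps)

    def __sub__(self, other):
        return Int(self.value - other.value, self.deps | other.deps)


@dataclass(frozen=True)
class Ptr:
    """An address plus the set of pointer names it is based on."""
    addr: int
    based_on: frozenset


def ptrtoint(p):
    return Int(p.addr, p.based_on)


def inttoptr(i):
    # Based on every pointer that contributed to the integer.
    return Ptr(i.value, i.deps)


def gep(base, offset):
    # Based only on the base pointer; the offset's history is dropped.
    return Ptr(base.addr + offset.value, base.based_on)


p1 = Ptr(1000, frozenset({"p1"}))
p2 = Ptr(2000, frozenset({"p2"}))

i1 = ptrtoint(p1)
i2 = ptrtoint(p2)
i3 = i2 - i1
i4 = i3 + i1
p = inttoptr(i4)   # the original code
q = gep(p1, i3)    # the attempted rewrite
```

In this model `p` and `q` hold the same address, the one p2 points to, but `p` is based on both p1 and p2 while `q` is based on p1 alone, so a load through `q` touches p2's object without being based on it.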
-Eli


Re: GEP vs IntToPtr/PtrToInt

Jianzhou Zhao
On Wed, Apr 20, 2011 at 2:11 PM, Eli Friedman <[hidden email]> wrote:

> On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <[hidden email]> wrote:
>> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <[hidden email]> wrote:
>>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <[hidden email]> wrote:
>>>> I have a question about when we should apply these pointer aliasing
>>>> rules. Do the rules tell us when a load/store is safe?
>>>> "Any memory access must be done through a pointer value associated
>>>> with an address range of the memory access, otherwise the behavior is
>>>> undefined."
>>>>
>>>> So this means the conversion discussed here is still safe in terms of
>>>> memory safety, but its meaning after conversion could be weird. Am I
>>>> correct?
>>>
>>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>>> behavior, so it isn't safe in any sense.  In practice, I can't think
>>> of a common transformation that would cause a crash, but it's best not
>>> to depend on that.
>>
>> My confusion could be what is considered to be undefined from the
>> rules. It says a memory access is defined if
>>  "Any memory access must be done through a pointer value associated
>> with an address range of the memory access".
>>
>> Does this implicitly mean that the value of the pointer must be within
>> the address range of the memory access it is associated with? It seems
>> to be true to me from the rules about global variables, alloca and
>> even external pointers.
>>
>> For example
>>    %p = alloca i32;
>>    %q = getelementptr %p, i32 42;
>>    store i32 0, i32* %q;
>>
>> Is this a fine memory access (although I don't think it is)? Here, %q
>> is based on %p, and %p is associated with the address range from
>> alloca i32. But the range of the result from alloca is definitely
>> smaller than 42. Since the LLVM IR does not state that load/store-ing
>> out-of-bound address is undefined
>>   http://llvm.org/docs/LangRef.html#i_load
>>   http://llvm.org/docs/LangRef.html#i_store
>> I looked into the alias-rule to find answers.
>
> That doesn't really have anything to do with aliasing, but it's
> definitely undefined.  Don't know off the top of my head where that is
> stated in LangRef.
>
>> Now, come back to the inttoptr and ptrtoint questions. When we
>> consider a memory access via pointers from int is defined, do we mean
>>  1) the value of the pointer happens to equal to an address within a
>> range of an allocated object, or
>>  2) the value of the pointer happens to be based on some allocated
>> objects per these rules, but it is fine if it is out of their ranges
>> (I don' think this is true, but the rules do not explicitly tell me if
>> this is legal). Here, the intuitive meaning of based-on is like you
>> explained in the bellow.
>>
>> But I still have some questions about the 'based-on' things. It seems
>> to state an aliasing relation between pointers. Then in the case if a
>> result inttoptr is based on some objects, why can we consider this to
>> be a good memory access? Because it is very likely a pointer points
>> some other allocated objects that we don't want them to be changed. So
>> this comes to my question --- what property does a defined
>> memory-access give use?
>
> A properly-defined memory access is fully within the bounds of some
> defined object, and "based" (in the LangRef.html#pointeraliasing
> sense) on that object.
>
>>>
>>>> Then it comes to my another question. The base-on relation has this rule:
>>>> "A pointer value formed by an inttoptr is based on all pointer values
>>>> that contribute (directly or indirectly) to the computation of the
>>>> pointer's value."
>>>>
>>>> Suppose an int value 'i'  is computed by a lot of int variables that
>>>> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
>>>> to a point p, how should I decide which pointer value the 'p' forms?
>>>>
>>>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>>>> i_0 + i_1 + ... i_n, does it mean
>>>>  we can take either p_j as a base pointer, and other int variables
>>>> its offset, say we take p_2 as the base pointer, and the p from i
>>>> points to
>>>>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>>>  ?
>>>>
>>>> So in the transformation example, the result is different when we take
>>>> %196 or %193 as a base pointer.
>>>>
>>>> For alias-analysis, we may say the p can point to a memory any of the
>>>> p_j points to. But if we consider memory safety, should we say p is
>>>> safe to access if p is not out-of-bound no matter which p_j is taken
>>>> as a base pointer?
>>>
>>> See above.
>>>
>>>> Could anyone explain this rule more precisely? For
>>>> example, how can we find "
>>>> all pointer values that contribute (directly or indirectly)" ?
>>>
>>> There isn't any straightforward way to calculate that set.  Another
>>> way of stating the rule is that if changing the numerical value of the
>>> address of some object might change the calculated value of the
>>> operand of an inttoptr, it's one of the "pointer values that
>>> contribute".  It's intentionally defined a bit loosely because there's
>>> a lot of different ways for that to be the case.  You can extract
>>> information about a pointer by a ptrtoint, a load of part or all of
>>> the address from memory, pointer comparisons, and possibly some other
>>> ways I'm not thinking of.
>>>
>>>> This would be helpful to understand
>>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>>> http://llvm.org/docs/GetElementPtr.html#null
>>>> which suggest that we can do some 'wild' pointer arithmetic by
>>>> inttoptr and ptrtoint.
>>>>
>>>> For example, given a pointer p, can we safely do?
>>>>   i = ptrtoint p;
>>>>   j = i + null;
>>>>   q = inttoptr j;
>>>>   v = load q;
>>>>
>>>> Thanks a lot.
>>>
>>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x)+10) can be
>>> safely translated to gep i8* x, 10.  Translating
>>> inttoptr(ptrtoint(x)+y) to gep i8* x, y is not safe in general.
>>
>> While in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
>> difference between two pointers computed from GEP has to be a
>> variable, but not a constant, how could that work?
>
> In my example I was assuming "y" was some unknown value.  I'm not sure
> what you're asking here.
>
>> Also, given p1 and p2 from GEP, if we do
>>  i1 = ptrtoint p1;
>>  i2 = ptrtoint p2;
>>  i3 = i2 - i1;
>>  i3' = f (i3);       // suppose f is an identical function that
>> returns i3 directly.
>>  i4 = i3' + i1;
>>  p = inttoptr i4;
>>  .. = load p;      // is this load defined?
>>
>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say, we can
>> access out-of-bound memory via GEP, but it is safe to do that from
>> inttoptr or ptrtoint as long as the result points an allocated object.
>> Is this the right way to understand it ?
>
> I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".

Yes.

>
> The load in the given code is well-defined, and equivalent to a load
> directly from p2.  The issue with translating i3' + i1 into gep p1,
> i3' is that you end up with a load from a pointer into p2 that is not
> "based" on p2.

Is it supposed to be "end up with a load from a pointer into p2 that
is "based" on p2"? Otherwise the load is not well-defined, because

" A properly-defined memory access is fully within the bounds of some
 defined object, and "based" (in the LangRef.html#pointeraliasing
 sense) on that object."

I think p2 contributes to the computation of p, because changing the
value of p2 changes the value of p. By the intuitive meaning of
based-on you stated, p1 also contributes if we analyze this with a coarse
alias analysis.

>
> -Eli
>



--
Jianzhou


Re: GEP vs IntToPtr/PtrToInt

Eli Friedman-2
On Wed, Apr 20, 2011 at 1:30 PM, Jianzhou Zhao <[hidden email]> wrote:

> On Wed, Apr 20, 2011 at 2:11 PM, Eli Friedman <[hidden email]> wrote:
>> On Wed, Apr 20, 2011 at 10:21 AM, Jianzhou Zhao <[hidden email]> wrote:
>>> On Wed, Apr 20, 2011 at 12:20 PM, Eli Friedman <[hidden email]> wrote:
>>>> On Wed, Apr 20, 2011 at 8:08 AM, Jianzhou Zhao <[hidden email]> wrote:
>>>>> I have a question about when we should apply these pointer aliasing
>>>>> rules. Do the rules tell us when a load/store is safe?
>>>>> "Any memory access must be done through a pointer value associated
>>>>> with an address range of the memory access, otherwise the behavior is
>>>>> undefined."
>>>>>
>>>>> So this means the conversion discussed here is still safe in terms of
>>>>> memory safety, but its meaning after conversion could be weird. Am I
>>>>> correct?
>>>>
>>>> Per http://llvm.org/docs/LangRef.html#pointeraliasing, it's undefined
>>>> behavior, so it isn't safe in any sense.  In practice, I can't think
>>>> of a common transformation that would cause a crash, but it's best not
>>>> to depend on that.
>>>
>>> My confusion could be what is considered to be undefined from the
>>> rules. It says a memory access is defined if
>>>  "Any memory access must be done through a pointer value associated
>>> with an address range of the memory access".
>>>
>>> Does this implicitly mean that the value of the pointer must be within
>>> the address range of the memory access it is associated with? It seems
>>> to be true to me from the rules about global variables, alloca and
>>> even external pointers.
>>>
>>> For example
>>>    %p = alloca i32;
>>>    %q = getelementptr %p, i32 42;
>>>    store i32 0, i32* %q;
>>>
>>> Is this a fine memory access (although I don't think it is)? Here, %q
>>> is based on %p, and %p is associated with the address range from
>>> alloca i32. But the range of the result from alloca is definitely
>>> smaller than 42. Since the LLVM IR does not state that load/store-ing
>>> out-of-bound address is undefined
>>>   http://llvm.org/docs/LangRef.html#i_load
>>>   http://llvm.org/docs/LangRef.html#i_store
>>> I looked into the alias-rule to find answers.
>>
>> That doesn't really have anything to do with aliasing, but it's
>> definitely undefined.  Don't know off the top of my head where that is
>> stated in LangRef.
>>
>>> Now, come back to the inttoptr and ptrtoint questions. When we
>>> consider a memory access via pointers from int is defined, do we mean
>>>  1) the value of the pointer happens to equal to an address within a
>>> range of an allocated object, or
>>>  2) the value of the pointer happens to be based on some allocated
>>> objects per these rules, but it is fine if it is out of their ranges
>>> (I don' think this is true, but the rules do not explicitly tell me if
>>> this is legal). Here, the intuitive meaning of based-on is like you
>>> explained in the bellow.
>>>
>>> But I still have some questions about the 'based-on' things. It seems
>>> to state an aliasing relation between pointers. Then in the case if a
>>> result inttoptr is based on some objects, why can we consider this to
>>> be a good memory access? Because it is very likely a pointer points
>>> some other allocated objects that we don't want them to be changed. So
>>> this comes to my question --- what property does a defined
>>> memory-access give use?
>>
>> A properly-defined memory access is fully within the bounds of some
>> defined object, and "based" (in the LangRef.html#pointeraliasing
>> sense) on that object.
>>
>>>>
>>>>> Then it comes to my another question. The base-on relation has this rule:
>>>>> "A pointer value formed by an inttoptr is based on all pointer values
>>>>> that contribute (directly or indirectly) to the computation of the
>>>>> pointer's value."
>>>>>
>>>>> Suppose an int value 'i'  is computed by a lot of int variables that
>>>>> are converted from ptr (p1,p2...pn) by ptrtoint, then if we inttoptr i
>>>>> to a point p, how should I decide which pointer value the 'p' forms?
>>>>>
>>>>> If those p_j are ptrtoint to a i_j, and the computation for i is i =
>>>>> i_0 + i_1 + ... i_n, does it mean
>>>>>  we can take either p_j as a base pointer, and other int variables
>>>>> its offset, say we take p_2 as the base pointer, and the p from i
>>>>> points to
>>>>>       p_2 + (i_0 + i_1 + i_3 + .. i_n)
>>>>>  ?
>>>>>
>>>>> So in the transformation example, the result is different when we take
>>>>> %196 or %193 as a base pointer.
>>>>>
>>>>> For alias-analysis, we may say the p can point to a memory any of the
>>>>> p_j points to. But if we consider memory safety, should we say p is
>>>>> safe to access if p is not out-of-bound no matter which p_j is taken
>>>>> as a base pointer?
>>>>
>>>> See above.
>>>>
>>>>> Could anyone explain this rule more precisely? For
>>>>> example, how can we find "
>>>>> all pointer values that contribute (directly or indirectly)" ?
>>>>
>>>> There isn't any straightforward way to calculate that set.  Another
>>>> way of stating the rule is that if changing the numerical value of the
>>>> address of some object might change the calculated value of the
>>>> operand of an inttoptr, it's one of the "pointer values that
>>>> contribute".  It's intentionally defined a bit loosely because there's
>>>> a lot of different ways for that to be the case.  You can extract
>>>> information about a pointer by a ptrtoint, a load of part or all of
>>>> the address from memory, pointer comparisons, and possibly some other
>>>> ways I'm not thinking of.
>>>>
>>>>> This would be helpful to understand
>>>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff
>>>>> http://llvm.org/docs/GetElementPtr.html#null
>>>>> which suggest that we can do some 'wild' pointer arithmetic by
>>>>> inttoptr and ptrtoint.
>>>>>
>>>>> For example, given a pointer p, can we safely do?
>>>>>   i = ptrtoint p;
>>>>>   j = i + null;
>>>>>   q = inttoptr j;
>>>>>   v = load q;
>>>>>
>>>>> Thanks a lot.
>>>>
>>>> inttoptr(ptrtoint(x)) is just x; inttoptr(ptrtoint(x)+10) can be
>>>> safely translated to gep i8* x, 10.  Translating
>>>> inttoptr(ptrtoint(x)+y) to gep i8* x, y is not safe in general.
>>>
>>> While in http://llvm.org/docs/GetElementPtr.html#ptrdiff, the
>>> difference between two pointers computed from GEP has to be a
>>> variable, but not a constant, how could that work?
>>
>> In my example I was assuming "y" was some unknown value.  I'm not sure
>> what you're asking here.
>>
>>> Also, given p1 and p2 from GEP, if we do
>>>  i1 = ptrtoint p1;
>>>  i2 = ptrtoint p2;
>>>  i3 = i2 - i1;
>>>  i3' = f (i3);       // suppose f is an identical function that
>>> returns i3 directly.
>>>  i4 = i3' + i1;
>>>  p = inttoptr i4;
>>>  .. = load p;      // is this load defined?
>>>
>>>  http://llvm.org/docs/GetElementPtr.html#ptrdiff seems to say, we can
>>> access out-of-bound memory via GEP, but it is safe to do that from
>>> inttoptr or ptrtoint as long as the result points an allocated object.
>>> Is this the right way to understand it ?
>>
>> I assume this is supposed to be "we cannot access out-of-bounds memory via GEP".
>
> Yes.
>
>>
>> The load in the given code is well-defined, and equivalent to a load
>> directly from p2.  The issue with translating i3' + i1 into gep p1,
>> i3' is that you end up with a load from a pointer into p2 that is not
>> "based" on p2.
>
> Is it supposed to be "end up with a load from a pointer into p2 that
> is "based" on p2"? Otherwise this is not well-defined. Because
>
> " A properly-defined memory access is fully within the bounds of some
>  defined object, and "based" (in the LangRef.html#pointeraliasing
>  sense) on that object."
>
> I think p2 contributes to the computation of p, because changing the
> value at p2 affects the value at p. By the intuitive meaning of
> base-on you stated, p1 also contributes if we analyze this by a coarse
> alias-analysis.

Let me try restating.  The given code is legal.

If you tried to transform the given code to do "p = gep p1, i3'", it
would not be legal.
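
One conservative way to decide when such a rewrite is allowed, following the rule from earlier in the thread that the offset must not depend on the address of any memory object, is a small data-flow pass over the instructions. The sketch below is a toy in Python and not how LLVM itself decides: the tuple encoding, `pointer_deps`, `can_rewrite_as_gep`, and the `UNKNOWN` marker are all hypothetical, and every load is assumed to possibly carry a pointer, which is stricter than necessary.

```python
UNKNOWN = "<unknown>"  # "may depend on some pointer we cannot name"


def pointer_deps(instrs, operand):
    """Return the set of pointers that may contribute to `operand`.

    instrs is a list of (dest, opcode, operands) tuples in definition order.
    Names that are never defined are treated as pointer arguments.
    """
    deps = {}

    def lookup(name):
        if isinstance(name, int):      # integer constant: no pointer involved
            return set()
        return deps.get(name, {name})  # undefined name: a pointer argument

    for dest, op, operands in instrs:
        if op == "load":
            # A pointer may have travelled through memory; stay conservative.
            deps[dest] = {UNKNOWN}
        else:
            deps[dest] = set().union(*(lookup(o) for o in operands))
    return lookup(operand)


def can_rewrite_as_gep(instrs, offset):
    """Allow the add-to-GEP rewrite only if the offset depends on no pointer."""
    return not pointer_deps(instrs, offset)


# The snippet from the first message of the thread.
first_example = [
    ("%196", "load", ["%195"]),
    ("%197", "zext", ["%196"]),
    ("%198", "ptrtoint", ["%193"]),
    ("%199", "add", ["%198", "%197"]),
    ("%200", "inttoptr", ["%199"]),
]

# A constant offset, as in inttoptr(ptrtoint(x) + 10).
constant_example = [
    ("%i", "ptrtoint", ["%x"]),
    ("%j", "add", ["%i", 10]),
    ("%q", "inttoptr", ["%j"]),
]

# The pointer-difference example: the offset i3 is computed from p1 and p2.
diff_example = [
    ("i1", "ptrtoint", ["p1"]),
    ("i2", "ptrtoint", ["p2"]),
    ("i3", "sub", ["i2", "i1"]),
    ("i4", "add", ["i3", "i1"]),
    ("p", "inttoptr", ["i4"]),
]
```

With these inputs the check accepts only the constant-offset case. It rejects the first example because the offset comes from a load this toy cannot see through, and rejects the pointer-difference example because the offset depends on both p1 and p2.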

-Eli
