[llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
Hello all,

There is a missing vectorization opportunity issue with clang 4.0 with
the file attached.

Indeed, when compiled with -O2, the "op_distance" function get
vectorized, but not the "op" one.

For information, this test case has been reduced from a file generated
by the Pythran compiler (https://github.com/serge-sans-paille/pythran).

If we take a look at the generated IR without vectorization (using the
-fno-vectorize clang flag), we get:

> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize

> ; Function Attrs: norecurse uwtable
> define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
> ; This one is vectorized!
>   %6 = ptrtoint i32* %1 to i64
>   %7 = ptrtoint i32* %3 to i64
>   %8 = sub i64 %7, %6
>   %9 = icmp sgt i64 %8, 0
>   br i1 %9, label %10, label %26
>
> ; <label>:10:                                     ; preds = %5
>   %11 = lshr exact i64 %8, 2
>   br label %12
>
> ; <label>:12:                                     ; preds = %12, %10
>   %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
>   %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
>   %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
>   %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
>   %17 = load i32, i32* %16, align 4, !tbaa !1
>   %18 = load i32, i32* %15, align 4, !tbaa !1
>   %19 = add nsw i32 %18, %17
>   store i32 %19, i32* %14, align 4, !tbaa !1
>   %20 = getelementptr inbounds i32, i32* %16, i64 1
>   %21 = getelementptr inbounds i32, i32* %15, i64 1
>   %22 = getelementptr inbounds i32, i32* %14, i64 1
>   %23 = add nsw i64 %13, -1
>   %24 = icmp sgt i64 %13, 1
>   br i1 %24, label %12, label %25
>
> ; <label>:25:                                     ; preds = %12
>   br label %26
>
> ; <label>:26:                                     ; preds = %25, %5
>   ret void
> }
>
> ; Function Attrs: norecurse uwtable
> define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
> ; This one isn't!
>   %6 = ptrtoint i32* %1 to i64
>   %7 = ptrtoint i32* %3 to i64
>   %8 = sub i64 %6, %7
>   %9 = icmp sgt i64 %8, 0
>   br i1 %9, label %10, label %28
>
> ; <label>:10:                                     ; preds = %5
>   %11 = lshr exact i64 %8, 2
>   br label %12
>
> ; <label>:12:                                     ; preds = %12, %10
>   %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
>   %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
>   %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
>   %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>   %17 = inttoptr i64 %16 to i32*
>   %18 = load i32, i32* %17, align 4, !tbaa !1
>   %19 = load i32, i32* %15, align 4, !tbaa !1
>   %20 = add nsw i32 %19, %18
>   store i32 %20, i32* %14, align 4, !tbaa !1
>   %21 = getelementptr inbounds i32, i32* %17, i64 1
>   %22 = ptrtoint i32* %21 to i64
>   %23 = getelementptr inbounds i32, i32* %15, i64 1
>   %24 = getelementptr inbounds i32, i32* %14, i64 1
>   %25 = add nsw i64 %13, -1
>   %26 = icmp sgt i64 %13, 1
>   br i1 %26, label %12, label %27
>
> ; <label>:27:                                     ; preds = %12
>   br label %28
>
> ; <label>:28:                                     ; preds = %27, %5
>   ret void
> }
If we compile only the "op" function while activation the debug mode,
here is the output:

> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S
>
> LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
> LV: Loop hints: force=? width=0 unroll=0
> LV: Found a loop:
> LV: Found an induction variable.
> LV: Found an induction variable.
> LV: Found an induction variable.
> LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
> LV: Can't vectorize the instructions or CFG
> LV: Not vectorizing: Cannot prove legality.
> [...]
The issue seems to be that the phi node "%16" can't be deduced as an
induction variable. If we take a closer look, the cause seems to be in
ScalarEvolution, in the createSCEV function
(http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770)
:

>  // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
>  // lead to pointer expressions which cannot safely be expanded to GEPs,
>  // because ScalarEvolution doesn't respect the GEP aliasing rules when
>  // simplifying integer expressions.

Indeed, SCEV does not (legitimately) consider inttoptr/ptrtoint as
no-op, and does not handle them. The thing is that, in our case, the GEP
in %23 is thus not analyzed by SCEV, and the PHI %16 is thus not
considered as an induction variable.

To confirm this hypothesis, I created a small out-of-tree pass
(https://github.com/aguinet/llvm-intptrcleanup) which registers before
loop vectorization and does the following:

* first, it search for phi nodes who have those properties:
  - every incoming value of the phi node is a ptrtoint instruction. The
original pointer type of every ptrtoint instruction must be the same type T.
  - every user of this PHI node is an inttoptr instruction of the
previous type T
* for each of these PHI nodes, it creates a new PHI node which takes the
original pointers as incoming values, and replace the uses of the
inttoptr instructions that uses the original PHI node by the new one
* it then removes the previous inttoptr instructions and the original
PHI node

The way I understand inttoptr and ptrtoint, this transformation should
be valid (but I might have missed something!). Please note that this is
a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
the previous example is now vectorized correctly by the loop vectorizer.
This can be seen by looking at the output of:

> $ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11

The question that remains to me is how this should be correctly fixed:

1) Making SCEV support these no-op (in this case) inttoptr/ptrtoint
conversions
2) insert the above transformation at some point in the optimization
pipeline
3) clean the pass(es?) that somehow generated this case.

I have to admit I'm not really sure which options is the best. 3) seems
to be the way to go but might require some tedious work, and does not
prevent the issue to come again in the future. 2) seems to be a quick
patch that could be inserted in some "canonicalization" pass, let it be
a valid transformation in the first place. I don't know SCEV enough to
judge of the difficulty/faisability of 1).

This mail is thus to discuss this issue and how to fix this properly :)

Thanks everyone :)

--
Adrien Guinet.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

op_zip_iterator.cpp (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
FYI.

On Sat, Jun 17, 2017 at 3:41 PM, Adrien Guinet via llvm-dev
<[hidden email]> wrote:

> Hello all,
>
> There is a missing vectorization opportunity issue with clang 4.0 with
> the file attached.
>
> Indeed, when compiled with -O2, the "op_distance" function get
> vectorized, but not the "op" one.
>
> For information, this test case has been reduced from a file generated
> by the Pythran compiler (https://github.com/serge-sans-paille/pythran).
>
> If we take a look at the generated IR without vectorization (using the
> -fno-vectorize clang flag), we get:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize
>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one is vectorized!
>>   %6 = ptrtoint i32* %1 to i64
>>   %7 = ptrtoint i32* %3 to i64
>>   %8 = sub i64 %7, %6
>>   %9 = icmp sgt i64 %8, 0
>>   br i1 %9, label %10, label %26
>>
>> ; <label>:10:                                     ; preds = %5
>>   %11 = lshr exact i64 %8, 2
>>   br label %12
>>
>> ; <label>:12:                                     ; preds = %12, %10
>>   %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
>>   %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
>>   %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
>>   %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
>>   %17 = load i32, i32* %16, align 4, !tbaa !1
>>   %18 = load i32, i32* %15, align 4, !tbaa !1
>>   %19 = add nsw i32 %18, %17
>>   store i32 %19, i32* %14, align 4, !tbaa !1
>>   %20 = getelementptr inbounds i32, i32* %16, i64 1
>>   %21 = getelementptr inbounds i32, i32* %15, i64 1
>>   %22 = getelementptr inbounds i32, i32* %14, i64 1
>>   %23 = add nsw i64 %13, -1
>>   %24 = icmp sgt i64 %13, 1
>>   br i1 %24, label %12, label %25
>>
>> ; <label>:25:                                     ; preds = %12
>>   br label %26
>>
>> ; <label>:26:                                     ; preds = %25, %5
>>   ret void
>> }
>>
>> ; Function Attrs: norecurse uwtable
>> define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>> ; This one isn't!
>>   %6 = ptrtoint i32* %1 to i64
>>   %7 = ptrtoint i32* %3 to i64
>>   %8 = sub i64 %6, %7
>>   %9 = icmp sgt i64 %8, 0
>>   br i1 %9, label %10, label %28
>>
>> ; <label>:10:                                     ; preds = %5
>>   %11 = lshr exact i64 %8, 2
>>   br label %12
>>
>> ; <label>:12:                                     ; preds = %12, %10
>>   %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
>>   %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
>>   %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
>>   %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>>   %17 = inttoptr i64 %16 to i32*
>>   %18 = load i32, i32* %17, align 4, !tbaa !1
>>   %19 = load i32, i32* %15, align 4, !tbaa !1
>>   %20 = add nsw i32 %19, %18
>>   store i32 %20, i32* %14, align 4, !tbaa !1
>>   %21 = getelementptr inbounds i32, i32* %17, i64 1
>>   %22 = ptrtoint i32* %21 to i64
>>   %23 = getelementptr inbounds i32, i32* %15, i64 1
>>   %24 = getelementptr inbounds i32, i32* %14, i64 1
>>   %25 = add nsw i64 %13, -1
>>   %26 = icmp sgt i64 %13, 1
>>   br i1 %26, label %12, label %27
>>
>> ; <label>:27:                                     ; preds = %12
>>   br label %28
>>
>> ; <label>:28:                                     ; preds = %27, %5
>>   ret void
>> }
>
> If we compile only the "op" function while activation the debug mode,
> here is the output:
>
>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S
>>
>> LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
>> LV: Loop hints: force=? width=0 unroll=0
>> LV: Found a loop:
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an induction variable.
>> LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>> LV: Can't vectorize the instructions or CFG
>> LV: Not vectorizing: Cannot prove legality.
>> [...]
>
> The issue seems to be that the phi node "%16" can't be deduced as an
> induction variable. If we take a closer look, the cause seems to be in
> ScalarEvolution, in the createSCEV function
> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770)
> :
>
>>  // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
>>  // lead to pointer expressions which cannot safely be expanded to GEPs,
>>  // because ScalarEvolution doesn't respect the GEP aliasing rules when
>>  // simplifying integer expressions.
>
> Indeed, SCEV does not (legitimately) consider inttoptr/ptrtoint as
> no-op, and does not handle them. The thing is that, in our case, the GEP
> in %23 is thus not analyzed by SCEV, and the PHI %16 is thus not
> considered as an induction variable.
>
> To confirm this hypothesis, I created a small out-of-tree pass
> (https://github.com/aguinet/llvm-intptrcleanup) which registers before
> loop vectorization and does the following:
>
> * first, it search for phi nodes who have those properties:
>   - every incoming value of the phi node is a ptrtoint instruction. The
> original pointer type of every ptrtoint instruction must be the same type T.
>   - every user of this PHI node is an inttoptr instruction of the
> previous type T
> * for each of these PHI nodes, it creates a new PHI node which takes the
> original pointers as incoming values, and replace the uses of the
> inttoptr instructions that uses the original PHI node by the new one
> * it then removes the previous inttoptr instructions and the original
> PHI node
>
> The way I understand inttoptr and ptrtoint, this transformation should
> be valid (but I might have missed something!). Please note that this is
> a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
> the previous example is now vectorized correctly by the loop vectorizer.
> This can be seen by looking at the output of:
>
>> $ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11
>
> The question that remains to me is how this should be correctly fixed:
>
> 1) Making SCEV support these no-op (in this case) inttoptr/ptrtoint
> conversions
> 2) insert the above transformation at some point in the optimization
> pipeline
> 3) clean the pass(es?) that somehow generated this case.
>
> I have to admit I'm not really sure which options is the best. 3) seems
> to be the way to go but might require some tedious work, and does not
> prevent the issue to come again in the future. 2) seems to be a quick
> patch that could be inserted in some "canonicalization" pass, let it be
> a valid transformation in the first place. I don't know SCEV enough to
> judge of the difficulty/faisability of 1).
>
> This mail is thus to discuss this issue and how to fix this properly :)
>
> Thanks everyone :)
>
> --
> Adrien Guinet.
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
Sorry, hit reply instead of forward :)

On Sat, Jun 17, 2017 at 4:07 PM, Davide Italiano <[hidden email]> wrote:

> FYI.
>
> On Sat, Jun 17, 2017 at 3:41 PM, Adrien Guinet via llvm-dev
> <[hidden email]> wrote:
>> Hello all,
>>
>> There is a missing vectorization opportunity issue with clang 4.0 with
>> the file attached.
>>
>> Indeed, when compiled with -O2, the "op_distance" function get
>> vectorized, but not the "op" one.
>>
>> For information, this test case has been reduced from a file generated
>> by the Pythran compiler (https://github.com/serge-sans-paille/pythran).
>>
>> If we take a look at the generated IR without vectorization (using the
>> -fno-vectorize clang flag), we get:
>>
>>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize
>>
>>> ; Function Attrs: norecurse uwtable
>>> define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>>> ; This one is vectorized!
>>>   %6 = ptrtoint i32* %1 to i64
>>>   %7 = ptrtoint i32* %3 to i64
>>>   %8 = sub i64 %7, %6
>>>   %9 = icmp sgt i64 %8, 0
>>>   br i1 %9, label %10, label %26
>>>
>>> ; <label>:10:                                     ; preds = %5
>>>   %11 = lshr exact i64 %8, 2
>>>   br label %12
>>>
>>> ; <label>:12:                                     ; preds = %12, %10
>>>   %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
>>>   %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
>>>   %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
>>>   %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
>>>   %17 = load i32, i32* %16, align 4, !tbaa !1
>>>   %18 = load i32, i32* %15, align 4, !tbaa !1
>>>   %19 = add nsw i32 %18, %17
>>>   store i32 %19, i32* %14, align 4, !tbaa !1
>>>   %20 = getelementptr inbounds i32, i32* %16, i64 1
>>>   %21 = getelementptr inbounds i32, i32* %15, i64 1
>>>   %22 = getelementptr inbounds i32, i32* %14, i64 1
>>>   %23 = add nsw i64 %13, -1
>>>   %24 = icmp sgt i64 %13, 1
>>>   br i1 %24, label %12, label %25
>>>
>>> ; <label>:25:                                     ; preds = %12
>>>   br label %26
>>>
>>> ; <label>:26:                                     ; preds = %25, %5
>>>   ret void
>>> }
>>>
>>> ; Function Attrs: norecurse uwtable
>>> define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
>>> ; This one isn't!
>>>   %6 = ptrtoint i32* %1 to i64
>>>   %7 = ptrtoint i32* %3 to i64
>>>   %8 = sub i64 %6, %7
>>>   %9 = icmp sgt i64 %8, 0
>>>   br i1 %9, label %10, label %28
>>>
>>> ; <label>:10:                                     ; preds = %5
>>>   %11 = lshr exact i64 %8, 2
>>>   br label %12
>>>
>>> ; <label>:12:                                     ; preds = %12, %10
>>>   %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
>>>   %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
>>>   %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
>>>   %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>>>   %17 = inttoptr i64 %16 to i32*
>>>   %18 = load i32, i32* %17, align 4, !tbaa !1
>>>   %19 = load i32, i32* %15, align 4, !tbaa !1
>>>   %20 = add nsw i32 %19, %18
>>>   store i32 %20, i32* %14, align 4, !tbaa !1
>>>   %21 = getelementptr inbounds i32, i32* %17, i64 1
>>>   %22 = ptrtoint i32* %21 to i64
>>>   %23 = getelementptr inbounds i32, i32* %15, i64 1
>>>   %24 = getelementptr inbounds i32, i32* %14, i64 1
>>>   %25 = add nsw i64 %13, -1
>>>   %26 = icmp sgt i64 %13, 1
>>>   br i1 %26, label %12, label %27
>>>
>>> ; <label>:27:                                     ; preds = %12
>>>   br label %28
>>>
>>> ; <label>:28:                                     ; preds = %27, %5
>>>   ret void
>>> }
>>
>> If we compile only the "op" function while activation the debug mode,
>> here is the output:
>>
>>> $ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S
>>>
>>> LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
>>> LV: Loop hints: force=? width=0 unroll=0
>>> LV: Found a loop:
>>> LV: Found an induction variable.
>>> LV: Found an induction variable.
>>> LV: Found an induction variable.
>>> LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
>>> LV: Can't vectorize the instructions or CFG
>>> LV: Not vectorizing: Cannot prove legality.
>>> [...]
>>
>> The issue seems to be that the phi node "%16" can't be deduced as an
>> induction variable. If we take a closer look, the cause seems to be in
>> ScalarEvolution, in the createSCEV function
>> (http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770)
>> :
>>
>>>  // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
>>>  // lead to pointer expressions which cannot safely be expanded to GEPs,
>>>  // because ScalarEvolution doesn't respect the GEP aliasing rules when
>>>  // simplifying integer expressions.
>>
>> Indeed, SCEV does not (legitimately) consider inttoptr/ptrtoint as
>> no-op, and does not handle them. The thing is that, in our case, the GEP
>> in %23 is thus not analyzed by SCEV, and the PHI %16 is thus not
>> considered as an induction variable.
>>
>> To confirm this hypothesis, I created a small out-of-tree pass
>> (https://github.com/aguinet/llvm-intptrcleanup) which registers before
>> loop vectorization and does the following:
>>
>> * first, it search for phi nodes who have those properties:
>>   - every incoming value of the phi node is a ptrtoint instruction. The
>> original pointer type of every ptrtoint instruction must be the same type T.
>>   - every user of this PHI node is an inttoptr instruction of the
>> previous type T
>> * for each of these PHI nodes, it creates a new PHI node which takes the
>> original pointers as incoming values, and replace the uses of the
>> inttoptr instructions that uses the original PHI node by the new one
>> * it then removes the previous inttoptr instructions and the original
>> PHI node
>>
>> The way I understand inttoptr and ptrtoint, this transformation should
>> be valid (but I might have missed something!). Please note that this is
>> a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
>> the previous example is now vectorized correctly by the loop vectorizer.
>> This can be seen by looking at the output of:
>>
>>> $ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11
>>
>> The question that remains to me is how this should be correctly fixed:
>>
>> 1) Making SCEV support these no-op (in this case) inttoptr/ptrtoint
>> conversions
>> 2) insert the above transformation at some point in the optimization
>> pipeline
>> 3) clean the pass(es?) that somehow generated this case.
>>
>> I have to admit I'm not really sure which options is the best. 3) seems
>> to be the way to go but might require some tedious work, and does not
>> prevent the issue to come again in the future. 2) seems to be a quick
>> patch that could be inserted in some "canonicalization" pass, let it be
>> a valid transformation in the first place. I don't know SCEV enough to
>> judge of the difficulty/faisability of 1).
>>
>> This mail is thus to discuss this issue and how to fix this properly :)
>>
>> Thanks everyone :)
>>
>> --
>> Adrien Guinet.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
>
>
> --
> Davide
>
> "There are no solved problems; there are only problems that are more
> or less solved" -- Henri Poincare



--
Davide

"There are no solved problems; there are only problems that are more
or less solved" -- Henri Poincare
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
In reply to this post by ORiordan, Martin via llvm-dev
For the record, I tried with the latest snapshot available on
apt.llvm.org, and the issue is still present.

> $ clang-5.0 --version
> clang version 5.0.0-svn305634-1~exp1 (trunk)
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
In reply to this post by ORiordan, Martin via llvm-dev

Hi, Adrien,

Thanks for reporting this. I recommend that you file a bug report at https://bugs.llvm.org/

Whenever I see reports of missed optimization opportunities in the face of ptrtoint/inttoptr, my first question is: why are these instructions present in the first place? At the IR level, use of inttoptr is highly discouraged. Our aliasing analysis, for example, does not look through them, and so you'll generally see a lot of missed optimizations when they're around.

In this case, inttoptr seems to be introduced by SROA. SROA should not be introducing inttoptr, but rather should be using GEPs on i8* (at least), to avoid introducing pointers that our AA can't analyze.

Beyond that, if we need to handle inttoptr/ptrtoint in SCEV, then maybe there's a way to make it smarter about the expressions with which it can deal. I'm actually not sure to what "aliasing rules" the comment you quote below is referring. I can certainly understand not being able to place "inbounds" on some generated GEPs, but otherwise this seems non-obvious to me (i.e. either the expander can identify a base pointer from which to generate the GEP, or it can't, in which case it needs to generate a inttoptr).

Sanjoy, thoughts?

 -Hal

On 06/17/2017 05:41 PM, Adrien Guinet via llvm-dev wrote:
Hello all,

There is a missing vectorization opportunity issue with clang 4.0 with
the file attached.

Indeed, when compiled with -O2, the "op_distance" function get
vectorized, but not the "op" one.

For information, this test case has been reduced from a file generated
by the Pythran compiler (https://github.com/serge-sans-paille/pythran).

If we take a look at the generated IR without vectorization (using the
-fno-vectorize clang flag), we get:

$ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize

      
; Function Attrs: norecurse uwtable
define void @_Z11op_distancePi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
; This one is vectorized!
  %6 = ptrtoint i32* %1 to i64
  %7 = ptrtoint i32* %3 to i64
  %8 = sub i64 %7, %6
  %9 = icmp sgt i64 %8, 0
  br i1 %9, label %10, label %26

; <label>:10:                                     ; preds = %5
  %11 = lshr exact i64 %8, 2
  br label %12

; <label>:12:                                     ; preds = %12, %10
  %13 = phi i64 [ %23, %12 ], [ %11, %10 ]
  %14 = phi i32* [ %22, %12 ], [ %0, %10 ]
  %15 = phi i32* [ %21, %12 ], [ %2, %10 ]
  %16 = phi i32* [ %20, %12 ], [ %1, %10 ]
  %17 = load i32, i32* %16, align 4, !tbaa !1
  %18 = load i32, i32* %15, align 4, !tbaa !1
  %19 = add nsw i32 %18, %17
  store i32 %19, i32* %14, align 4, !tbaa !1
  %20 = getelementptr inbounds i32, i32* %16, i64 1
  %21 = getelementptr inbounds i32, i32* %15, i64 1
  %22 = getelementptr inbounds i32, i32* %14, i64 1
  %23 = add nsw i64 %13, -1
  %24 = icmp sgt i64 %13, 1
  br i1 %24, label %12, label %25

; <label>:25:                                     ; preds = %12
  br label %26

; <label>:26:                                     ; preds = %25, %5
  ret void
}

; Function Attrs: norecurse uwtable
define void @_Z2opPi16add_zip_iteratorS0_(i32* nocapture, i32*, i32* nocapture readonly, i32*, i32* nocapture readnone) local_unnamed_addr #0 {
; This one isn't!
  %6 = ptrtoint i32* %1 to i64
  %7 = ptrtoint i32* %3 to i64
  %8 = sub i64 %6, %7
  %9 = icmp sgt i64 %8, 0
  br i1 %9, label %10, label %28

; <label>:10:                                     ; preds = %5
  %11 = lshr exact i64 %8, 2
  br label %12

; <label>:12:                                     ; preds = %12, %10
  %13 = phi i64 [ %25, %12 ], [ %11, %10 ]
  %14 = phi i32* [ %24, %12 ], [ %0, %10 ]
  %15 = phi i32* [ %23, %12 ], [ %2, %10 ]
  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
  %17 = inttoptr i64 %16 to i32*
  %18 = load i32, i32* %17, align 4, !tbaa !1
  %19 = load i32, i32* %15, align 4, !tbaa !1
  %20 = add nsw i32 %19, %18
  store i32 %20, i32* %14, align 4, !tbaa !1
  %21 = getelementptr inbounds i32, i32* %17, i64 1
  %22 = ptrtoint i32* %21 to i64
  %23 = getelementptr inbounds i32, i32* %15, i64 1
  %24 = getelementptr inbounds i32, i32* %14, i64 1
  %25 = add nsw i64 %13, -1
  %26 = icmp sgt i64 %13, 1
  br i1 %26, label %12, label %27

; <label>:27:                                     ; preds = %12
  br label %28

; <label>:28:                                     ; preds = %27, %5
  ret void
}
If we compile only the "op" function while activation the debug mode,
here is the output:

$ clang -O2 -S -emit-llvm op_zip_iterator.cpp -std=c++11 -o - -fno-vectorize |~/dev/epona-llvm/build_debug_shared/bin/opt -debug -debug-only loop-vectorize -O2 -S

LV: Checking a loop in "_Z2opPi16add_zip_iteratorS0_" from <stdin>
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: 
LV: Found an induction variable.
LV: Found an induction variable.
LV: Found an induction variable.
LV: Found an unidentified PHI.  %16 = phi i64 [ %22, %12 ], [ %6, %10 ]
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.
[...]
The issue seems to be that the phi node "%16" can't be deduced as an
induction variable. If we take a closer look, the cause seems to be in
ScalarEvolution, in the createSCEV function
(http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04770)
:

 // It's tempting to handle inttoptr and ptrtoint as no-ops, however this can
 // lead to pointer expressions which cannot safely be expanded to GEPs,
 // because ScalarEvolution doesn't respect the GEP aliasing rules when
 // simplifying integer expressions.
Indeed, SCEV does not (legitimately) consider inttoptr/ptrtoint as
no-op, and does not handle them. The thing is that, in our case, the GEP
in %23 is thus not analyzed by SCEV, and the PHI %16 is thus not
considered as an induction variable.

To confirm this hypothesis, I created a small out-of-tree pass
(https://github.com/aguinet/llvm-intptrcleanup) which registers before
loop vectorization and does the following:

* first, it search for phi nodes who have those properties:
  - every incoming value of the phi node is a ptrtoint instruction. The
original pointer type of every ptrtoint instruction must be the same type T.
  - every user of this PHI node is an inttoptr instruction of the
previous type T
* for each of these PHI nodes, it creates a new PHI node which takes the
original pointers as incoming values, and replace the uses of the
inttoptr instructions that uses the original PHI node by the new one
* it then removes the previous inttoptr instructions and the original
PHI node

The way I understand inttoptr and ptrtoint, this transformation should
be valid (but I might have missed something!). Please note that this is
a quick'n'dirty pass, which hasn't been heavily tested. Using this pass,
the previous example is now vectorized correctly by the loop vectorizer.
This can be seen by looking at the output of:

$ clang -Xclang -load -Xclang IntToPtrCleanup.so -O2 ./example/op_zip_operator.cpp -S -emit-llvm -o - -std=c++11
The question that remains to me is how this should be correctly fixed:

1) Making SCEV support these no-op (in this case) inttoptr/ptrtoint
conversions
2) insert the above transformation at some point in the optimization
pipeline
3) clean the pass(es?) that somehow generated this case.

I have to admit I'm not really sure which options is the best. 3) seems
to be the way to go but might require some tedious work, and does not
prevent the issue to come again in the future. 2) seems to be a quick
patch that could be inserted in some "canonicalization" pass, let it be
a valid transformation in the first place. I don't know SCEV enough to
judge of the difficulty/faisability of 1).

This mail is thus to discuss this issue and how to fix this properly :)

Thanks everyone :)



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
On 06/20/2017 03:26 AM, Hal Finkel wrote:
> Hi, Adrien,

Hello Hal!

Thanks for your answer!

> Thanks for reporting this. I recommend that you file a bug report at
> https://bugs.llvm.org/

Will do!

> Whenever I see reports of missed optimization opportunities in the face
> of ptrtoint/inttoptr, my first question is: why are these instructions
> present in the first place? At the IR level, use of inttoptr is highly
> discouraged. Our aliasing analysis, for example, does not look through
> them, and so you'll generally see a lot of missed optimizations when
> they're around.
> In this case, inttoptr seems to be introduced by SROA. SROA should not
> be introducing inttoptr, but rather should be using GEPs on i8* (at
> least), to avoid introducing pointers that our AA can't analyze.

It looks so indeed!

> Beyond that, if we need to handle inttoptr/ptrtoint in SCEV, then maybe
> there's a way to make it smarter about the expressions with which it can
> deal.

One of the solution I was thinking of was making SCEV "understand"
inttoptr(ptrtoint()), as this is clearly a no-op (and what is happening
here), but it seems not that straightforward to do (hence this
discussion to discuss what should be done :)).

> I'm actually not sure to what "aliasing rules" the comment you
> quote below is referring. I can certainly understand not being able to
> place "inbounds" on some generated GEPs, but otherwise this seems
> non-obvious to me (i.e. either the expander can identify a base pointer
> from which to generate the GEP, or it can't, in which case it needs to
> generate a inttoptr).

The way I understand it is that, for a pointer to a fixed-size array P,
by using I=ptrtoint(P)/arithmetic operations(I)/inttoptr(I) we can
potentially end-up with a pointer which is not in the original values
that could be computed from the original array (like an out-of-bound
pointer), and thus not "reconstruct" a GEP from it. That's why SCEV
wouldn't consider them. IIRC, I tried to add ptrtoint/inttoptr as
"no-op" in SCEV, but then the cost-model seemed to be broken (a very
high value was given at some point), and thus vectorization still didn't
happen.

I was also thinking: even if SROA is fixed, nothing prevents another
pass from introducing such constructions in the future, so maybe the
solution is to have a pass (like the one I wrote) that try and cleanup
before any usage of SCEV? (or done by InstCombine for instance, as this
basically removes the inttoptr(ptrtoint()) combination).
Or having the "inttoptr(ptrtoint())" composition understood by SCEV as a
no-op might also be a cleaner "long-term" solution...

What do you think?
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
On 06/20/2017 08:41 PM, Adrien Guinet via llvm-dev wrote:

> On 06/20/2017 03:26 AM, Hal Finkel wrote:
>> Hi, Adrien,
>
> Hello Hal!
>
> Thanks for your answer!
>
>> Thanks for reporting this. I recommend that you file a bug report at
>> https://bugs.llvm.org/
>
> Will do!

FTR, the bug report is here: https://bugs.llvm.org/show_bug.cgi?id=33532
. Maybe discussions should be continued there!

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
In reply to this post by ORiordan, Martin via llvm-dev

On 06/20/2017 01:41 PM, Adrien Guinet wrote:

> On 06/20/2017 03:26 AM, Hal Finkel wrote:
>> Hi, Adrien,
> Hello Hal!
>
> Thanks for your answer!
>
>> Thanks for reporting this. I recommend that you file a bug report at
>> https://bugs.llvm.org/
> Will do!
>
>> Whenever I see reports of missed optimization opportunities in the face
>> of ptrtoint/inttoptr, my first question is: why are these instructions
>> present in the first place? At the IR level, use of inttoptr is highly
>> discouraged. Our aliasing analysis, for example, does not look through
>> them, and so you'll generally see a lot of missed optimizations when
>> they're around.
>> In this case, inttoptr seems to be introduced by SROA. SROA should not
>> be introducing inttoptr, but rather should be using GEPs on i8* (at
>> least), to avoid introducing pointers that our AA can't analyze.
> It looks so indeed!
>
>> Beyond that, if we need to handle inttoptr/ptrtoint in SCEV, then maybe
>> there's a way to make it smarter about the expressions with which it can
>> deal.
> One of the solution I was thinking of was making SCEV "understand"
> inttoptr(ptrtoint()), as this is clearly a no-op (and what is happening
> here), but it seems not that straightforward to do (hence this
> discussion to discuss what should be done :)).
>
>> I'm actually not sure to what "aliasing rules" the comment you
>> quote below is referring. I can certainly understand not being able to
>> place "inbounds" on some generated GEPs, but otherwise this seems
>> non-obvious to me (i.e. either the expander can identify a base pointer
>> from which to generate the GEP, or it can't, in which case it needs to
>> generate a inttoptr).
> The way I understand it is that, for a pointer to a fixed-size array P,
> by using I=ptrtoint(P)/arithmetic operations(I)/inttoptr(I) we can
> potentially end-up with a pointer which is not in the original values
> that could be computed from the original array (like an out-of-bound
> pointer), and thus not "reconstruct" a GEP from it. That's why SCEV
> wouldn't consider them.

This is exactly what I meant by not being able to create "inbounds"
GEPs. However, aside from that, I don't recall our semantics restricting
the formation of "out of bounds" pointers (although there certainly are
restrictions on the uses of such pointers).

>   IIRC, I tried to add ptrtoint/inttoptr as
> "no-op" in SCEV, but then the cost-model seemed to be broken (a very
> high value was given at some point), and thus vectorization still didn't
> happen.
>
> I was also thinking: even if SROA is fixed, nothing prevents another
> pass from introducing such constructions in the future, so maybe the
> solution is to have a pass (like the one I wrote) that try and cleanup
> before any usage of SCEV? (or done by InstCombine for instance, as this
> basically removes the inttoptr(ptrtoint()) combination).
> Or having the "inttoptr(ptrtoint())" composition understood by SCEV as a
> no-op might also be a cleaner "long-term" solution...
>
> What do you think?

While it is true that nothing prevents it, technically, over time we've
been removing code from LLVM that creates these in favor of code that
uses GEPs. I think it would be counter productive to teach some parts of
the optimizer about inttoptr/ptrtoint, and not BasicAA and all of the
rest of it. There are certainly some language constructs that must be
lowered using inttoptr/ptrtoint at the IR level, but usage should be
limited to those contexts.

The right solution here is to fix SROA, and any other passes that are
doing this, to use GEPs instead. Only if that's not possible, should we
consider other solutions. The reason we've chosen this path is that the
task of teaching the rest of the optimizer to handle inttoptr/ptrtoint
as truly first-class pointer-manipulation constructs, along with GEPs,
is highly non-trivial. If, on top of that, we need some canonicalization
procedure which transforms uses of inttoptr/ptrtoint into GEP-based
expressions, we can do that too.

  -Hal


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LoopVectorize fails to vectorize loops with induction variables with PtrToInt/IntToPtr conversions

ORiordan, Martin via llvm-dev
Hi,

On Tue, Jun 20, 2017 at 5:54 PM, Hal Finkel <[hidden email]> wrote:
>>> Whenever I see reports of missed optimization opportunities in the face
>>> of ptrtoint/inttoptr, my first question is: why are these instructions
>>> present in the first place? At the IR level, use of inttoptr is highly
>>> discouraged. Our aliasing analysis, for example, does not look through
>>> them, and so you'll generally see a lot of missed optimizations when
>>> they're around.

+1

> This is exactly what I meant by not being able to create "inbounds" GEPs.
> However, aside from that, I don't recall our semantics restricting the
> formation of "out of bounds" pointers (although there certainly are
> restrictions on the uses of such pointers).

One problem is that there is a difference between (assume A and B are
distinct allocations):

A = i8* some_ptr
B = i8* some_ptr
C = inttoptr(ptrtoint(A) + ptrtoint(B))

and

A = i8* some_ptr
B = i8* some_ptr
C = gep(A, ptrtoint(B))

As far as I know, in the former case C aliases both A and B, where in
the latter case C aliases only A (and not B).

The comment in SCEV is very old, but I suspect it is trying to avoid
accidentally converting the former into the latter via a SCEV creation
-> SCEV expansion step.

-- Sanjoy
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev