Failure to optimize vector select

Matt Arsenault-2
Hi,

I've found a case that I would expect to optimize easily, but it doesn't. Here is a simple implementation of vector select:

float4 simple_select(float4 a, float4 b, int4 c)
{
    float4 result;

    result.x = c.x ? a.x : b.x;
    result.y = c.y ? a.y : b.y;
    result.z = c.z ? a.z : b.z;
    result.w = c.w ? a.w : b.w;

    return result;
}

I would expect this to be optimized to

%bool = icmp ne <4 x i32> %c, zeroinitializer
%result = select <4 x i1> %bool, <4 x float> %a, <4 x float> %b
ret <4 x float> %result

However, it actually ends up as four separate extractelement/icmp/select sequences.
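
The equivalence between the per-component form and a single compare plus a single select can be checked with a small model. This is only a sketch: `Float4`, `Int4`, `Mask4` and the three functions are hypothetical stand-ins built on `std::array`, not OpenCL or LLVM types. Because `c.x ? a.x : b.x` picks `a.x` whenever `c.x` is nonzero, the lane-wise condition is modeled as `c != 0`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the OpenCL float4 / int4 types.
using Float4 = std::array<float, 4>;
using Int4 = std::array<std::int32_t, 4>;
using Mask4 = std::array<bool, 4>;

// Scalar form: one ternary per component, as in simple_select.
Float4 scalar_select(const Float4 &a, const Float4 &b, const Int4 &c) {
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = c[i] ? a[i] : b[i];
    return r;
}

// Vector form, step 1: lane-wise "not equal to zero" compare.
Mask4 lanes_nonzero(const Int4 &c) {
    Mask4 m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = (c[i] != 0);
    return m;
}

// Vector form, step 2: lane-wise select driven by the mask.
Float4 vector_select(const Mask4 &m, const Float4 &a, const Float4 &b) {
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = m[i] ? a[i] : b[i];
    return r;
}

// Usage: both forms agree, including for a negative condition value.
inline void check_example() {
    const Float4 a{1.0f, 2.0f, 3.0f, 4.0f};
    const Float4 b{5.0f, 6.0f, 7.0f, 8.0f};
    const Int4 c{1, 0, -1, 0};
    assert(scalar_select(a, b, c) == vector_select(lanes_nonzero(c), a, b));
}
```

Comparing with "equal to zero" instead would swap which operand each lane receives, so the predicate matters when writing the vector form by hand.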

Where would be the best place to fix this? Should InstCombine be taking care of this or the vectorizer?


Thanks
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Failure to optimize vector select

Eugene Toder
Have you tried running the SLP vectorizer pass (-vectorize-slp)?

Eugene


On Mon, Aug 19, 2013 at 9:04 PM, Matt Arsenault <[hidden email]> wrote:
> [...]
> Where would be the best place to fix this? Should InstCombine be taking care of this or the vectorizer?

Re: Failure to optimize vector select

Matt Arsenault-2

On Aug 19, 2013, at 18:47 , Eugene Toder <[hidden email]> wrote:

> Have you tried running SLP vectorizer pass (-vectorize-slp)?
Yes. That was the first thing I tried, and it didn't do anything. I was looking at the vectorizer, but then I saw some things that made me wonder if it was even supposed to do this.

Re: Failure to optimize vector select

Nadav Rotem
Can you send the IR of the function?

On Aug 20, 2013, at 8:36 AM, Matt Arsenault <[hidden email]> wrote:

>
> On Aug 19, 2013, at 18:47 , Eugene Toder <[hidden email]> wrote:
>
>> Have you tried running SLP vectorizer pass (-vectorize-slp)?
> Yes. That was the first thing I tried, and it didn't do anything. I was looking at the vectorizer, but then I saw some things that made me wonder if it was even supposed to do this.

Re: Failure to optimize vector select

Nadav Rotem
I suspect that in the IR you will see a sequence of inserts. At the moment the SLP-vectorizer does not look at “insert” sequences, but it should be really easy (and beneficial) to make it do so.

On Aug 20, 2013, at 10:22 AM, Nadav Rotem <[hidden email]> wrote:

> Can you send the IR of the function ?  

Re: Failure to optimize vector select

Matt Arsenault-2
In reply to this post by Nadav Rotem
On Aug 20, 2013, at 10:22 , Nadav Rotem <[hidden email]> wrote:

> Can you send the IR of the function ?  

Attached is the IR at -O0 and -O3.


Attachments: vselect_optimized.ll (1K), vselect_unoptimized.ll (4K)

Re: Failure to optimize vector select

Nadav Rotem
Hi Matt,

This code maintains a float4 vector and inserts values into and extracts values from it. The ‘select’ operations are already vectorized. Maybe a sequence of InstCombine (or DAG combine) transforms can help. If you rewrite this code using scalars, then the SLP vectorizer, with some tweaks, will be able to catch it.

Thanks,
Nadav


On Aug 20, 2013, at 1:14 PM, Matt Arsenault <[hidden email]> wrote:

> On Aug 20, 2013, at 10:22 , Nadav Rotem <[hidden email]> wrote:
>
>> Can you send the IR of the function ?  
>
> Attached is the -O0 and -O3 IR
>
> <vselect_optimized.ll><vselect_unoptimized.ll>



Re: Failure to optimize vector select

Micah Villmow
Nadav,
I think what Matt wants to know is why the SLP vectorizer is not vectorizing the booleans. To me it seems like the vectorizer got the first step right (vectorizing the operands), but not the second step (vectorizing the comparison operation). I would actually expect a single icmp ne <4 x i32> %c, <i32 0, i32 0, i32 0, i32 0> instead of four icmps.


Micah

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On
> Behalf Of Nadav Rotem
> Sent: Tuesday, August 20, 2013 2:49 PM
> To: Matt Arsenault
> Cc: Mailing List
> Subject: Re: [LLVMdev] Failure to optimize vector select
>
> Hi Matt,
>
> This code maintains a vector of float4 and it inserts and extracts values from
> this vector. The 'select' operations are already vectorized. Maybe a sequence
> of inst-combines (or DAG-combines) can help. If you re-write this code using
> scalars then the slp-vectorizer, with some tweaks, will be able to catch it.
>
> Thanks,
> Nadav

Re: Failure to optimize vector select

Matt Arsenault-2
In reply to this post by Nadav Rotem

On Aug 20, 2013, at 14:49 , Nadav Rotem <[hidden email]> wrote:

> Hi Matt,
>
> This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it.
>

I've tried manually scalarizing the arguments so the other select arguments are scalars, but the vectorizer still doesn't change it. Here is the scalarized IR.




Attachment: manual_scalarize.ll (1K)

Re: Failure to optimize vector select

Nadav Rotem
Hi Matt,

We are really close. :)  Now, all you have to do is teach the SLP-vectorizer to start looking at “trees” that start with this pattern:

"
  %ra = insertelement <4 x float> undef, float %s0, i32 0
  %rb = insertelement <4 x float> %ra, float %s1, i32 1
  %rc = insertelement <4 x float> %rb, float %s2, i32 2
  %rd = insertelement <4 x float> %rc, float %s3, i32 3
  ret <4 x float> %rd
"

It’s really easy to do. Look at the code in runOnFunction in SLPVectorizer.cpp: just put %s0, %s1, %s2 and %s3 in a list and call tryToVectorize(…).
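
The "put the scalars in a list" step can be pictured with a toy model. The sketch below is not LLVM code: `InsertElement` and `collectInsertedScalars` are hypothetical names, and the real pass operates on LLVM's own instruction classes rather than strings. It only illustrates walking an insertelement chain from the last insert back to undef and gathering the inserted scalars in lane order, which would be the list to hand to the vectorizer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an insertelement instruction; this is NOT the LLVM API.
struct InsertElement {
    const InsertElement *vec; // vector operand; nullptr models undef
    std::string scalar;       // name of the inserted scalar, e.g. "%s0"
    unsigned index;           // constant lane index
};

// Walks the chain from the last insert back to undef and stores the inserted
// scalars in lane order in `out`. Returns false, conservatively, if a lane
// index is out of range, a lane is written twice, or a lane is never written.
bool collectInsertedScalars(const InsertElement *last, unsigned numLanes,
                            std::vector<std::string> &out) {
    assert(numLanes > 0 && "a vector needs at least one lane");
    std::vector<std::string> lanes(numLanes);
    std::vector<bool> seen(numLanes, false);
    unsigned filled = 0;
    for (const InsertElement *i = last; i != nullptr; i = i->vec) {
        if (i->index >= numLanes || seen[i->index])
            return false;
        seen[i->index] = true;
        lanes[i->index] = i->scalar;
        ++filled;
    }
    if (filled != numLanes)
        return false;
    out = lanes;
    return true;
}
```

For the four-insert pattern quoted above, starting the walk at %rd would yield %s0, %s1, %s2, %s3. Rejecting repeated lanes is a simplification chosen for this sketch; a real implementation could instead keep the last write to each lane.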

Thanks,
Nadav

On Aug 20, 2013, at 3:29 PM, Matt Arsenault <[hidden email]> wrote:

> I've tried manually scalarizing the arguments so the other select arguments are scalars, but the vectorizer still doesn't change it. Here is the scalarized IR.
>
> <manual_scalarize.ll>

