Limit loop vectorizer to SSE

Limit loop vectorizer to SSE

Frank Winter
The AVX + JIT bug is hitting more frequently now. On an AVX machine the
loop vectorizer goes for a vector length of 8 for some of my functions,
which in turn causes a SEGFAULT.

Is there a way to limit the loop vectorizer to a certain vector length,
say 4, such that I can work around the bug?

Frank


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Limit loop vectorizer to SSE

Frank Winter
I am asking because the option 'force-vector-width' is too restrictive.
I would like to leave open the possibility to use vector width 2.

Frank





Re: Limit loop vectorizer to SSE

Renato Golin-2
On 12 November 2013 15:14, Frank Winter <[hidden email]> wrote:
> I am asking because the option 'force-vector-width' is too restrictive.
> I would like to leave open the possibility to use vector width 2.

I was about to say that, and you saved us both one cycle. ;)

What you could do is force an architecture that doesn't have AVX, only SSE. I'm not sure how to do that in the JIT; I suppose setting the target attributes would be enough. Nor do I know which CPU string limits support to SSE, but that should do it.

cheers,
--renato


Re: Limit loop vectorizer to SSE

Frank Winter
.. forcing the vector size to 4 does not prevent using AVX. I just hit the following:

LV: We can vectorize this loop!
LV: Found trip count: 4
LV: The Widest type: 64 bits.
LV: The Widest register is: 256 bits.
LV: Using user VF 4.

Looks like I have to disable AVX somehow. (Which is sad on its own.)

Frank
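
For readers following the log above: the numbers explain why a user VF of 4 still ends up on AVX. With 64-bit elements, four lanes occupy 4 x 64 = 256 bits, which is exactly the widest register the vectorizer reported, while a 128-bit SSE register holds only two such elements. A minimal sketch of that arithmetic follows; the helper names `vectorBits` and `maxLanes` are made up for illustration and are not LLVM API.

```cpp
#include <cassert>

// Illustrative helpers only; these names do not exist in LLVM.

// Bits occupied by one vector of `vf` elements, each `elementBits` wide.
inline unsigned vectorBits(unsigned vf, unsigned elementBits) {
    return vf * elementBits;
}

// Largest vectorization factor whose vector still fits in one register.
inline unsigned maxLanes(unsigned registerBits, unsigned elementBits) {
    assert(elementBits != 0 && "element width must be non-zero");
    return registerBits / elementBits;
}
```

With the values from the log, `vectorBits(4, 64)` is 256 and `maxLanes(128, 64)` is 2, which is presumably why keeping width 2 available matters on an SSE-only target.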




Re: Limit loop vectorizer to SSE

Renato Golin-2
On 12 November 2013 15:53, Frank Winter <[hidden email]> wrote:
> .. forcing the vector size to 4 does not prevent using AVX.

Sure. That's more for tests than anything else.

So, there are ways of disabling stuff in Clang, for instance "-mattr=-avx" or "-target-feature -avx", but I'm not sure how you're doing it in the JIT. I'm also not sure how to set target parameters in the JIT; you'll have to do that by hand.

cheers,
--renato


Re: Limit loop vectorizer to SSE

Frank Winter
On 12/11/13 11:01, Renato Golin wrote:
> On 12 November 2013 15:53, Frank Winter <[hidden email]> wrote:
>> .. forcing the vector size to 4 does not prevent using AVX.
>
> Sure. That's more for tests than anything else.
>
> So, there are ways of disabling stuff in Clang, for instance "-mattr=-avx" or "-target-feature -avx", but I'm not sure how you're doing it in the JIT. I'm also not sure how to set target parameters in JIT, you'll have to do that by hand.

I don't know that either. I set the CPU via

engineBuilder.setMCPU(llvm::sys::getHostCPUName());

and that figures out all target parameters, I assume.
I would need to still use this, and then disable just the AVX feature.


Re: Limit loop vectorizer to SSE

Renato Golin-2
On 12 November 2013 16:05, Frank Winter <[hidden email]> wrote:
> engineBuilder.setMCPU(llvm::sys::getHostCPUName());

Try:

engineBuilder.setMAttrs("-avx");

--renato


Re: Limit loop vectorizer to SSE

Frank Winter
Well educated guess! (It must be a sequence container of strings, but that's technical.)

Thanks,
Frank
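
As Frank notes, `setMAttrs` takes a sequence container of strings rather than a single string literal. Below is a small sketch of building such a list, assuming the usual LLVM convention that a leading `-` disables a feature. `disableFeature` and `sseOnlyAttrs` are hypothetical helper names, and the hand-off to the `EngineBuilder` is shown only as a comment because it needs the LLVM headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turns a feature name such as "avx" into "-avx".
inline std::string disableFeature(const std::string &name) {
    assert(!name.empty() && "feature name must not be empty");
    return "-" + name;
}

// Hypothetical helper: the attribute list that switches AVX off.
inline std::vector<std::string> sseOnlyAttrs() {
    return std::vector<std::string>(1, disableFeature("avx"));
}

// In the JIT setup from this thread the list would be used roughly as:
//   engineBuilder.setMCPU(llvm::sys::getHostCPUName());
//   engineBuilder.setMAttrs(sseOnlyAttrs());
```

Keeping `setMCPU` with the host CPU name and only subtracting AVX matches what Frank asked for: all other host features stay enabled.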



Re: Limit loop vectorizer to SSE

Josh Klontz
In reply to this post by Renato Golin-2
Porting my project from JIT to MCJIT did not fix the code generation bug Frank is also experiencing. However, Renato's "-avx" suggestion did resolve the issue for me. Hopefully we can get some traction on this bug, happy to help where possible!
v/r,
Josh

Re: Limit loop vectorizer to SSE

Renato Golin-2
On 12 November 2013 21:10, Josh Klontz <[hidden email]> wrote:
> Porting my project from JIT to MCJIT did not fix the code generation bug
> Frank is also experiencing. However, Renato's "-avx" suggestion did resolve
> the issue for me. Hopefully we can get some traction on this bug, happy to
> help where possible!

Hi Josh, Frank,

Glad to see you can continue with your work, regardless of the AVX bug. It would be great if you guys could reduce the IR and report the AVX bug in Bugzilla. I'm hoping you both found the same error (fingers crossed), but feel free to open separate bugs, and we'll merge them later if they are the same.

Thanks!
--renato


Re: Limit loop vectorizer to SSE

Frank Winter
My case is submitted as bug 17878.

In my case the segfault happens when calling the JIT'ed function. Thus some sort of 'payload' has to be created. Not sure if it's the same as what Josh is hitting.

Frank



Re: Limit loop vectorizer to SSE

Renato Golin-2
On 13 November 2013 14:59, Frank Winter <[hidden email]> wrote:
> My case is submitted. bug 17878

Thanks!

--renato


Re: Limit loop vectorizer to SSE

Josh Klontz
In reply to this post by Frank Winter
I'm embarrassed to say my bug ended up being a user error. I was passing in pointers that were 16-byte aligned instead of 32. Explains why they worked fine for SSE but not AVX :) Sorry for the noise!
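
The 16-versus-32 mismatch can be reproduced without any JIT. What follows is a sketch, assuming the generated code uses aligned 256-bit loads and stores, which require addresses that are multiples of 32 bytes. `isAligned` and `AvxBuffer` are illustrative names, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if `p` is a multiple of `alignment` bytes.
// `alignment` must be a non-zero power of two.
inline bool isAligned(const void *p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Eight doubles whose storage starts on a 32-byte boundary when the
// object has automatic or static storage duration.
struct alignas(32) AvxBuffer {
    double data[8];
};
```

A pointer 16 bytes into such a buffer, for example `buf.data + 2`, passes `isAligned(p, 16)` but fails `isAligned(p, 32)`, which is the situation described above. Note that heap allocations through plain `new` or `malloc` may not honor the over-alignment, so dynamically allocated payloads need an aligned allocator.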

Re: Limit loop vectorizer to SSE

Frank Winter
Good catch! That was the problem in my case too. I totally
overlooked the alignment requirement for AVX.
Frank



Re: Limit loop vectorizer to SSE

Renato Golin-2
On 15 November 2013 20:05, Frank Winter <[hidden email]> wrote:
> Good catch! That was the problem in my case too. I totally
> overlooked the alignment requirement for AVX.

Wow! Two bugs closed without even looking at them! I must be a wizard! :D

Good work Josh, thanks for letting us know.

cheers,
--renato


Re: Limit loop vectorizer to SSE

Renato Golin-2
In reply to this post by Frank Winter
On 15 November 2013 20:05, Frank Winter <[hidden email]> wrote:
> Good catch! That was the problem in my case too. I totally
> overlooked the alignment requirement for AVX.

I wonder if the validation mechanism shouldn't have caught it earlier... Do you guys run validate on the modules before JIT-ing?

--renato


Re: Limit loop vectorizer to SSE

Frank Winter
Hmm.. I don't quite understand. How can a module validator
catch this, when it's the pointers, i.e. the payload, you pass
as function arguments that need to be aligned.. ?!
Frank


Re: Limit loop vectorizer to SSE

Josh Klontz
Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks!
-Josh
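
One way to guard the call site by hand is sketched below, under the assumption that the required alignment equals the vector register width in bytes. The register width is passed in as a plain number here. In a real setup it might come from `TargetTransformInfo::getRegisterBitWidth()` as mentioned above, but how to reach that from the JIT is left open in this thread. `alignmentForRegister`, `callIfAligned`, and `copyKernel` are hypothetical names, and `copyKernel` merely stands in for a JIT-compiled function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte alignment of a full vector register, e.g. 256 bits -> 32 bytes.
// Assumes the width is a power of two, as 128, 256 and 512 are.
inline std::size_t alignmentForRegister(unsigned registerBits) {
    assert(registerBits != 0 && registerBits % 8 == 0);
    return registerBits / 8;
}

// Signature of the (hypothetical) JIT-compiled function.
typedef void (*Kernel)(double *dst, const double *src, std::size_t n);

// Stand-in for a JIT-compiled kernel: copies n doubles.
inline void copyKernel(double *dst, const double *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Calls `kernel` only if both pointers satisfy the alignment derived from
// `registerBits`; returns false without calling it otherwise.
inline bool callIfAligned(Kernel kernel, double *dst, const double *src,
                          std::size_t n, unsigned registerBits) {
    const std::uintptr_t mask = alignmentForRegister(registerBits) - 1;
    const std::uintptr_t addrs = reinterpret_cast<std::uintptr_t>(dst) |
                                 reinterpret_cast<std::uintptr_t>(src);
    if ((addrs & mask) != 0) return false;
    kernel(dst, src, n);
    return true;
}
```

The check costs two integer operations per call, so it is cheap enough to leave enabled while debugging a crash like the one in this thread.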



Re: Limit loop vectorizer to SSE

Renato Golin-2
In reply to this post by Frank Winter
On 15 November 2013 20:20, Frank Winter <[hidden email]> wrote:
> Hmm.. I don't quite understand. How can a module validator
> catch this, when it's the pointers, i.e. the payload, you pass
> as function arguments that need to be aligned.. ?!

My mistake, I thought it was something in your front-end, generating bad IR. Ignore that comment.

--renato


Re: Limit loop vectorizer to SSE

Renato Golin-2
In reply to this post by Josh Klontz
On 15 November 2013 20:24, Joshua Klontz <[hidden email]> wrote:
> Agreed, is there a pass that will insert a runtime alignment check? Also, what's the easiest way to get at TargetTransformInfo::getRegisterBitWidth() so I don't have to hard code 32? Thanks!

I think that's a fair question, and it's about safety. If you're getting this on the JIT, it means we may be generating unsafe transformations in the vectorizer.

Arnold, Nadav, I don't remember seeing code to generate any run-time alignment checks on the incoming pointer, is there such a thing? If not, shouldn't we add one?

cheers,
--renato
