stack alignment (again)

stack alignment (again)

Chuck Rose III

Hola LLVMers,

 

I was curious about the state of stack alignment on x86.  I noticed there are a few bugs outstanding on the issue.  I recently added some code which had the effect of throwing an extra function parameter on our stack at runtime, a 4-byte pointer.

ESP is now not 16-byte aligned, so instructions like unpcklps xmm1, dword ptr [esp] cause grief.  My AllocaInst instructions are told to be 16-byte aligned, so the addition of a 4-byte parameter shouldn’t have changed alignment on the objects.
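
A small sketch of why the extra 4 bytes matter, using made-up addresses. It assumes frame objects are placed at fixed offsets from ESP, so a 16-byte-aligned offset only yields a 16-byte-aligned address when ESP itself is aligned. The helper names, the ESP value, and the offset are illustrative assumptions, not taken from the JIT.

```cpp
#include <cstdint>

// True when addr is a multiple of align; align must be a power of two.
constexpr bool isAligned(std::uint32_t addr, std::uint32_t align) {
    return (addr & (align - 1)) == 0;
}

// Address of a stack object located `offset` bytes above ESP.
constexpr std::uint32_t slotAddress(std::uint32_t esp, std::uint32_t offset) {
    return esp + offset;
}

// Hypothetical values, chosen only for illustration.
constexpr std::uint32_t kAlignedEsp = 0x0012FF00;       // a 16-byte-aligned ESP
constexpr std::uint32_t kSlotOffset = 32;               // frame offset, multiple of 16
constexpr std::uint32_t kShiftedEsp = kAlignedEsp - 4;  // after one extra 4-byte push
```

With these values, slotAddress(kAlignedEsp, kSlotOffset) is 16-byte aligned while slotAddress(kShiftedEsp, kSlotOffset) is not, even though the offset itself never changed.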

 

The unpcklps instruction is coming from an ExtractElementInst or InsertElementInst.  I can always hard-code these by cycling my vectors to memory and doing things one scalar at a time, though perf will suffer.  I’ll try it Monday to see if it gets rid of the alignment-sensitive instructions.

I’m noticing this under Windows via the JIT.  I’m going to check to see if my Mac has similar issues.

 

Thanks,

Chuck.

 

 


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: stack alignment (again)

Dale Johannesen

On Mar 28, 2008, at 5:17 PM, Chuck Rose III wrote:

> Hola LLVMers,
>
> I was curious about the state of stack alignment on x86.  I noticed there are a few bugs outstanding on the issue.  I recently added some code which had the effect of throwing an extra function parameter on our stack at runtime, a 4-byte pointer.
>
> ESP is now not 16-byte aligned, so instructions like unpcklps xmm1, dword ptr [esp] cause grief.  My AllocaInst instructions are told to be 16-byte aligned, so the addition of a 4-byte parameter shouldn’t have changed alignment on the objects.
>
> The unpcklps instruction is coming from an ExtractElementInst or InsertElementInst.  I can always hard-code these by cycling my vectors to memory and doing things one scalar at a time, though perf will suffer.  I’ll try it Monday to see if it gets rid of the alignment-sensitive instructions.
>
> I’m noticing this under Windows via the JIT.  I’m going to check to see if my Mac has similar issues.

The stack on MacOS is supposed to be kept 16-byte aligned.



Re: stack alignment (again)

Chris Lattner
In reply to this post by Chuck Rose III
On Mar 28, 2008, at 5:17 PM, Chuck Rose III wrote:
> I was curious about the state of stack alignment on x86.  I noticed there are a few bugs outstanding on the issue.  I recently added some code which had the effect of throwing an extra function parameter on our stack at runtime, a 4-byte pointer.
>
> ESP is now not 16-byte aligned, so instructions like unpcklps xmm1, dword ptr [esp] cause grief.  My AllocaInst instructions are told to be 16-byte aligned, so the addition of a 4-byte parameter shouldn’t have changed alignment on the objects.

Hi Chuck,

I think the basic problem is that the stack pointer on Windows/Linux is not guaranteed to be 16-byte aligned.  This means that any use of an instruction which requires 16-byte alignment (e.g. SSE stuff) and accesses a frameindex can cause a problem.  The issue is that the frameindex will be marked as needing 16+ byte alignment, but the code generator just won't respect this.

The fix for this is somewhat simple: in Prolog/Epilog Insertion, the PEI pass should notice when frame indices have alignment greater than the guaranteed stack alignment.  When this happens, it should emit code into the prolog to dynamically align the stack (e.g. by emitting 'and esp, -16').
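
As a sketch of what 'and esp, -16' computes (the name alignDown is hypothetical, and this models only the arithmetic, not the PEI pass): masking off the low four bits rounds the stack pointer down to a 16-byte boundary, which is safe because the x86 stack grows toward lower addresses.

```cpp
#include <cstdint>

// Round sp down to a multiple of align (a power of two). For align == 16 the
// mask is 0xFFFFFFF0, the 32-bit two's-complement encoding of -16, so this is
// the same operation as `and esp, -16`.
constexpr std::uint32_t alignDown(std::uint32_t sp, std::uint32_t align) {
    return sp & ~(align - 1);
}
```

For example, alignDown(0x0012FEFC, 16) yields 0x0012FEF0, and an already aligned value such as 0x0012FF00 is left unchanged. The 12 bytes skipped in the first case are simply unused padding.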

This doesn't occur on the Mac, because the stack is always guaranteed to be 16-byte aligned.

-Chris


Re: stack alignment (again)

Dale Johannesen

On Mar 30, 2008, at 10:21 AM, Chris Lattner wrote:

> On Mar 28, 2008, at 5:17 PM, Chuck Rose III wrote:
> > I was curious about the state of stack alignment on x86.  I noticed there are a few bugs outstanding on the issue.  I recently added some code which had the effect of throwing an extra function parameter on our stack at runtime, a 4-byte pointer.
> >
> > ESP is now not 16-byte aligned, so instructions like unpcklps xmm1, dword ptr [esp] cause grief.  My AllocaInst instructions are told to be 16-byte aligned, so the addition of a 4-byte parameter shouldn’t have changed alignment on the objects.
>
> Hi Chuck,
>
> I think the basic problem is that the stack pointer on Windows/Linux is not guaranteed to be 16-byte aligned.  This means that any use of an instruction which requires 16-byte alignment (e.g. SSE stuff) and accesses a frameindex can cause a problem.  The issue is that the frameindex will be marked as needing 16+ byte alignment, but the code generator just won't respect this.
>
> The fix for this is somewhat simple: in Prolog/Epilog Insertion, the PEI pass should notice when frame indices have alignment greater than the guaranteed stack alignment.  When this happens, it should emit code into the prolog to dynamically align the stack (e.g. by emitting 'and esp, -16').
>
> This doesn't occur on the Mac, because the stack is always guaranteed to be 16-byte aligned.

Another possibility, which only works if you have control of all the code being compiled, is to 16-byte align the stack in main and keep it 16-byte aligned thereafter.  Different versions of gcc have used this method and the method Chris suggests.
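
One way to picture "keep it 16-byte aligned thereafter" is as a rule for sizing each frame. The sketch below assumes ESP is 16-byte aligned just before every call, so the return address and a saved EBP (8 bytes together) have already been pushed by the time locals are allocated. paddedFrameSize and its parameters are hypothetical names for illustration, not something gcc or LLVM exposes.

```cpp
#include <cstdint>

// Round n up to a multiple of align (a power of two).
constexpr std::uint32_t roundUp(std::uint32_t n, std::uint32_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Bytes to subtract from ESP for locals so that ESP is 16-byte aligned again
// afterwards. `overhead` is what was pushed since the last aligned point: by
// assumption, a 4-byte return address plus a 4-byte saved EBP.
constexpr std::uint32_t paddedFrameSize(std::uint32_t localBytes,
                                        std::uint32_t overhead = 8) {
    return roundUp(localBytes + overhead, 16) - overhead;
}
```

For instance, a function with 20 bytes of locals would reserve paddedFrameSize(20), which is 24, so that 24 + 8 is a multiple of 16. The scheme breaks as soon as one caller in the chain does not follow the rule, which is why it needs control of all the code being compiled.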

Still another is to disable generation of SSE instructions.  -mattr=-sse2 should work.  The gcc switch, -mno-sse2, did not work in llvm-gcc last time I tried; this is on my list of things to make work, but fairly far down.



Re: stack alignment (again)

Anton Korobeynikov
In reply to this post by Chris Lattner
Hello, Dale

> Another possibility, which only works if you have control of all the
> code being compiled, is to 16-byte align the stack in main and keep it
> 16-byte aligned thereafter.
Yep, this is already done for Cygwin/MinGW inside the LLVM codegen. So only
various callbacks can be broken there.

--
With best regards, Anton Korobeynikov.

Faculty of Mathematics & Mechanics, Saint Petersburg State University.



Re: stack alignment (again)

Evan Cheng-2
In reply to this post by Chris Lattner

On Mar 30, 2008, at 10:21 AM, Chris Lattner wrote:
> On Mar 28, 2008, at 5:17 PM, Chuck Rose III wrote:
> > I was curious about the state of stack alignment on x86.  I noticed there are a few bugs outstanding on the issue.  I recently added some code which had the effect of throwing an extra function parameter on our stack at runtime, a 4-byte pointer.
> >
> > ESP is now not 16-byte aligned, so instructions like unpcklps xmm1, dword ptr [esp] cause grief.  My AllocaInst instructions are told to be 16-byte aligned, so the addition of a 4-byte parameter shouldn’t have changed alignment on the objects.
>
> Hi Chuck,
>
> I think the basic problem is that the stack pointer on Windows/Linux is not guaranteed to be 16-byte aligned.  This means that any use of an instruction which requires 16-byte alignment (e.g. SSE stuff) and accesses a frameindex can cause a problem.  The issue is that the frameindex will be marked as needing 16+ byte alignment, but the code generator just won't respect this.
>
> The fix for this is somewhat simple: in Prolog/Epilog Insertion, the PEI pass should notice when frame indices have alignment greater than the guaranteed stack alignment.  When this happens, it should emit code into the prolog to dynamically align the stack (e.g. by emitting 'and esp, -16').

This only works if the frame pointer is not being omitted. Otherwise, you can't restore the old stack pointer in the epilogue. So X86RegisterInfo::hasFP() must return true for Windows when any of the frame slots have alignment greater than the default alignment.
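
A rough sketch of that condition, with the caveat that FrameSummary and both function names are hypothetical stand-ins for illustration and not LLVM's actual data structures or the real hasFP() implementation:

```cpp
#include <cstdint>

// Hypothetical summary of the facts such a check would need; not LLVM's API.
struct FrameSummary {
    std::uint32_t maxSlotAlign;   // largest alignment requested by any frame slot
    std::uint32_t stackAlign;     // alignment the target guarantees for ESP
    bool framePointerRequested;   // frame pointer already required for other reasons
};

// The prolog has to realign ESP when some slot wants more than the guarantee.
constexpr bool needsRealignment(const FrameSummary &f) {
    return f.maxSlotAlign > f.stackAlign;
}

// After `and esp, -16` the amount subtracted is unknown at compile time, so
// the epilogue can only recover the old ESP from a frame pointer, e.g. with
// the conventional sequence:
//   push ebp; mov ebp, esp; and esp, -16; ... ; mov esp, ebp; pop ebp; ret
constexpr bool mustKeepFramePointer(const FrameSummary &f) {
    return f.framePointerRequested || needsRealignment(f);
}
```

With a 4-byte guaranteed stack alignment and one 16-byte-aligned slot, mustKeepFramePointer(FrameSummary{16, 4, false}) is true. With a 16-byte guarantee, as described for the Mac, the same frame gives false.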

Evan



> This doesn't occur on the Mac, because the stack is always guaranteed to be 16-byte aligned.
>
> -Chris