movaps being generated despite alignment 1 being specified


Chuck Rose III

Hello LLVMers,

 

High order bit: 

 

The presence of a function call causes a store to an unrelated vector to be emitted as an aligned store (movaps) rather than an unaligned one (movups), even though the associated StoreInst specifies align 1.

 

Details:

 

I pulled down the latest source, so this is something I’m finding with the current LLVM.  I’m hoping you’ll have an idea what’s going on or at least know if it’s a new issue I should log.  It’s related to the stack alignment issue that I know is being worked on, but seems sufficiently different to ask about it here.   I checked the bug database for “align” and “movaps” and didn’t see this issue raised.

 

Ok, the first bit of code here seems to generate correct assembly for me.  Basically, it copies the float4 stored at globalV into the address pointed to by dependentV.  Along the way, it creates a <4 x float> and copies globalV into a temporary.  I’m working on bridging the gap between the outside of our system and the LLVM generated code, so there is a little extra copying from and to parameters at the boundaries of this function.  Since this is just a repro example, there is very little besides the boundaries here. :-)  I fully admit the constructions below may not be optimal.

 

   ; ModuleID = 'hydra'
   target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

   define void @evaluateDependents(float* %dependentV, float* %globalV) {
   Entry_evaluateDependents:
        %Promoted_dependentV_Ptr = alloca <4 x float>, align 16         ; <<4 x float>*> [#uses=2]
        %Promoted_globalV_Ptr = alloca <4 x float>, align 16            ; <<4 x float>*> [#uses=2]
        %externalVectorPtrCast = bitcast float* %globalV to <4 x float>*                ; <<4 x float>*> [#uses=1]
        %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1               ; <<4 x float>> [#uses=1]
        store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
        %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1            ; <<4 x float>> [#uses=1]
        br label %Body_evaluateDependents

   Body_evaluateDependents:             ; preds = %Entry_evaluateDependents
        store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
        br label %Exit_evaluateDependents

   Exit_evaluateDependents:             ; preds = %Body_evaluateDependents
        %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1           ; <<4 x float>> [#uses=1]
        %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>*            ; <<4 x float>*> [#uses=1]
        store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
        ret void
   }

 

This produces the following instructions, which obey all the align 1 directives on the LoadInsts and StoreInsts:

 

15D10010  sub         esp,2Ch
15D10013  mov         eax,dword ptr [esp+34h]
15D10017  movups      xmm0,xmmword ptr [eax]
15D1001A  movups      xmmword ptr [esp],xmm0
15D1001E  mov         eax,dword ptr [esp+30h]
15D10022  movups      xmmword ptr [esp+10h],xmm0
15D10027  movups      xmm0,xmmword ptr [esp+10h]
15D1002C  movups      xmmword ptr [eax],xmm0
15D1002F  add         esp,2Ch
15D10032  ret

 

Here’s where it gets weird and confusing to me.  Let’s make our evaluateDependents function do something else.  In addition to copying globalV into dependentV, it’s also going to set a singleton float pointed to by dependentF.  We’ll call a function foo to get the value.  (I tried setting dependentF directly and that did NOT cause the problem with the generated code).  Here’s the LLVM code:

 

   ; ModuleID = 'hydra'
   target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

   define float @foo(float %Y) {
   Entry_foo:
        %_ReturnValuePtr = alloca float         ; <float*> [#uses=2]
        br label %Body_foo

   Body_foo:            ; preds = %Entry_foo
        store float %Y, float* %_ReturnValuePtr, align 1
        br label %Exit_foo

   Exit_foo:            ; preds = %Body_foo
        %finalValue = load float* %_ReturnValuePtr, align 1             ; <float> [#uses=1]
        ret float %finalValue
   }

   define void @evaluateDependents(float* %dependentF, float* %dependentV, float* %globalV) {
   Entry_evaluateDependents:
        %Promoted_dependentV_Ptr = alloca <4 x float>, align 16         ; <<4 x float>*> [#uses=2]
        %Promoted_globalV_Ptr = alloca <4 x float>, align 16            ; <<4 x float>*> [#uses=2]
        %externalVectorPtrCast = bitcast float* %globalV to <4 x float>*                ; <<4 x float>*> [#uses=1]
        %externalVectorLoaded = load <4 x float>* %externalVectorPtrCast, align 1               ; <<4 x float>> [#uses=1]
        store <4 x float> %externalVectorLoaded, <4 x float>* %Promoted_globalV_Ptr, align 1
        %globalV1 = load <4 x float>* %Promoted_globalV_Ptr, align 1            ; <<4 x float>> [#uses=1]
        br label %Body_evaluateDependents

   Body_evaluateDependents:             ; preds = %Entry_evaluateDependents
        %fooResult = call float @foo( float 2.000000e+000 )             ; <float> [#uses=1]
        store float %fooResult, float* %dependentF, align 1
        store <4 x float> %globalV1, <4 x float>* %Promoted_dependentV_Ptr, align 1
        br label %Exit_evaluateDependents

   Exit_evaluateDependents:             ; preds = %Body_evaluateDependents
        %vectorToDemote = load <4 x float>* %Promoted_dependentV_Ptr, align 1           ; <<4 x float>> [#uses=1]
        %externalVectorPtrCast2 = bitcast float* %dependentV to <4 x float>*            ; <<4 x float>*> [#uses=1]
        store <4 x float> %vectorToDemote, <4 x float>* %externalVectorPtrCast2, align 1
        ret void
   }

 

Here are the instructions for evaluateDependents.  The JITter hasn’t compiled foo yet.  What’s confusing to me is why did my movups suddenly become a movaps?  All the stores and loads have align 1 on them.

 

15D10012  sub         esp,4Ch
15D10015  mov         eax,dword ptr [esp+60h]
15D10019  movups      xmm0,xmmword ptr [eax]
15D1001C  movaps      xmmword ptr [esp+8],xmm0    <- why did this become a movaps?
15D10021  movups      xmmword ptr [esp+28h],xmm0
15D10026  mov         esi,dword ptr [esp+58h]
15D1002A  mov         edi,dword ptr [esp+5Ch]
15D1002E  mov         dword ptr [esp],40000000h
15D10035  call        X86CompilationCallback (1335030h)

 

Thanks for the help!

 

Chuck.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: movaps being generated despite alignment 1 being specified

Dale Johannesen

On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote:

High order bit:  

Presence of a called function is causing a store on an unrelated vector to generate an aligned store rather than an unaligned one despite unaligned store being indicated in the associated StoreInst.

This probably means the compiler believes the stack pointer is 16-byte aligned in non-leaf functions.
This would be correct if (a) the SP was aligned coming in and (b) the size of the stack decrement
(including return address, etc.) is a multiple of 16.  I haven't been following the Linux problems
closely, but I think "the stack issue being worked on" is that (a) is not always correct?
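
To make the arithmetic behind (a) and (b) concrete, here is a rough Python sketch (not from the thread) that computes where a stack slot lands modulo 16. It assumes 32-bit x86 with a 4-byte return address. It also assumes the second listing is preceded by two 4-byte register saves that are not shown; that is only inferred from the first argument sitting at [esp+58h] after sub esp,4Ch. The function and parameter names are made up for illustration.

```python
# Sketch only: checks whether a stack slot would be 16-byte aligned, given an
# assumption about how the caller left esp just before the call instruction.

RETURN_ADDRESS_BYTES = 4  # 32-bit x86: call pushes a 4-byte return address


def slot_misalignment(frame_bytes, slot_offset, saved_register_bytes=0,
                      caller_misalignment=0):
    """Return (address of [esp + slot_offset]) mod 16 after the prologue.

    caller_misalignment is esp mod 16 just before the call; 0 corresponds to
    condition (a), "the SP was aligned coming in".
    """
    total_decrement = RETURN_ADDRESS_BYTES + saved_register_bytes + frame_bytes
    return (caller_misalignment - total_decrement + slot_offset) % 16


# First listing: sub esp,2Ch; vector slots at [esp] and [esp+10h].
first = [slot_misalignment(0x2C, off) for off in (0x00, 0x10)]

# Second listing: sub esp,4Ch; vector slots at [esp+8] and [esp+28h].
# The 8 bytes of saved registers are an ASSUMPTION (two pushes not shown).
second = [slot_misalignment(0x4C, off, saved_register_bytes=8)
          for off in (0x08, 0x28)]

# Same frame as the second listing, but the caller's esp was only 4-byte
# aligned, i.e. condition (a) does not hold.
unlucky = slot_misalignment(0x4C, 0x08, saved_register_bytes=8,
                            caller_misalignment=4)

print(first, second, unlucky)
```

Under those assumptions, both listings put their vector slots on 16-byte boundaries as long as the caller's esp was 16-byte aligned at the call. With a caller that was only 4-byte aligned, the [esp+8] slot ends up off by 4, which is the situation in which a movaps to that slot would fault.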




Re: movaps being generated despite alignment 1 being specified

Evan Cheng-2
In reply to this post by Chuck Rose III

On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote:


Here are the instructions for evaluateDependents.  The JITter hasn’t compiled foo yet.  What’s confusing to me is why did my movups suddenly become a movaps?  All the stores and loads have align 1 on them.


Hi Chuck,

I believe this is a bug but am unable to reproduce it with the test case you've provided. I should be able to see the same problem using llc since the code generator is going through all the same passes. The only difference should be the relocation model.

Please file a bug and provide us with a test case. You should be able to set a breakpoint somewhere in ExecutionEngine.cpp / JIT.cpp and just dump out the bitcode with Module->dump() / print().

Evan
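
When comparing llc output against what the JIT produced, a small script can flag aligned vector moves that touch memory. The sketch below is only an illustration and not part of LLVM. It assumes the debugger listing format shown in the original post (address, mnemonic, operands), and the names LISTING, LINE and aligned_memory_moves are hypothetical.

```python
import re

# Sample input: the first lines of the second listing from the original post,
# with the trailing annotation removed.
LISTING = """\
15D10012  sub         esp,4Ch
15D10015  mov         eax,dword ptr [esp+60h]
15D10019  movups      xmm0,xmmword ptr [eax]
15D1001C  movaps      xmmword ptr [esp+8],xmm0
15D10021  movups      xmmword ptr [esp+28h],xmm0
"""

# Matches "<hex address>  <mnemonic>  <operands>".
LINE = re.compile(r"^(?P<addr>[0-9A-Fa-f]+)\s+(?P<op>\w+)\s+(?P<args>.*)$")


def aligned_memory_moves(listing):
    """Return (address, operands) for every movaps with a memory operand."""
    hits = []
    for line in listing.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        # A '[' in the operands means one side of the move is a memory access.
        if match.group("op").lower() == "movaps" and "[" in match.group("args"):
            hits.append((match.group("addr"), match.group("args").strip()))
    return hits


hits = aligned_memory_moves(LISTING)
for addr, args in hits:
    print(f"{addr}: movaps {args}")
```

A register-to-register movaps is harmless, which is why the check only reports lines whose operands include a memory reference.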

 


Re: movaps being generated despite alignment 1 being specified

Evan Cheng-2
Fixed. See PR1776 and http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20071105/055148.html

Evan

On Oct 18, 2007, at 11:56 PM, Evan Cheng wrote:


On Oct 18, 2007, at 1:52 PM, Chuck Rose III wrote:


Here are the instructions for evaluateDependents.  The JITter hasn’t compiled foo yet.  What’s confusing to me is why did my movups suddenly become a movaps?  All the stores and loads have align 1 on them.


Hi Chuck,

I believe this is a bug but am unable to reproduce it with the test case you've provided. I should be able to see the same problem using llc since the code generator is going through all the same passes. The only difference should be the relocation model.

Please file a bug and provide us with a test case. You should be able to set a break point somewhere in ExecutionEngine.cpp / JIT.cpp and just dump out the bitcode with Module->dump() / print().

Evan

 
