Vector troubles

Vector troubles

Chuck Rose III

Hola LLVMers,

I’m working on engaging SSE via the LLVM vector ops on x86.  I had some questions a while back that you all helped out on, but I’m seeing similar issues and was hoping you’d have some ideas.  Below is the dump of the LLVM IR of a program which is designed to take a vector stored in a float*, build an LLVM vector from it, copy it to another vector, and then take it apart and store it back out in another float*.  This will live on the boundary of our system and would be a function designed to promote a raw, potentially unaligned, value into a vector that the LLVM system can then work with freely.

It is dying trying to store our working vector into one of the LLVM vectors created on the stack.  Despite the align-16 directive on the alloca instruction, the slot is not always aligned to a 16-byte boundary.

I did a sync and build this morning, so my LLVM is quite fresh.

Thank you for any help!

Chuck.

My program:

   target datalayout = "E-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32"

   define void @promoteCopyAndReturn(float* %promoteReturn, float* %toPromote) {
   Entry:
        %Promoted_promoteReturn_Ptr = alloca <4 x float>, align 16  ; <<4 x float>*> [#uses=2]
        %Promoted_toPromote_Ptr = alloca <4 x float>, align 16  ; <<4 x float>*> [#uses=2]
        %elemPtr = getelementptr float* %toPromote, i32 0  ; <float*> [#uses=1]
        %elemLoaded = load float* %elemPtr  ; <float> [#uses=1]
        %vectorPromotion = insertelement <4 x float> undef, float %elemLoaded, i32 0  ; <<4 x float>> [#uses=1]
        %elemPtr1 = getelementptr float* %toPromote, i32 1  ; <float*> [#uses=1]
        %elemLoaded2 = load float* %elemPtr1  ; <float> [#uses=1]
        %vectorPromotion3 = insertelement <4 x float> %vectorPromotion, float %elemLoaded2, i32 1  ; <<4 x float>> [#uses=1]
        %elemPtr4 = getelementptr float* %toPromote, i32 2  ; <float*> [#uses=1]
        %elemLoaded5 = load float* %elemPtr4  ; <float> [#uses=1]
        %vectorPromotion6 = insertelement <4 x float> %vectorPromotion3, float %elemLoaded5, i32 2  ; <<4 x float>> [#uses=1]
        %elemPtr7 = getelementptr float* %toPromote, i32 3  ; <float*> [#uses=1]
        %elemLoaded8 = load float* %elemPtr7  ; <float> [#uses=1]
        %vectorPromotion9 = insertelement <4 x float> %vectorPromotion6, float %elemLoaded8, i32 3  ; <<4 x float>> [#uses=1]
        store <4 x float> %vectorPromotion9, <4 x float>* %Promoted_toPromote_Ptr  ; <-- dies when it executes this line (assembly below)
        %toPromote10 = load <4 x float>* %Promoted_toPromote_Ptr  ; <<4 x float>> [#uses=1]
        br label %Body

   Body:             ; preds = %Entry
        store <4 x float> %toPromote10, <4 x float>* %Promoted_promoteReturn_Ptr
        br label %Exit

   Exit:             ; preds = %Body
        %vectorToDemote = load <4 x float>* %Promoted_promoteReturn_Ptr  ; <<4 x float>> [#uses=4]
        %elemToDemote = extractelement <4 x float> %vectorToDemote, i32 0  ; <float> [#uses=1]
        %elemPtr11 = getelementptr float* %promoteReturn, i32 0  ; <float*> [#uses=1]
        store float %elemToDemote, float* %elemPtr11
        %elemToDemote12 = extractelement <4 x float> %vectorToDemote, i32 1  ; <float> [#uses=1]
        %elemPtr13 = getelementptr float* %promoteReturn, i32 1  ; <float*> [#uses=1]
        store float %elemToDemote12, float* %elemPtr13
        %elemToDemote14 = extractelement <4 x float> %vectorToDemote, i32 2  ; <float> [#uses=1]
        %elemPtr15 = getelementptr float* %promoteReturn, i32 2  ; <float*> [#uses=1]
        store float %elemToDemote14, float* %elemPtr15
        %elemToDemote16 = extractelement <4 x float> %vectorToDemote, i32 3  ; <float> [#uses=1]
        %elemPtr17 = getelementptr float* %promoteReturn, i32 3  ; <float*> [#uses=1]
        store float %elemToDemote16, float* %elemPtr17
        ret void
   }

Assembler (Intel format):

15c00010 83ec2c          sub     esp,2Ch
15c00013 8b442434        mov     eax,dword ptr [esp+34h]
15c00017 f30f10400c      movss   xmm0,dword ptr [eax+0Ch]
15c0001c f30f104804      movss   xmm1,dword ptr [eax+4]
15c00021 0f14c8          unpcklps xmm1,xmm0
15c00024 f30f104008      movss   xmm0,dword ptr [eax+8]
15c00029 f30f1010        movss   xmm2,dword ptr [eax]
15c0002d 0f14d0          unpcklps xmm2,xmm0
15c00030 0f14d1          unpcklps xmm2,xmm1
15c00033 0f291424        movaps  xmmword ptr [esp],xmm2 ss:0023:0012f238=0012f2580122ef730000000100000000

The relevant registers:

Xmm2 8.000000e+000: 4.000000e+000: 2.000000e+000: 1.000000e+000    // the vector got nicely constructed
Esp 12f238    // but it has nowhere to go: esp is not 16-byte aligned, so the movaps throws a general-protection exception.
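
As an aside for readers, the fault can be checked by hand: movaps needs its memory operand on a 16-byte boundary, and the ESP value in the dump is not on one. The following is only a minimal C++ sketch of that arithmetic; the helper names `misalignment` and `is_aligned` are made up for illustration, and only the address 0x12f238 comes from the register dump.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: how far `addr` sits past the previous `alignment`
// boundary. `alignment` must be a non-zero power of two.
inline std::uintptr_t misalignment(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}

// Hypothetical helper: true when `addr` is a multiple of `alignment`.
inline bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    return misalignment(addr, alignment) == 0;
}
```

With the value from the dump, `misalignment(0x12f238, 16)` is 8, so the stack slot at [esp] is 8-byte aligned but not 16-byte aligned, which is consistent with the reported exception.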


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Vector troubles

Chris Lattner
On Sep 28, 2007, at 2:31 PM, Chuck Rose III wrote:

> Hola LLVMers,

Hey Chuck,

I'm not certain (Evan and Anton should chime in :), but here is some info:

> I’m working on engaging SSE via the LLVM vector ops on x86.  I had some questions a while back that you all helped out on, but I’m seeing similar issues and was hoping you’d have some ideas.  Below is the dump of the LLVM IR of a program which is designed to take a vector stored in a float*, build an LLVM vector from it, copy it to another vector, and then take it apart and store it back out in another float*.  This will live on the boundary of our system and would be a function designed to promote a raw, potentially unaligned, value into a vector that the LLVM system can work with a whole bunch.

Two issues with alignment come to mind.  First, LLVM apparently still has some issues on systems that don't have a 16-byte aligned stack:  http://llvm.org/bugs/show_bug.cgi?id=1649

The other issue can be that you're emitting an LLVM load from a pointer that is not on the stack and that doesn't have the right alignment.  In that case, a movaps will be generated and you'll get a fault.  To avoid this, you can mark the load as having an alignment of one byte, and the codegen will produce movups instead.  Doing so is generally more efficient than doing 4 scalar loads and insertelements.
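
For comparison, the same aligned/unaligned distinction is visible from C++ through the SSE intrinsics: `_mm_load_ps` requires a 16-byte aligned address, while `_mm_loadu_ps` and `_mm_storeu_ps` accept any address. What follows is a sketch under assumptions rather than anything from this thread: the function name `copy4_unaligned` and the macro `VT_HAVE_SSE` are invented, and a plain `memcpy` fallback is included so the snippet still builds on targets without SSE.

```cpp
#include <cassert>
#include <cstring>

// Use the SSE intrinsics when the compiler targets x86 with SSE enabled.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VT_HAVE_SSE 1
#else
#define VT_HAVE_SSE 0
#endif

// Hypothetical helper: copy four floats between buffers that carry no
// alignment guarantee beyond that of float itself.
inline void copy4_unaligned(float* dst, const float* src) {
    assert(dst != nullptr && src != nullptr);
#if VT_HAVE_SSE
    __m128 v = _mm_loadu_ps(src);  // unaligned load; _mm_load_ps would require 16-byte alignment
    _mm_storeu_ps(dst, v);         // unaligned store
#else
    std::memcpy(dst, src, 4 * sizeof(float));  // portable fallback for non-SSE targets
#endif
}
```

A single unaligned vector load replaces the four scalar loads and inserts, at the cost of not being able to assume anything about the address.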

> It is dying trying to store our working vector into one of the LLVM vectors created on the stack.  Despite the align-16 directive on the alloca instruction, it is not always aligning to a 16-byte boundary.

This sounds like the bugzilla entry.

-Chris




Re: Vector troubles

Anton Korobeynikov
In reply to this post by Chuck Rose III
Chuck,

> It is dying trying to store our working vector into one of the LLVM
> vectors created on the stack.  Despite the align-16 directive on the
> alloca instruction, it is not always aligning to a 16-byte boundary.
The stack is not necessarily 16-byte aligned on Linux/Windows. The vector
really is stored at an aligned offset relative to %esp, but the value of
%esp itself is not suitably aligned. This is a known problem (PR1636 /
PR1649) and I'm currently working on the solution (namely, stack
realignment).
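
To make the point about %esp concrete, here is a small C++ sketch of the address arithmetic involved. It is not the realignment code being worked on for LLVM, only an illustration; the helper name `align_down` is invented, and the address is the ESP value from the original report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round `addr` down to a multiple of `alignment`
// (a non-zero power of two). The x86 stack grows downward, so rounding
// the stack pointer down only ever reserves extra space.
inline std::uintptr_t align_down(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

A slot placed at a multiple-of-16 offset from ESP = 0x12f238 (say 0x12f238 + 0x10 = 0x12f248) is still 8 bytes off a 16-byte boundary. Rounding the base down first gives `align_down(0x12f238, 16) == 0x12f230`, after which every multiple-of-16 offset is a valid address for an aligned vector store.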

--
With best regards, Anton Korobeynikov.

Faculty of Mathematics & Mechanics, Saint Petersburg State University.



Re: Vector troubles

Daniel Berlin
On 9/28/07, Anton Korobeynikov <[hidden email]> wrote:
> Chuck,
>
> > It is dying trying to store our working vector into one of the LLVM
> > vectors created on the stack.  Despite the align-16 directive on the
> > alloca instruction, it is not always aligning to a 16-byte boundary.
> The stack is not necessarily 16-byte aligned on Linux/Windows.

On recent versions of Linux (anything in the past 2 years), the stack
will be aligned by gcc, the kernel, and glibc, in all the right
functions.

So unless you misalign it, ...

Re: Vector troubles

Evan Cheng-2
What is gcc's caller stack alignment assumption on Linux? Unless it's
16 bytes or more, the callee will have to dynamically align the stack.

Evan

On Sep 28, 2007, at 5:49 PM, Daniel Berlin <[hidden email]> wrote:

> On recent versions of linux (anything in the past 2 years), the stack
> will be aligned by gcc, the kernel, and glibc, in all the right
> functions.
>
> So unless you misalign it, ...

Re: Vector troubles

Daniel Johansson-2
In reply to this post by Chuck Rose III
Chuck Rose III wrote:
>
> Hola LLVMers,
>
Hi Chuck,
>
> It is dying trying to store our working vector into one of the LLVM
> vectors created on the stack.  Despite the align-16 directive on the
> alloca instruction, it is not always aligning to a 16-byte boundary.
>
I also encountered this problem, and temporarily worked around it by
using the __fastcall calling convention and aligning the stack pointer
to a 16-byte boundary just before the function call, i.e. something
like:

// With fastcall, the single pointer argument travels in ECX rather than
// on the stack.
ASSERT(fcnMain->getCallingConv() == llvm::CallingConv::X86_FastCall);
float (__fastcall *fcnMainPtr)(void*) =
    (float (__fastcall *)(void*))ctx->executionEngine().getPointerToFunction(fcnMain);

void* params = inputParams.get();
u32 oldStackPtr(0);
_asm
{
    mov oldStackPtr, esp    // save the current stack pointer
    and esp, 0xfffffff0     // round esp down to a 16-byte boundary
}
m_data[i] = fcnMainPtr(params);
_asm
{
    mov esp, oldStackPtr    // restore the original stack pointer
}

This is clearly not platform independent (and also rather hacky), so a
proper fix would be really nice indeed. I currently develop on Windows,
using MSVC 8.

Cheers,

-- Daniel Johansson


Re: Vector troubles

Daniel Berlin
In reply to this post by Evan Cheng-2
We force 16-byte alignment in main, and keep it through all
gcc-compiled functions.

glibc versions before 2.4 don't reliably keep the stack 16-byte aligned
through some calls (qsort, etc.), but otherwise it stays 16-byte aligned.

On 9/29/07, Evan Cheng <[hidden email]> wrote:

> What is gcc's caller stack alignment assumption on Linux? Unless it's
> 16 byte or more, the callee will have to dynamically align the stack.
>
> Evan

Re: Vector troubles

Anton Korobeynikov
In reply to this post by Evan Cheng-2
Hello, Daniel.

> glibc < 2.4 don't reliably keep stack at 16 bytes through some calls
> (qsort, etc), but otherwise, it stays 16 byte aligned.
Interesting, but in that case why is stuff like
'force_align_arg_pointer' required?

--
With best regards, Anton Korobeynikov.

Faculty of Mathematics & Mechanics, Saint Petersburg State University.



Re: Vector troubles

Daniel Berlin
If you mix in code built with older gcc versions (say 2.95), those
default to a 4-byte aligned stack, not a 16-byte one.

See
http://gcc.gnu.org/ml/gcc-patches/2006-02/txt00052.txt
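
For readers unfamiliar with the attribute being discussed, GCC applies `force_align_arg_pointer` to individual function definitions on x86. The sketch below shows roughly how an entry point that may be reached from code with a 4-byte aligned stack could be marked. The macro name `REALIGN_STACK_ON_ENTRY` and the function `sum4_via_aligned_local` are hypothetical, and the macro expands to nothing on compilers or targets where the attribute is assumed not to apply.

```cpp
#include <cassert>
#include <cstdint>

// force_align_arg_pointer is a GCC function attribute for x86 targets.
// The macro name is made up; it is empty where the attribute is not used.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK_ON_ENTRY __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK_ON_ENTRY
#endif

// Hypothetical entry point that might be called from code that only keeps
// the stack 4-byte aligned. The attribute asks the compiler to realign the
// stack in the prologue, so the 16-byte aligned local below is safe to use.
REALIGN_STACK_ON_ENTRY
float sum4_via_aligned_local(const float* src) {
    assert(src != nullptr);
    alignas(16) float v[4];
    for (int i = 0; i < 4; ++i) {
        v[i] = src[i];
    }
    // The local buffer is expected to sit on a 16-byte boundary.
    assert(reinterpret_cast<std::uintptr_t>(v) % 16 == 0);
    return v[0] + v[1] + v[2] + v[3];
}
```

Callers compiled with a 16-byte stack guarantee pay a small prologue cost for nothing, which is why the attribute is opt-in per function rather than a default.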


On 9/30/07, Anton Korobeynikov <[hidden email]> wrote:

> Hello, Daniel.
>
> > glibc < 2.4 don't reliably keep stack at 16 bytes through some calls
> > (qsort, etc), but otherwise, it stays 16 byte aligned.
> Interesting, but why in this case stuff like 'force_align_arg_pointer'
> required?

Re: Vector troubles

Evan Cheng-2
In reply to this post by Anton Korobeynikov
You can always ask for > 16 byte stack alignment. :-)

Evan

On Sep 30, 2007, at 10:47 AM, Anton Korobeynikov wrote:

> Hello, Daniel.
>
>> glibc < 2.4 don't reliably keep stack at 16 bytes through some calls
>> (qsort, etc), but otherwise, it stays 16 byte aligned.
> Interesting, but why in this case stuff like 'force_align_arg_pointer'
> required?

Re: Vector troubles

Chuck Rose III
I tried to ask for 32 and that didn't seem to help.  MallocInst also
seemed to ignore the 16 byte directive.  For now, I'm just issuing all
my loads as unaligned and that's working ok.  

Thanks,
Chuck.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
On Behalf Of Evan Cheng
Sent: Monday, October 01, 2007 10:35 AM
To: [hidden email]; LLVM Developers Mailing List
Subject: Re: [LLVMdev] Vector troubles

You can always ask for > 16 byte stack alignment. :-)

Evan


Re: Vector troubles

Evan Cheng-2
That's not what I meant. I was replying to Anton:
force_align_arg_pointer is still needed even if the stack alignment is
already 16 bytes.

Evan

On Oct 1, 2007, at 11:39 AM, Chuck Rose III wrote:

> I tried to ask for 32 and that didn't seem to help.  MallocInst also
> seemed to ignore the 16 byte directive.  For now, I'm just issuing all
> my loads as unaligned and that's working ok.
>
> Thanks,
> Chuck.