Seg faulting on vector ops

Chuck Rose III

Hola LLVMers,

I’m looking to make use of the vectorization primitives in the Intel chip with the code we generate from LLVM, so I’ve started experimenting with them.  What is the state of the machine code generated for vectors?  In my tinkering I seem to be getting some wonky machine instructions, but I’m most likely just doing something wrong, and I’m hoping you can set me on the right course.

My minimal function creates a float4 vector with a specified scalar in all the elements.  It then extracts the element at index 3 (the fourth element) and returns it.

We are currently using the JIT, and I’m synced to about a week after the 2.0 branch, so I’m admittedly stale by about a month.

In LLVM IR:

; ModuleID = 'test vectors'

define float @vSelect3(float %x) {
body:
        %pv = alloca <4 x float>                ; <<4 x float>*> [#uses=1]
        %v = load <4 x float>* %pv              ; <<4 x float>> [#uses=1]
        %v1 = insertelement <4 x float> %v, float %x, i32 0             ; <<4 x float>> [#uses=1]
        %v2 = insertelement <4 x float> %v1, float %x, i32 1            ; <<4 x float>> [#uses=1]
        %v3 = insertelement <4 x float> %v2, float %x, i32 2            ; <<4 x float>> [#uses=1]
        %v4 = insertelement <4 x float> %v3, float %x, i32 3            ; <<4 x float>> [#uses=1]
        %s = extractelement <4 x float> %v4, i32 3              ; <float> [#uses=1]
        ret float %s
}

In Intel assembly, I get the following:

00000000`01b80010 83ec20          sub     esp,20h
00000000`01b80013 f30f10442424    movss   xmm0,dword ptr [esp+24h]   <-- this loads x into the low float of xmm0
00000000`01b80019 0f284c2404      movaps  xmm1,xmmword ptr [esp+4]   <-- this seg faults because esp+4 isn’t 16-byte aligned

What is that line trying to achieve?  x is at [esp+24h], and there weren’t any other parameters.

00000000`01b8001e f30f10c8        movss   xmm1,xmm0
00000000`01b80022 8b442424        mov     eax,dword ptr [esp+24h]
00000000`01b80026 660fc4c802      pinsrw  xmm1,eax,2
00000000`01b8002b 89c1            mov     ecx,eax
00000000`01b8002d c1e910          shr     ecx,10h
00000000`01b80030 660fc4c903      pinsrw  xmm1,ecx,3
00000000`01b80035 660fc4c804      pinsrw  xmm1,eax,4
00000000`01b8003a 660fc4c905      pinsrw  xmm1,ecx,5
00000000`01b8003f 660fc4c806      pinsrw  xmm1,eax,6
00000000`01b80044 660fc4c907      pinsrw  xmm1,ecx,7
00000000`01b80049 0fc6c903        shufps  xmm1,xmm1,3
00000000`01b8004d f30f110c24      movss   dword ptr [esp],xmm1
00000000`01b80052 d90424          fld     dword ptr [esp]
00000000`01b80055 83c420          add     esp,20h
00000000`01b80058 c3              ret
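For context on the fault: when `movaps` takes a memory operand, that address must be a multiple of 16. The unaligned form is `movups`. Whether `movaps xmm1,xmmword ptr [esp+4]` traps therefore depends only on the value of esp.

The C++ sketch below shows that check. It assumes 32-bit stack addresses. The helpers `isAligned` and `slotAddress` and the sample esp values are hypothetical and are for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// True if addr is a multiple of align (align must be a power of two).
// movaps needs this to hold with align == 16 for its memory operand;
// movups has no such requirement.
inline bool isAligned(std::uintptr_t addr, std::uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

// Address read by "movaps xmm1, xmmword ptr [esp+4]", given esp after "sub esp,20h".
inline std::uintptr_t slotAddress(std::uintptr_t espAfterPrologue) {
    return espAfterPrologue + 4;
}
```

Two hypothetical cases show the difference:

- If esp is 0x0012FF3C after the prologue, the slot is at 0x0012FF40 and the load is legal.
- If esp is 0x0012FF38, the slot is at 0x0012FF3C and the `movaps` would fault.

Which case occurs depends on how esp was aligned when the function was entered.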

 

The code used to generate and run the program was:

#include "llvm/Module.h"
#include "llvm/DerivedTypes.h"
#include "llvm/Constants.h"
#include "llvm/Instructions.h"
#include "llvm/ModuleProvider.h"
#include "llvm/Analysis/Verifier.h"
#include "llvm/System/DynamicLibrary.h"
#include "llvm/ExecutionEngine/JIT.h"
#include "llvm/ExecutionEngine/Interpreter.h"
#include "llvm/ExecutionEngine/GenericValue.h"
#include "llvm/Support/ManagedStatic.h"
#include <iostream>
#include <vector>

using namespace llvm;

// Builds a <dim x float> vector with every element set to s.
// The starting value is loaded from an uninitialized stack slot (alloca);
// the faulting movaps reads that slot.
Value* makeVector(Value* s, unsigned int dim, BasicBlock* basicBlock)
{
    AllocaInst* pV = new AllocaInst(VectorType::get(Type::FloatTy,dim),"pv",basicBlock);
    Value* v = new LoadInst(pV,"v",basicBlock);

    for (unsigned int i = 0 ; i < dim ; ++i)
        v = new InsertElementInst(v,s,i,"v",basicBlock);

    return v;
}

// Emits "float vSelect3(float x)": splat x into a <4 x float>, return element 3.
Function* generateVectorAndSelect(Module* pModule)
{
    std::vector<Type const*> params;
    params.push_back(Type::FloatTy);

    FunctionType* funcType = FunctionType::get(Type::FloatTy,params,/*isVarArg=*/false);
    Function* func = cast<Function>(pModule->getOrInsertFunction("vSelect3",funcType));

    BasicBlock* basicBlock = new BasicBlock("body",func);

    Function::arg_iterator args = func->arg_begin();
    Argument* x = args;
    x->setName("x");

    Value* v1 = makeVector(x,4,basicBlock);
    Value* s = new ExtractElementInst(v1,3,"s",basicBlock);
    new ReturnInst(s,basicBlock);

    return func;
}

// modified from the fibonacci example
int main(int argc, char **argv)
{
    Module* pVectorModule = new Module("test vectors");
    Function* pMain = generateVectorAndSelect(pVectorModule);
    pVectorModule->print(std::cout);

    GenericValue gv1, gv2, gvR;
    gv1.FloatVal = 2.0f;

    ExistingModuleProvider *pMP = new ExistingModuleProvider(pVectorModule);
    pMP->getModule()->setDataLayout("e-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32");
    // false: do not force the interpreter, i.e. use the JIT.
    ExecutionEngine *pEE = ExecutionEngine::create(pMP, false);

    std::vector<GenericValue> args;
    args.push_back(gv1);

    // Calls the JIT-compiled vSelect3(2.0f); the seg fault happens inside this call.
    GenericValue result = pEE->runFunction(pMain, args);

    return 0;
}

Any help would be appreciated. 

Thanks,

Chuck.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Seg faulting on vector ops

Evan Cheng-2
Hi Chuck!

On Jul 20, 2007, at 11:36 AM, Chuck Rose III wrote:

> I’m looking to make use of the vectorization primitives in the Intel chip with the code we generate from LLVM and so I’ve started experimenting with it.  What is the state of the machine code generated for vectors?


Using SSE? The X86 backend usually does a pretty good job with it.

> define float @vSelect3(float %x) {
> body:
>         %pv = alloca <4 x float>                ; <<4 x float>*> [#uses=1]
>         %v = load <4 x float>* %pv              ; <<4 x float>> [#uses=1]
>         %v1 = insertelement <4 x float> %v, float %x, i32 0             ; <<4 x float>> [#uses=1]


You are allocating a chunk of memory on the stack and then loading its undefined contents back. I suppose this should be legal, so perhaps there is a codegen bug. With ToT (top of tree), I see sub $28, %esp, so maybe that's already fixed.

But still, this is not what you want. You want to do this:

        %v1 = insertelement <4 x float> undef, float %x, i32 0          ; <<4 x float>> [#uses=1]
        %v2 = insertelement <4 x float> %v1, float %x, i32 1            ; <<4 x float>> [#uses=1]
        %v3 = insertelement <4 x float> %v2, float %x, i32 2            ; <<4 x float>> [#uses=1]
        %v4 = insertelement <4 x float> %v3, float %x, i32 3

Start from an undef and insert elements to form the vector.
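Evan's version builds the splat purely as values. Each insertelement yields a new vector with one lane replaced, and nothing goes through a stack slot.

The C++ sketch below models only those value semantics; it is not LLVM API code. The names `Float4`, `insertElement`, `extractElement`, and `vSelect3Model` are hypothetical. C++ has no undef, so a value-initialized vector stands in for it. That is harmless here because every lane is overwritten.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// A <4 x float> modeled as a plain value: no alloca, no load.
using Float4 = std::array<float, 4>;

// Models 'insertelement': returns a copy of v with lane idx set to x.
inline Float4 insertElement(Float4 v, float x, std::size_t idx) {
    assert(idx < v.size());
    v[idx] = x;
    return v;
}

// Models 'extractelement': reads lane idx.
inline float extractElement(const Float4& v, std::size_t idx) {
    assert(idx < v.size());
    return v[idx];
}

// Models @vSelect3: splat x into all four lanes, then return lane 3 (the fourth).
inline float vSelect3Model(float x) {
    Float4 v{};  // stand-in for 'undef'; every lane is overwritten below
    for (std::size_t i = 0; i < v.size(); ++i)
        v = insertElement(v, x, i);
    return extractElement(v, 3);
}
```

With the 2.0f argument from the original driver, the model returns 2.0f. That is the value the JIT-compiled @vSelect3 is expected to return.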

Hope that helps.

Evan



Re: Seg faulting on vector ops

Chuck Rose III

Switching to the undef value got my experiment working.  I’ll be updating to the SVN trunk next week before I begin making much wider use of the vector code.  I’ll be sure to email if I still see the problem with the version that used the alloca instead of the undef.

Thanks for the help, Evan!


From: [hidden email] [mailto:[hidden email]] On Behalf Of Evan Cheng
Sent: Friday, July 20, 2007 2:11 PM
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Seg faulting on vector ops

 


Re: Seg faulting on vector ops

Chris Lattner
In reply to this post by Chuck Rose III
On Fri, 20 Jul 2007, Chuck Rose III wrote:
> I'm looking to make use of the vectorization primitives in the Intel
> chip with the code we generate from LLVM and so I've started
> experimenting with it.  What is the state of the machine code generated
> for vectors?  In my tinkering, I seem to be getting some wonky machine
> instructions, but I'm most likely just doing something wrong and I'm
> hoping you can set me in the correct course.

Hi Chuck,

Evan's solution is the right one.  However, your code is valid, so it
shouldn't crash.  I think it dies because linux does not guarantee that
the stack is 16 byte aligned, and the vector operations expect this.  The
code generator should compensate and dynamically align the stack on entry
to the function.  This should be a relatively straight-forward extension
to the x86 backend if you're interested.
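Dynamically aligning the stack on entry generally means saving the old stack pointer and then rounding it down to a multiple of 16. For example, the prologue can be `push ebp; mov ebp, esp; and esp, -16`, with the epilogue restoring esp from ebp.

The C++ sketch below shows only the rounding arithmetic. It assumes a 32-bit stack pointer. `alignDown` is a hypothetical helper, and the sketch does not describe what the x86 backend actually emits.

```cpp
#include <cassert>
#include <cstdint>

// Rounds sp down to a multiple of align (a power of two).
// Rounding down only enlarges the frame, because the x86 stack grows toward
// lower addresses; the (at most align - 1) skipped bytes become padding.
inline std::uint32_t alignDown(std::uint32_t sp, std::uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);  // with align == 16 this is 'and esp, -16'
}
```

Because the old value is kept in ebp, the epilogue does not need to know how many padding bytes were skipped.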

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Re: Seg faulting on vector ops

Christophe Avoinne
In reply to this post by Chuck Rose III
On 21/07/2007 23:51:28, Chris Lattner ([hidden email]) wrote:
> Evan's solution is the right one. However, your code is valid, so it
> shouldn't crash. I think it dies because linux does not guarantee that
> the stack is 16 byte aligned, and the vector operations expect this. The
> code generator should compensate and dynamically align the stack on entry
> to the function. This should be a relatively straight-forward extension
> to the x86 backend if you're interested.

In fact, if LLVM is able to guarantee that the stack pointer is 16-byte
aligned, there is no reason to DYNAMICALLY align the stack on every
function entry, because those operations can be expensive on some
architectures. What you need is to align the stack on entry to the
"main" function. Of course, if LLVM functions are called from non-LLVM
functions, that's a problem, and you probably need to use dynamic
alignment.
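The static alternative depends on an invariant: if esp is 16-byte aligned at every call site, each function can stay aligned by choosing a padded, fixed-size frame, with no run-time masking. On 32-bit x86 the call pushes a 4-byte return address, so the frame size must satisfy (4 + frame) % 16 == 0.

The C++ sketch below assumes that convention and ignores saved registers. `paddedFrameSize` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Smallest frame size >= localsSize such that (retAddrSize + frame) is a
// multiple of align. If esp was aligned just before the call, it is aligned
// again after 'sub esp, frame' (saved registers are ignored here).
inline std::uint32_t paddedFrameSize(std::uint32_t localsSize,
                                     std::uint32_t retAddrSize = 4,
                                     std::uint32_t align = 16) {
    assert(align != 0 && (align & (align - 1)) == 0);
    std::uint32_t total = retAddrSize + localsSize;
    std::uint32_t rounded = (total + align - 1) & ~(align - 1);
    return rounded - retAddrSize;
}
```

Under these assumptions a 28-byte frame keeps esp aligned (4 + 28 = 32). This is consistent with the `sub $28, %esp` Evan reported seeing on ToT. The invariant holds only if every caller maintains it, including code built by other compilers.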

Re: Seg faulting on vector ops

Chris Lattner
On Sun, 22 Jul 2007, Christophe Avoinne wrote:
> "main" function. Of course, if you are mixing calls of llvm functions in
> non-llvm functions, that's a problem and you probably need to use
> dynamical alignments.

Yes, we support the native platform ABI, so LLVM code can be intermixed
with code compiled by the platform compilers.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Re: Seg faulting on vector ops

Chuck Rose III
In reply to this post by Chris Lattner
Hola Chris,

I'm seeing this on Windows, so I suspect the stack alignment issue is
wider than Linux.  My first repro case (which was bigger) took in an
array and copied it into the vector that was created with the alloca
instruction, etc.  For that, I had to ensure that my array was 16-byte
aligned, which is fair enough, but LLVM would still generate some broken
code around the alignment of the allocated object.
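On the host side, one portable way to give an array the 16-byte alignment that aligned SSE loads expect is `alignas` (C++11 and later).

The sketch below uses the hypothetical names `Float4Array` and `isAligned16`. It covers only a host-side array like the one described above. It does nothing for the alignment of the stack slot that LLVM creates for the alloca.

```cpp
#include <cassert>
#include <cstdint>

// Storage for four floats with 16-byte alignment, suitable for movaps-style access.
struct alignas(16) Float4Array {
    float data[4];
};

static_assert(alignof(Float4Array) == 16, "Float4Array must be 16-byte aligned");

inline bool isAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}
```

For heap allocation, plain `new` is only guaranteed to honor an `alignas` stricter than `alignof(std::max_align_t)` from C++17 onward.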

I'd like to take a further look, but I can't guarantee I'll make any
quick progress.  The depths of the compiler are still somewhat new to
me.  I mostly work at the IR/JIT level, with forays into machine code
generation when things go awry, but I would like to learn more.

Thanks,
Chuck.

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Chris Lattner
Sent: Saturday, July 21, 2007 2:51 PM
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Seg faulting on vector ops

On Fri, 20 Jul 2007, Chuck Rose III wrote:
> I'm looking to make use of the vectorization primitives in the Intel
> chip with the code we generate from LLVM and so I've started
> experimenting with it.  What is the state of the machine code
generated
> for vectors?  In my tinkering, I seem to be getting some wonky machine
> instructions, but I'm most likely just doing something wrong and I'm
> hoping you can set me in the correct course.

Hi Chuck,

Evan's solution is the right one.  However, your code is valid, so it
shouldn't crash.  I think it dies because linux does not guarantee that
the stack is 16 byte aligned, and the vector operations expect this.
The
code generator should compensate and dynamically align the stack on
entry
to the function.  This should be a relatively straight-forward extension

to the x86 backend if you're interested.

-Chris

> My minimal function creates a float4 vector with a specified scalar in
> all the elements.  It then extracts the third element and returns it.
>
>
>
> We are currently using the JIT and I'm currently synced to about a
week

> after the 2.0 branch, so I'm admittedly stale by about a month.
>
>
>
> In LLVM IR:
>
>
>
> ; ModuleID = 'test vectors'
>
>
>
> define float @vSelect3(float %x) {
>
> body:
>
>        %pv = alloca <4 x float>                ; <<4 x float>*>
> [#uses=1]
>
>        %v = load <4 x float>* %pv              ; <<4 x float>>
> [#uses=1]
>
>        %v1 = insertelement <4 x float> %v, float %x, i32 0
> ; <<4 x
>
> float>> [#uses=1]
>
>        %v2 = insertelement <4 x float> %v1, float %x, i32 1
> ; <<4 x
>
> float>> [#uses=1]
>
>        %v3 = insertelement <4 x float> %v2, float %x, i32 2
> ; <<4 x
>
> float>> [#uses=1]
>
>        %v4 = insertelement <4 x float> %v3, float %x, i32 3
> ; <<4 x
>
> float>> [#uses=1]
>
>        %s = extractelement <4 x float> %v4, i32 3              ;
> <float> [#uses
>
> =1]
>
>        ret float %s
>
> }
>
>
>
> In Intel assembly, I get the following:
>
>
>
> 00000000`01b80010 83ec20          sub     esp,20h
>
> 00000000`01b80013 f30f10442424    movss   xmm0,dword ptr [esp+24h]
<--
> this loads x into the low float of xmm0
>
> 00000000`01b80019 0f284c2404      movaps  xmm1,xmmword ptr [esp+4]
<--

> this seg faults because esp+4 isn't 16-byte aligned
>
> What is that line trying to achieve?  X is at [esp+24].  There weren't
> any other parameters.
>
>
>
> 00000000`01b8001e f30f10c8        movss   xmm1,xmm0
>
> 00000000`01b80022 8b442424        mov     eax,dword ptr [esp+24h]
> 00000000`01b80026 660fc4c802      pinsrw  xmm1,eax,2
> 00000000`01b8002b 89c1            mov     ecx,eax
> 00000000`01b8002d c1e910          shr     ecx,10h
> 00000000`01b80030 660fc4c903      pinsrw  xmm1,ecx,3
> 00000000`01b80035 660fc4c804      pinsrw  xmm1,eax,4
> 00000000`01b8003a 660fc4c905      pinsrw  xmm1,ecx,5
> 00000000`01b8003f 660fc4c806      pinsrw  xmm1,eax,6
> 00000000`01b80044 660fc4c907      pinsrw  xmm1,ecx,7
> 00000000`01b80049 0fc6c903        shufps  xmm1,xmm1,3
> 00000000`01b8004d f30f110c24      movss   dword ptr [esp],xmm1
> 00000000`01b80052 d90424          fld     dword ptr [esp]
> 00000000`01b80055 83c420          add     esp,20h
> 00000000`01b80058 c3              ret
>
> The code used to generate and run the program was:
>
> #include "llvm/Module.h"
> #include "llvm/DerivedTypes.h"
> #include "llvm/Constants.h"
> #include "llvm/Instructions.h"
> #include "llvm/ModuleProvider.h"
> #include "llvm/Analysis/Verifier.h"
> #include "llvm/System/DynamicLibrary.h"
> #include "llvm/ExecutionEngine/JIT.h"
> #include "llvm/ExecutionEngine/Interpreter.h"
> #include "llvm/ExecutionEngine/GenericValue.h"
> #include "llvm/Support/ManagedStatic.h"
> #include <iostream>
>
> using namespace llvm;
>
> Value* makeVector(Value* s, unsigned int dim, BasicBlock* basicBlock)
> {
>    AllocaInst* pV = new AllocaInst(VectorType::get(Type::FloatTy,dim),"pv",basicBlock);
>    Value* v = new LoadInst(pV,"v",basicBlock);
>
>    for (unsigned int i = 0 ; i < dim ; ++i)
>        v = new InsertElementInst(v,s,i,"v",basicBlock);
>
>    return v;
> }
>
> Function* generateVectorAndSelect(Module* pModule)
> {
>    std::vector<Type const*> params;
>    params.push_back(Type::FloatTy);
>
>    FunctionType* funcType = FunctionType::get(Type::FloatTy,params,NULL);
>    Function* func = cast<Function>(pModule->getOrInsertFunction("vSelect3",funcType));
>
>    BasicBlock* basicBlock = new BasicBlock("body",func);
>
>    Function::arg_iterator args = func->arg_begin();
>    Argument* x = args;
>    x->setName("x");
>
>    Value* v1 = makeVector(x,4,basicBlock);
>    Value* s = new ExtractElementInst(v1,3,"s",basicBlock);
>    new ReturnInst(s,basicBlock);
>
>    return func;
> }
>
> // modified from the fibonacci example
> int main(int argc, char **argv)
> {
>    Module* pVectorModule = new Module("test vectors");
>    Function* pMain = generateVectorAndSelect(pVectorModule);
>    pVectorModule->print(std::cout);
>
>    GenericValue gv1, gv2, gvR;
>    gv1.FloatVal = 2.0f;
>
>    ExistingModuleProvider *pMP = new ExistingModuleProvider(pVectorModule);
>    pMP->getModule()->setDataLayout("e-p:32:32:32-i1:8:8:8-i8:8:8:8-i32:32:32:32-f32:32:32:32");
>    ExecutionEngine *pEE = ExecutionEngine::create(pMP, false);
>
>    std::vector<GenericValue> args;
>    args.push_back(gv1);
>
>    GenericValue result = pEE->runFunction(pMain, args);
>
>    return 0;
> }
>
> Any help would be appreciated.
>
> Thanks,
>
> Chuck.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
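In the listing quoted above, the `movaps` from `[esp+4]` corresponds to `%v = load <4 x float>* %pv`, the load of the uninitialized alloca; every lane it produces is then overwritten. `movss xmm1,xmm0` sets lane 0, the `pinsrw` pairs write the low and high 16-bit halves of x into lanes 1 to 3, and `shufps xmm1,xmm1,3` moves lane 3 into lane 0 so it can be spilled and returned on the x87 stack. The sketch below is a portable C++ emulation of that sequence under stated assumptions: `xmm1` is modelled as eight 16-bit words (the view `pinsrw` uses), the stale alloca contents are modelled as zeros, and `pinsrw`, `lane`, `setLane`, `shufpsSelf` and `vSelect3Emulated` are hypothetical names for this illustration, not LLVM or compiler APIs.

```cpp
// Portable emulation of the splat-and-extract sequence in the quoted listing.
// Assumption: xmm1 is modelled as eight 16-bit words, word 0 least significant.
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

using Xmm = std::array<std::uint16_t, 8>;

// pinsrw xmm, r32, imm: overwrite word `imm` with the low 16 bits of r32.
void pinsrw(Xmm& xmm, std::uint32_t r32, unsigned imm) {
    assert(imm < 8 && "pinsrw word index out of range");
    xmm[imm] = static_cast<std::uint16_t>(r32 & 0xFFFFu);
}

// Read 32-bit lane i, i.e. words 2i (low half) and 2i+1 (high half).
std::uint32_t lane(const Xmm& xmm, unsigned i) {
    return static_cast<std::uint32_t>(xmm[2 * i]) |
           (static_cast<std::uint32_t>(xmm[2 * i + 1]) << 16);
}

// Write 32-bit lane i.
void setLane(Xmm& xmm, unsigned i, std::uint32_t v) {
    xmm[2 * i] = static_cast<std::uint16_t>(v & 0xFFFFu);
    xmm[2 * i + 1] = static_cast<std::uint16_t>(v >> 16);
}

// shufps xmm, xmm, imm with one register as both operands:
// result lane i is old lane ((imm >> 2i) & 3).
Xmm shufpsSelf(const Xmm& xmm, unsigned imm) {
    Xmm out{};
    for (unsigned i = 0; i < 4; ++i)
        setLane(out, i, lane(xmm, (imm >> (2 * i)) & 3u));
    return out;
}

float vSelect3Emulated(float x) {
    std::uint32_t eax;
    std::memcpy(&eax, &x, sizeof eax);   // mov eax, dword ptr [esp+24h]
    Xmm xmm1{};                          // movaps from the alloca (modelled as zeros)
    setLane(xmm1, 0, eax);               // movss xmm1, xmm0: lane 0 = x
    std::uint32_t ecx = eax >> 16;       // mov ecx, eax ; shr ecx, 10h
    pinsrw(xmm1, eax, 2); pinsrw(xmm1, ecx, 3);  // lane 1 = x
    pinsrw(xmm1, eax, 4); pinsrw(xmm1, ecx, 5);  // lane 2 = x
    pinsrw(xmm1, eax, 6); pinsrw(xmm1, ecx, 7);  // lane 3 = x
    xmm1 = shufpsSelf(xmm1, 3);          // lane 0 = old lane 3
    std::uint32_t bits = lane(xmm1, 0);  // movss dword ptr [esp], xmm1
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;                       // fld dword ptr [esp] ; ret
}
```

Calling `vSelect3Emulated(2.0f)` returns `2.0f`, which is what the JIT-compiled `vSelect3` should return once the misaligned `movaps` no longer faults.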


Re: Seg faulting on vector ops

Evan Cheng-2
In reply to this post by Chris Lattner
Hrm. This problem shouldn't be target-specific. I am pretty sure the
prologue/epilogue inserter aligns the stack correctly if there are
stack objects with a greater-than-default stack alignment requirement.
It seems the initial alloca() instruction should specify 16-byte
alignment?

Evan
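Evan's point about the prologue/epilogue inserter and Chris's suggestion of realigning the stack on entry both reduce to power-of-two mask arithmetic. The sketch below models stack addresses as plain 32-bit integers; the entry values `0x0012FF70` and `0x0012FF6C` are made-up examples, and `isAligned`, `alignDown` and `alignedFrameBase` are hypothetical helpers, not LLVM functions. It shows why `[esp+4]` is misaligned after `sub esp,20h` when `esp` was 16-byte aligned on entry, and how rounding down with a mask (the effect of `and esp,-16` in a realigning prologue) yields a 16-byte boundary.

```cpp
// Alignment arithmetic behind the movaps fault, on integer "addresses".
#include <cassert>
#include <cstdint>

// True if `align` is a non-zero power of two.
bool isPowerOfTwo(std::uint32_t align) {
    return align != 0 && (align & (align - 1)) == 0;
}

// True if addr is a multiple of align.
bool isAligned(std::uint32_t addr, std::uint32_t align) {
    assert(isPowerOfTwo(align));
    return (addr & (align - 1)) == 0;
}

// Round addr down to a multiple of align; for align == 16 this is what
// `and esp, -16` does to the stack pointer.
std::uint32_t alignDown(std::uint32_t addr, std::uint32_t align) {
    assert(isPowerOfTwo(align));
    return addr & ~(align - 1);
}

// Hypothetical realigning prologue: reserve frameSize bytes below esp,
// then round down so the frame base is suitably aligned.
std::uint32_t alignedFrameBase(std::uint32_t esp, std::uint32_t frameSize,
                               std::uint32_t align) {
    return alignDown(esp - frameSize, align);
}
```

For example, with `esp == 0x0012FF70` on entry, `sub esp,20h` leaves `0x0012FF50`, so the slot at `esp+4` is `0x0012FF54` and `isAligned(0x0012FF54, 16)` is false, the condition that makes `movaps` fault. Conversely, `alignedFrameBase(0x0012FF6C, 0x20, 16)` returns `0x0012FF40`, a 16-byte boundary from which 16-byte-multiple offsets stay aligned even though the incoming `esp` was only 4-byte aligned.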

On Jul 21, 2007, at 2:51 PM, Chris Lattner wrote:

> On Fri, 20 Jul 2007, Chuck Rose III wrote:
>> I'm looking to make use of the vectorization primitives in the Intel
>> chip with the code we generate from LLVM and so I've started
>> experimenting with it.  What is the state of the machine code generated
>> for vectors?  In my tinkering, I seem to be getting some wonky machine
>> instructions, but I'm most likely just doing something wrong and I'm
>> hoping you can set me in the correct course.
>
> Hi Chuck,
>
> Evan's solution is the right one.  However, your code is valid, so it
> shouldn't crash.  I think it dies because Linux does not guarantee that
> the stack is 16-byte aligned, and the vector operations expect this.  The
> code generator should compensate and dynamically align the stack on entry
> to the function.  This should be a relatively straightforward extension
> to the x86 backend if you're interested.
>
> -Chris

Re: Seg faulting on vector ops

Evan Cheng-2
I am fairly certain this is right. Chuck, can you do a quick
experiment for me? Go back to your original code but make sure the
alloca instruction specifies 16-byte alignment. The code should work.
If not, please file a bug.

Thanks,

Evan
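Evan's suggested experiment amounts to recording the vector's natural 16-byte alignment on the stack slot; in textual IR that is an `align 16` clause on the alloca, e.g. `%pv = alloca <4 x float>, align 16`. As a plain C++ analogue (not the LLVM API), `alignas(16)` does the same job for a local: a bare `float[4]` only promises `alignof(float)`, while the over-aligned type obliges the compiler to place the object on a 16-byte boundary, which is what `movaps` needs. `Float4`, `isAligned16` and `vSelect3Aligned` are hypothetical names for this sketch.

```cpp
// C++ analogue of giving the <4 x float> stack slot an explicit alignment.
#include <cassert>
#include <cstdint>

// Same layout as <4 x float>; alignas(16) plays the role of `align 16`.
struct alignas(16) Float4 {
    float v[4];
};

static_assert(alignof(Float4) == 16, "Float4 must be 16-byte aligned");
static_assert(alignof(float[4]) == alignof(float),
              "a plain array is only as aligned as its element type");

// True if p could be used with an aligned 16-byte load such as movaps.
bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Mirrors vSelect3: splat x into an aligned stack slot, return element 3.
float vSelect3Aligned(float x) {
    Float4 slot;  // the compiler must honor alignas(16) for this local
    for (float& e : slot.v) e = x;
    assert(isAligned16(&slot) && "aligned slot expected");
    return slot.v[3];
}
```

The design point mirrors the thread's outcome: alignment is a property the frame layout has to know about up front, so stating it on the stack object (rather than hoping the incoming stack pointer happens to be aligned) is what lets the backend place the slot correctly.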

On Jul 24, 2007, at 1:58 PM, Evan Cheng wrote:

> Hrm. This problem shouldn't be target-specific. I am pretty sure the
> prologue/epilogue inserter aligns the stack correctly if there are
> stack objects with a greater-than-default stack alignment requirement.
> It seems the initial alloca() instruction should specify 16-byte
> alignment?
>
> Evan

Re: Seg faulting on vector ops

Chuck Rose III
Hola Evan,

With the latest source and with the alignment, it worked and put the
memory on the correct boundary.  

Thanks,
Chuck.



-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
On Behalf Of Evan Cheng
Sent: Thursday, July 26, 2007 12:15 AM
To: LLVM Developers Mailing List
Subject: Re: [LLVMdev] Seg faulting on vector ops

I am fairly certain this is right. Chuck, can you do a quick
experiment for me? Go back to your original code but make sure the
alloca instruction specifies 16-byte alignment. The code should work.
If not, please file a bug.

Thanks,

Evan
