Vector swizzling and write masks code generation

Zack Rusin-3
Hey,

as some of you may know, we're in the process of experimenting with LLVM in
Gallium3D (Mesa's new driver model), where LLVM would be used both in the
software-only case (by just JIT-executing shaders) and in the hardware case
(drivers will implement LLVM code generators).

While the software-only case is pretty straightforward, I just realized I
missed something in my initial evaluation.

That is, graphics hardware (basically every single programmable GPU) has
instruction-level support for vector swizzling and write masks.

For example, the following is a valid GPU shader instruction:
ADD dst.xyz   src1.yxzw  src2.zwxy
It performs an addition and stores the result in the dst operand (each
operand is a vector of four data elements). The instruction uses source
swizzle modifiers and a destination write-mask modifier.
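To make the semantics of that instruction concrete, here is a minimal Python sketch of a toy model. The helper names (`swizzle`, `masked_write`, `add`) and the sample values are made up for illustration and do not come from any real GPU or LLVM API.

```python
# Toy model of a GPU-style ADD with source swizzles and a destination write mask.
# All names and values are illustrative assumptions.
LANES = "xyzw"

def swizzle(vec, pattern):
    """Return a vector whose lanes are picked from vec by lane name."""
    return [vec[LANES.index(c)] for c in pattern]

def masked_write(dst, value, mask):
    """Overwrite only the lanes of dst named in mask; keep the others."""
    return [value[i] if LANES[i] in mask else dst[i] for i in range(4)]

def add(dst, mask, src1, swz1, src2, swz2):
    """Model of: ADD dst.<mask> src1.<swz1> src2.<swz2>"""
    a = swizzle(src1, swz1)
    b = swizzle(src2, swz2)
    total = [p + q for p, q in zip(a, b)]
    return masked_write(dst, total, mask)

dst = [0.0, 0.0, 0.0, 9.0]
src1 = [1.0, 2.0, 3.0, 4.0]
src2 = [10.0, 20.0, 30.0, 40.0]

# ADD dst.xyz src1.yxzw src2.zwxy
result = add(dst, "xyz", src1, "yxzw", src2, "zwxy")
print(result)  # [32.0, 41.0, 13.0, 9.0]
```

Note that the w lane of dst keeps its old value because the write mask names only x, y and z.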

So if a language is capable of expressing such constructs (as GLSL, HLSL and
a few others are), I'd like to make sure that the code generator is actually
capable of generating instructions with exactly those semantics.

Right now vector operations utilizing swizzling and write masks in LLVM IR
have to be expressed with a series of load/extractelement/insertelement/store
constructs. As in

vec2 = vec4.xy

would end up being:
%tmp = load <4 x float>* @vec4
%tmp1 = extractelement <4 x float> %tmp, i32 0
%tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
%tmp3 = extractelement <4 x float> %tmp, i32 1
%tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
store <2 x float> %tmp4, <2 x float>* @vec2
or the like.
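As a rough illustration of how a front end might produce that expansion mechanically, the following Python sketch emits the extractelement/insertelement text for an arbitrary swizzle. The function `lower_swizzle` and the `%e`/`%v` naming scheme are hypothetical; the emitted text follows the IR syntax used in this message, and the load and store are left out.

```python
# Hypothetical helper: expand dst = src.<pattern> into extract/insert pairs.
LANES = "xyzw"

def lower_swizzle(src, src_width, pattern, elem="float"):
    """Return (lines, result_name) for the swizzle src.<pattern>."""
    src_ty = f"<{src_width} x {elem}>"
    dst_ty = f"<{len(pattern)} x {elem}>"
    lines = []
    acc = "undef"  # the partially built result vector
    for i, lane in enumerate(pattern):
        idx = LANES.index(lane)
        scalar = f"%e{i}"
        vec = f"%v{i}"
        lines.append(f"{scalar} = extractelement {src_ty} {src}, i32 {idx}")
        lines.append(f"{vec} = insertelement {dst_ty} {acc}, {elem} {scalar}, i32 {i}")
        acc = vec
    return lines, acc

ir_lines, result_name = lower_swizzle("%tmp", 4, "xy")
for line in ir_lines:
    print(line)
```

Each result lane costs one extract and one insert, so a full four-lane swizzle expands to eight instructions that the code generator would later have to recognize as a single operand modifier.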

So I think my options come down to:

1) figure out a way for the code generator to actually combine all
those IR instructions back into
OP dst.writemask src1.swizzle1 src2.swizzle2

2) have some kind of instruction level support for it in LLVM IR

With my limited knowledge of the code generators in LLVM I don't see a way of
doing #1, and I'm afraid #2 might be the only option.
I'd appreciate any ideas and/or comments that could potentially help to solve
this problem.

z
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Vector swizzling and write masks code generation

Chris Lattner
On Thu, 27 Sep 2007, Zack Rusin wrote:
> as some of you may know we're in process of experimenting with LLVM in
> Gallium3D (Mesa's new driver model), where LLVM would be used both in the
> software only (by just JIT executing shaders) and hardware (drivers will
> implement LLVM code-generators) cases.

Yep, nifty!

> That is graphics hardware (basically every single programmable gpu) has
> instruction level support for vector swizzling and write masks.

ok

> For example the following represents a valid gpu shader instruction:
> ADD dst.xyz   src1.yxzw  src2.zwxy
> which performs an addition that stores the result to the dst operand (each
> operand is a vector type of four data elements) The instruction uses source
> swizzle modifiers and destination mask modifier.

Right.

> So if a language is capable of expressing such constructs (as GLSL, HLSL and
> few others are) I'd like to make sure that the code generator is actually
> capable of generating instructions with exactly those semantics.

Ok.  Are you planning to use the LLVM code generator, or roll your own?

> Right now vector operations utilizing swizzling and write masks in LLVM IR
> have to be expressed with series of load/extractelement/insertelement/store
> constructs. As in
>
> vec2 = vec4.xy
>
> would end up being:
> %tmp = load <4 x float>* @vec4
> %tmp1 = extractelement <4 x float> %tmp, i32 0
> %tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
> %tmp3 = extractelement <4 x float> %tmp, i32 1
> %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
> store <2 x float> %tmp4, <2 x float>* @vec2
> or the like.

Yes, you're right.  If you are staying within the same width of operand
(e.g. vec4 -> vec4) you can use the shufflevector instruction, but if not,
you have to use insert/extract.
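For the same-width case, a small Python model of shufflevector may help: the mask indexes into the two input vectors laid end to end. This is only a sketch of the semantics; the function below is a stand-in written for this example, `None` is used here to represent undef, and the sample values are invented.

```python
def shufflevector(v1, v2, mask):
    """Model of shufflevector: pick lanes from v1 followed by v2.

    None stands in for an undef lane or an undef mask index.
    """
    if len(v1) != len(v2):
        raise ValueError("both inputs must have the same width")
    both = list(v1) + list(v2)
    return [None if m is None else both[m] for m in mask]

a = [1.0, 2.0, 3.0, 4.0]
undef = [None] * 4

# a.yxzw as a single-source shuffle.
swizzled = shufflevector(a, undef, [1, 0, 2, 3])

# dst.xyz = new with dst.w unchanged, as a two-source shuffle:
# indices 0-3 select from the first vector, 4-7 from the second.
new = [32.0, 41.0, 13.0, 24.0]
dst = [0.0, 0.0, 0.0, 9.0]
masked = shufflevector(new, dst, [0, 1, 2, 7])

print(swizzled)  # [2.0, 1.0, 3.0, 4.0]
print(masked)    # [32.0, 41.0, 13.0, 9.0]
```

Seen this way, a vec4 source swizzle is a shuffle with one live input, and a vec4 write mask is a shuffle that blends the new value with the old destination.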

> So I think my options come down to:
>
> 1) figure out a way of having code generator be actually able to combine all
> those IR instructions back into
> OP dst.writemask src1.swizzle1 src2.swizzle2

Yep.  If you're using the LLVM code generator, it makes it reasonably easy
to pattern match on this sort of thing and/or introduce machine specific
abstractions to describe them.
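To give a feel for what such pattern matching involves, here is a toy Python sketch that recognizes an insertelement/extractelement chain and recovers the swizzle. The tuple-based IR and the function `match_swizzle` are invented for this example; this is not how the LLVM code generator represents or matches patterns.

```python
# Toy IR, made up for this sketch. Each value is a tuple:
#   ("vec", name)                  a named source vector
#   ("undef",)                     an undefined vector
#   ("extract", vec, idx)          extractelement
#   ("insert", vec, scalar, idx)   insertelement
LANES = "xyzw"

def match_swizzle(value, width):
    """Return (source_name, pattern) if value builds every lane of a
    width-lane vector from lanes of one source vector, else None."""
    lanes = [None] * width
    source = None
    node = value
    while node[0] == "insert":
        _, base, scalar, idx = node
        if not 0 <= idx < width:
            return None
        if scalar[0] != "extract" or scalar[1][0] != "vec":
            return None
        if not 0 <= scalar[2] < len(LANES):
            return None
        name = scalar[1][1]
        if source is None:
            source = name
        elif source != name:
            return None
        # We walk from the last insert backwards, so the first insert
        # seen for a lane is the one that survives.
        if lanes[idx] is None:
            lanes[idx] = LANES[scalar[2]]
        node = base
    if node[0] != "undef" or any(lane is None for lane in lanes):
        return None
    return source, "".join(lanes)

src = ("vec", "vec4")
t2 = ("insert", ("undef",), ("extract", src, 0), 0)
t4 = ("insert", t2, ("extract", src, 1), 1)
matched = match_swizzle(t4, 2)
print(matched)  # ('vec4', 'xy')
```

A chain that mixes lanes from two different vectors, or leaves a lane unset, is rejected and would fall back to ordinary lowering.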

-Chris

--
http://nondot.org/sabre/
http://llvm.org/

Re: Vector swizzling and write masks code generation

Gordon Henriksen-3
In reply to this post by Zack Rusin-3
Hi Zack,

On Sep 27, 2007, at 09:54, Zack Rusin wrote:

> as some of you may know we're in process of experimenting with LLVM in
> Gallium3D (Mesa's new driver model), where LLVM would be used both in the
> software only (by just JIT executing shaders) and hardware (drivers will
> implement LLVM code-generators) cases.

Neat.

> That is graphics hardware (basically every single programmable gpu) has
> instruction level support for vector swizzling and write masks.
>
> For example the following represents a valid gpu shader instruction:
> ADD dst.xyz   src1.yxzw  src2.zwxy
> which performs an addition that stores the result to the dst operand (each
> operand is a vector type of four data elements) The instruction uses source
> swizzle modifiers and destination mask modifier.
>
> So if a language is capable of expressing such constructs (as GLSL, HLSL and
> few others are) I'd like to make sure that the code generator is actually
> capable of generating instructions with exactly those semantics.
>
> Right now vector operations utilizing swizzling and write masks in LLVM IR
> have to be expressed with series of load/extractelement/insertelement/store
> constructs. As in
>
> vec2 = vec4.xy
>
> would end up being:
> %tmp = load <4 x float>* @vec4
> %tmp1 = extractelement <4 x float> %tmp, i32 0
> %tmp2 = insertelement <2 x float> undef, float %tmp1, i32 0
> %tmp3 = extractelement <4 x float> %tmp, i32 1
> %tmp4 = insertelement <2 x float> %tmp2, float %tmp3, i32 1
> store <2 x float> %tmp4, <2 x float>* @vec2
> or the like.

Loads and stores are always explicit; the code generator will fold them into the machine instructions if possible.

You may be able to take advantage of the shufflevector instruction, although its result will be a <4 x float>, the same as the source vector, rather than a <2 x float>. So you'll need to find a way to write "extract subvector" that codegens well. Perhaps this will work:

%shufvec = shufflevector <4 x float> ...
%src1 = extractelement <4 x float> %shufvec, i32 0
%src2 = extractelement <4 x float> %shufvec, i32 1
%tmp = insertelement <2 x float> undef, float %src1, i32 0
%res = insertelement <2 x float> %tmp, float %src2, i32 1

If that's no good, then you might want to add intrinsics to do the job. You can then easily pattern match on (llvm.extractvector (shufflevector ...), which) where llvm.extractvector is your intrinsic.
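As a rough sketch of that match, the Python below folds an extract-subvector of a shuffle into a single swizzle on a toy expression tree. Everything here is an assumption made for illustration: the tuple encoding, the function name `fold_extract_of_shuffle`, and in particular the reading of `which` as the index of a two-lane half (0 for lanes 0-1, 1 for lanes 2-3), which the message above does not define.

```python
# Toy expression tree, invented for this sketch:
#   ("shufflevector", src_name, mask)       single-source shuffle
#   ("extractvector", vector_expr, which)   hypothetical intrinsic
LANES = "xyzw"

def fold_extract_of_shuffle(expr, sub_width=2):
    """Match (extractvector (shufflevector src, mask), which) and return
    (src, pattern), or None if the expression has another shape."""
    if expr[0] != "extractvector":
        return None
    _, inner, which = expr
    if inner[0] != "shufflevector":
        return None
    _, src, mask = inner
    start = which * sub_width  # assumption: which selects a sub_width-lane slice
    picked = mask[start:start + sub_width]
    if len(picked) != sub_width:
        return None
    if any(m is None or not 0 <= m < len(LANES) for m in picked):
        return None
    return src, "".join(LANES[m] for m in picked)

expr = ("extractvector", ("shufflevector", "vec4", [1, 0, 2, 3]), 0)
folded = fold_extract_of_shuffle(expr)
print(folded)  # ('vec4', 'yx')
```

The folded pair is what a back end would want to attach to the instruction operand as its swizzle modifier.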

> So I think my options come down to:
>
> 1) figure out a way of having code generator be actually able to combine all
> those IR instructions back into
> OP dst.writemask src1.swizzle1 src2.swizzle2
>
> 2) have some kind of instruction level support for it in LLVM IR

It's much easier to define intrinsic functions than instructions, so start there if you need to go that route.

— Gordon



Re: Vector swizzling and write masks code generation

[Alex]
In reply to this post by Chris Lattner
Chris Lattner wrote:
> > So I think my options come down to:
> >
> > 1) figure out a way of having code generator be actually able to combine all
> > those IR instructions back into
> > OP dst.writemask src1.swizzle1 src2.swizzle2
>
> Yep.  If you're using the LLVM code generator, it makes it reasonably easy
> to pattern match on this sort of thing and/or introduce machine specific
> abstractions to describe them.

Is this already implemented in some backend, so that I can take it as an example?