Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
24 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
"Rotem, Nadav" <[hidden email]> writes:

> Hi,
>
> Following the discussion in last week’s LLVM developers conference I started working on support for vectors-of-pointers.  Vectors of pointers are
> needed for supporting scatter/gather operations and are the first step in the direction of supporting predicated architectures.  In the attached
> patch, I change the LLVM-IR in order to support vectors-of-pointers and added basic support for vector-gep.  In following patches I plan to extend
> the vector-gep support to more indices and add scatter/gather intrinsics.
>
> I started by implementing vector-gep support for a simple case where there is a single index.  The reason for this limitation, as noted by Dan
> Gohman, is that pointers may point to structs, and vectors of indices may point to different members in the struct. I am aware of the fact that
> supporting multiple indices is an important feature and I do intend to add support for multiple indices in the future.

So glad to see this happening!

But I'm unclear about your description.  A vector GEP produces a vector
of pointers, right?  And a scatter/gather operation takes a vector of
pointers as the list of addresses to fetch?

What does the non-unit stride case look like?  I have two ways I think
about a gather/scatter or non-unit stride access.

1. Vector-of-indices

This looks something like:

V1 <- Base + V2

where Base is a base address and V2 is a vector of offsets from the
base.

2. Vector-of-addresses

This looks omsething like:

V1 <- Offset + V2

Where Offset is an offset value applied to each address in V2.

The first form is somewhat more convenient for striding through an array
while the second is somewhat more convenient for doing gather/scatter on
a set of "random" address locations (think operating on a field in
several randomly-allocated structs).

The multiple-index case (if I understand what you mean) is just a
special case of the above, where the values in V2 have been adjusted to
point to the various different fields.

Can you explain with some example what your proposal provides and how it
relates to #1 and #2 above?  I'm just trying to understand the overall
scheme.

Thanks!

                           -Dave

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
David,

Thanks for the support! I sent a detailed email with the overall plan. But just to reiterate, the GEP would look like this:

        %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 3, i32 4>

Where the index of the GEP is a vector of indices. I am not against having multiple indices. I just want to start with a basic set of features.

Thanks,
Nadav

 


-----Original Message-----
From: David A. Greene [mailto:[hidden email]]
Sent: Tuesday, November 29, 2011 01:57
To: Rotem, Nadav
Cc: [hidden email]; LLVM Developers Mailing List
Subject: Re: [llvm-commits] Vectors of Pointers and Vector-GEP

"Rotem, Nadav" <[hidden email]> writes:

> Hi,
>
> Following the discussion in last week’s LLVM developers conference I started working on support for vectors-of-pointers.  Vectors of pointers are
> needed for supporting scatter/gather operations and are the first step in the direction of supporting predicated architectures.  In the attached
> patch, I change the LLVM-IR in order to support vectors-of-pointers and added basic support for vector-gep.  In following patches I plan to extend
> the vector-gep support to more indices and add scatter/gather intrinsics.
>
> I started by implementing vector-gep support for a simple case where there is a single index.  The reason for this limitation, as noted by Dan
> Gohman, is that pointers may point to structs, and vectors of indices may point to different members in the struct. I am aware of the fact that
> supporting multiple indices is an important feature and I do intend to add support for multiple indices in the future.

So glad to see this happening!

But I'm unclear about your description.  A vector GEP produces a vector
of pointers, right?  And a scatter/gather operation takes a vector of
pointers as the list of addresses to fetch?

What does the non-unit stride case look like?  I have two ways I think
about a gather/scatter or non-unit stride access.

1. Vector-of-indices

This looks something like:

V1 <- Base + V2

where Base is a base address and V2 is a vector of offsets from the
base.

2. Vector-of-addresses

This looks omsething like:

V1 <- Offset + V2

Where Offset is an offset value applied to each address in V2.

The first form is somewhat more convenient for striding through an array
while the second is somewhat more convenient for doing gather/scatter on
a set of "random" address locations (think operating on a field in
several randomly-allocated structs).

The multiple-index case (if I understand what you mean) is just a
special case of the above, where the values in V2 have been adjusted to
point to the various different fields.

Can you explain with some example what your proposal provides and how it
relates to #1 and #2 above?  I'm just trying to understand the overall
scheme.

Thanks!

                           -Dave
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
"Rotem, Nadav" <[hidden email]> writes:

> David,
>
> Thanks for the support! I sent a detailed email with the overall
> plan. But just to reiterate, the GEP would look like this:
>
> %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>
> Where the index of the GEP is a vector of indices. I am not against
> having multiple indices. I just want to start with a basic set of
> features.

Ah, I see.  I actually think multiple indices as in multiple vectors of
indices to the GEP above would be pretty rare.

                             -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
In reply to this post by greened
[hidden email] (David A. Greene) writes:

> "Rotem, Nadav" <[hidden email]> writes:
>
>> Following the discussion in last week’s LLVM developers conference I
>> started working on support for vectors-of-pointers.  Vectors of
>> pointers are needed for supporting scatter/gather operations and are
>> the first step in the direction of supporting predicated
>> architectures.

I just want to make clear that gather/scatter and prediction are
orthogonal concepts.  You don't need one to do the other.  You can use
scatter/gather to vectorize conditional code but it's not the same as
true predication, which is generally much more efficient.

                               -Dave

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
David,

I agree with you, scatter/gather and predication are orthogonal concepts. In theory, you could implement load/store predication using scatter/gather instructions by loading/storing to dedicated 'dead' stack slots. Having said that, I don't know of any architecture which has scatter/gather but does not have predicated instructions.

Looking onward, I definitely want to see LLVM support predicated architectures. Dan Gohman had some excellent ideas on different way this can be done. As soon as I finish stabilizing the pointer-vector patch, and implement scatter/gather intrinsics, I would like to start a discussion on the implementation of predicated instructions in LLVM.

Nadav


-----Original Message-----
From: David A. Greene [mailto:[hidden email]]
Sent: Tuesday, November 29, 2011 18:18
To: David A. Greene
Cc: Rotem, Nadav; [hidden email]; LLVM Developers Mailing List
Subject: Re: [llvm-commits] Vectors of Pointers and Vector-GEP

[hidden email] (David A. Greene) writes:

> "Rotem, Nadav" <[hidden email]> writes:
>
>> Following the discussion in last week’s LLVM developers conference I
>> started working on support for vectors-of-pointers.  Vectors of
>> pointers are needed for supporting scatter/gather operations and are
>> the first step in the direction of supporting predicated
>> architectures.

I just want to make clear that gather/scatter and prediction are
orthogonal concepts.  You don't need one to do the other.  You can use
scatter/gather to vectorize conditional code but it's not the same as
true predication, which is generally much more efficient.

                               -Dave
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
In reply to this post by greened
I agree that a single vector index is sufficient for many cases. Matt Pharr (from the ISPC compiler), showed me an interesting case where there is a single pointer into an array. In this case we need to have two indices, where the first index is zero. Once the basic patch is in, we can start looking at adding support for arrays and multiple indices.

Nadav

-----Original Message-----
From: David A. Greene [mailto:[hidden email]]
Sent: Tuesday, November 29, 2011 18:17
To: Rotem, Nadav
Cc: David A. Greene; LLVM Developers Mailing List
Subject: Re: [llvm-commits] Vectors of Pointers and Vector-GEP

"Rotem, Nadav" <[hidden email]> writes:

> David,
>
> Thanks for the support! I sent a detailed email with the overall
> plan. But just to reiterate, the GEP would look like this:
>
> %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
>
> Where the index of the GEP is a vector of indices. I am not against
> having multiple indices. I just want to start with a basic set of
> features.

Ah, I see.  I actually think multiple indices as in multiple vectors of
indices to the GEP above would be pretty rare.

                             -Dave
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Jose Fonseca
In reply to this post by greened
----- Original Message -----

> "Rotem, Nadav" <[hidden email]> writes:
>
> > David,
> >
> > Thanks for the support! I sent a detailed email with the overall
> > plan. But just to reiterate, the GEP would look like this:
> >
> > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32
> > 3, i32 4>
> >
> > Where the index of the GEP is a vector of indices. I am not against
> > having multiple indices. I just want to start with a basic set of
> > features.
>
> Ah, I see.  I actually think multiple indices as in multiple vectors
> of
> indices to the GEP above would be pretty rare.

Nadav, David,

I'd like to understand a bit better the final role of these pointer vector types in 64bit architectures, where the pointers are often bigger than the elements stored/fetch (e.g, 32bits floats/ints).

Will 64bits backends be forced to actually operate with 64bit pointer vectors all the time? Or will they be able to retain operations on base + 32bit offsets as such?

In particular, an important use case for 3D software rendering is to be able to gather <4 x i32> values, from a i32* scalar base pointer in a 64bit address space, indexed by <N x i32> offsets. [1]  And it is important that the intermediate <N x i32*> pointer vectors is actually never instanced, as it wouldn't fit in the hardware SIMD registers, and therefore would require two gather operations.

It would be nice to see how this use case would look in the proposed IR, and get assurance that backends will be able to emit efficient code (i.e., a single gather instruction) from that IR.

Jose

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
Hi Jose,

The proposed IR change does not contribute nor hinder the usecase you mentioned. The case of a base + vector-index should be easily addressed by an intrinsic. The pointer-vector proposal comes to support full scatter/gather instructions (such as the AVX2 gather instructions).

Nadav


-----Original Message-----
From: Jose Fonseca [mailto:[hidden email]]
Sent: Tuesday, November 29, 2011 22:25
To: Rotem, Nadav; David A. Greene
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

----- Original Message -----

> "Rotem, Nadav" <[hidden email]> writes:
>
> > David,
> >
> > Thanks for the support! I sent a detailed email with the overall
> > plan. But just to reiterate, the GEP would look like this:
> >
> > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32
> > 3, i32 4>
> >
> > Where the index of the GEP is a vector of indices. I am not against
> > having multiple indices. I just want to start with a basic set of
> > features.
>
> Ah, I see.  I actually think multiple indices as in multiple vectors
> of
> indices to the GEP above would be pretty rare.

Nadav, David,

I'd like to understand a bit better the final role of these pointer vector types in 64bit architectures, where the pointers are often bigger than the elements stored/fetch (e.g, 32bits floats/ints).

Will 64bits backends be forced to actually operate with 64bit pointer vectors all the time? Or will they be able to retain operations on base + 32bit offsets as such?

In particular, an important use case for 3D software rendering is to be able to gather <4 x i32> values, from a i32* scalar base pointer in a 64bit address space, indexed by <N x i32> offsets. [1]  And it is important that the intermediate <N x i32*> pointer vectors is actually never instanced, as it wouldn't fit in the hardware SIMD registers, and therefore would require two gather operations.

It would be nice to see how this use case would look in the proposed IR, and get assurance that backends will be able to emit efficient code (i.e., a single gather instruction) from that IR.

Jose

[1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Jose Fonseca
Yes, indeed I can always fallback to intrinsics.

But still, I believe that the case I described is in its essence quite common-place, so it should be a first-class citizen in the LLVM IR.  AVX2 is the target ISA I'm thinking of too BTW.

Let's forget 3D, and imagine something as trivial as a vectorized i32 => float table look up. I'd expect that the IR would look something like:

  ; Look Up Table with precomputed values
  declare float* @lut;

  define <8 x float> @foo(<8 x float> %indices) {
    %pointer = getelementptr float* @lut, <8 x i32> %indices
    %values = load <8 x float*> %pointer
    ret <8 x float> %values;
  }

And the final AVX2 code I'd expect would consist of a single VGATHERDPS, both on 64bits and 32bits addressing mode:

foo:
  VPCMPEQB   ymm1, ymm1, ymm1                        ; generate all ones
  VGATHERDPS ymm0, DWORD PTR [ymm0 * 4 + lut], ymm1
  RET

Jose

----- Original Message -----

> Hi Jose,
>
> The proposed IR change does not contribute nor hinder the usecase you
> mentioned. The case of a base + vector-index should be easily
> addressed by an intrinsic. The pointer-vector proposal comes to
> support full scatter/gather instructions (such as the AVX2 gather
> instructions).
>
> Nadav
>
>
> -----Original Message-----
> From: Jose Fonseca [mailto:[hidden email]]
> Sent: Tuesday, November 29, 2011 22:25
> To: Rotem, Nadav; David A. Greene
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
>
> ----- Original Message -----
> > "Rotem, Nadav" <[hidden email]> writes:
> >
> > > David,
> > >
> > > Thanks for the support! I sent a detailed email with the overall
> > > plan. But just to reiterate, the GEP would look like this:
> > >
> > > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2,
> > > i32
> > > 3, i32 4>
> > >
> > > Where the index of the GEP is a vector of indices. I am not
> > > against
> > > having multiple indices. I just want to start with a basic set of
> > > features.
> >
> > Ah, I see.  I actually think multiple indices as in multiple
> > vectors
> > of
> > indices to the GEP above would be pretty rare.
>
> Nadav, David,
>
> I'd like to understand a bit better the final role of these pointer
> vector types in 64bit architectures, where the pointers are often
> bigger than the elements stored/fetch (e.g, 32bits floats/ints).
>
> Will 64bits backends be forced to actually operate with 64bit pointer
> vectors all the time? Or will they be able to retain operations on
> base + 32bit offsets as such?
>
> In particular, an important use case for 3D software rendering is to
> be able to gather <4 x i32> values, from a i32* scalar base pointer
> in a 64bit address space, indexed by <N x i32> offsets. [1]  And it
> is important that the intermediate <N x i32*> pointer vectors is
> actually never instanced, as it wouldn't fit in the hardware SIMD
> registers, and therefore would require two gather operations.
>
> It would be nice to see how this use case would look in the proposed
> IR, and get assurance that backends will be able to emit efficient
> code (i.e., a single gather instruction) from that IR.
>
> Jose
>
> [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
Jose,

The scenario you described is probably the most important/common case. Supporting GEPs with a scalar base pointer and multiple indices can indeed assist IR-level optimizations in detecting these patterns and replace them with intrinsics. But even without a single scalar base pointers, optimizations can detect that the base pointer is broadcasted from a scalar.  Having said that, I am still not sure how to add codegen support for AVX2 scatter/gather of base + 32bit-indices. The problem is that the GEP would return a vector of pointers, which need to be reversed back to the 'base+index' form. I think that replacing the GEP/LOAD sequence with an intrinsic if probably the best choice.

Nadav


-----Original Message-----
From: Jose Fonseca [mailto:[hidden email]]
Sent: Wednesday, November 30, 2011 18:00
To: Rotem, Nadav
Cc: LLVM Developers Mailing List; David A. Greene
Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Yes, indeed I can always fallback to intrinsics.

But still, I believe that the case I described is in its essence quite common-place, so it should be a first-class citizen in the LLVM IR.  AVX2 is the target ISA I'm thinking of too BTW.

Let's forget 3D, and imagine something as trivial as a vectorized i32 => float table look up. I'd expect that the IR would look something like:

  ; Look Up Table with precomputed values
  declare float* @lut;

  define <8 x float> @foo(<8 x float> %indices) {
    %pointer = getelementptr float* @lut, <8 x i32> %indices
    %values = load <8 x float*> %pointer
    ret <8 x float> %values;
  }

And the final AVX2 code I'd expect would consist of a single VGATHERDPS, both on 64bits and 32bits addressing mode:

foo:
  VPCMPEQB   ymm1, ymm1, ymm1                        ; generate all ones
  VGATHERDPS ymm0, DWORD PTR [ymm0 * 4 + lut], ymm1
  RET

Jose

----- Original Message -----

> Hi Jose,
>
> The proposed IR change does not contribute nor hinder the usecase you
> mentioned. The case of a base + vector-index should be easily
> addressed by an intrinsic. The pointer-vector proposal comes to
> support full scatter/gather instructions (such as the AVX2 gather
> instructions).
>
> Nadav
>
>
> -----Original Message-----
> From: Jose Fonseca [mailto:[hidden email]]
> Sent: Tuesday, November 29, 2011 22:25
> To: Rotem, Nadav; David A. Greene
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
>
> ----- Original Message -----
> > "Rotem, Nadav" <[hidden email]> writes:
> >
> > > David,
> > >
> > > Thanks for the support! I sent a detailed email with the overall
> > > plan. But just to reiterate, the GEP would look like this:
> > >
> > > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2,
> > > i32
> > > 3, i32 4>
> > >
> > > Where the index of the GEP is a vector of indices. I am not
> > > against
> > > having multiple indices. I just want to start with a basic set of
> > > features.
> >
> > Ah, I see.  I actually think multiple indices as in multiple
> > vectors
> > of
> > indices to the GEP above would be pretty rare.
>
> Nadav, David,
>
> I'd like to understand a bit better the final role of these pointer
> vector types in 64bit architectures, where the pointers are often
> bigger than the elements stored/fetch (e.g, 32bits floats/ints).
>
> Will 64bits backends be forced to actually operate with 64bit pointer
> vectors all the time? Or will they be able to retain operations on
> base + 32bit offsets as such?
>
> In particular, an important use case for 3D software rendering is to
> be able to gather <4 x i32> values, from a i32* scalar base pointer
> in a 64bit address space, indexed by <N x i32> offsets. [1]  And it
> is important that the intermediate <N x i32*> pointer vectors is
> actually never instanced, as it wouldn't fit in the hardware SIMD
> registers, and therefore would require two gather operations.
>
> It would be nice to see how this use case would look in the proposed
> IR, and get assurance that backends will be able to emit efficient
> code (i.e., a single gather instruction) from that IR.
>
> Jose
>
> [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-June/040825.html
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
In reply to this post by Rotem, Nadav
"Rotem, Nadav" <[hidden email]> writes:

> I agree that a single vector index is sufficient for many cases. Matt Pharr (from the ISPC compiler), showed me an interesting case where there is a single pointer into an array. In this case we need to have two indices, where the first index is zero. Once the basic patch is in, we can start looking at adding support for arrays and multiple indices.

Ah, I see.  I forgot about the GEP zero offset case.  This is indeed a
common case.  In fact I would like to see a couple of additions:

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, <4 x i32> <i32 w, i32 x, i32 y, i32 z>

That is, make the base address a scalar and all vector indices get
scaled and added to it to produce the vector of pointers.  This is the
case Matt gave you and is very common.  It's used for striding through
an array or G/S where all of the elements are in the same array.

This avoids the need to broadcast the base address to a vector.  We
could match the broadcast in codegen and revert it to an instruction
that takes a scalar base, but it's a bit trickier.  We'd probably have
to make special target ISD nodes _a_la_ the X86 shuffle stuff.  The
above would make this more target-independent.  It is easy enough for a
target to lower the base address to a broadcast if necessary.

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %index, i32 %stride

This is for common strided accesses, for example:

x[i], x[i+2], x[i+4], x[i+6]

The stride component would be multiplied against an incremented index:

%PV = getelementptr i32* %base, <4 x i32> <i32 0, i32 0, i32 0, i32 0>, i32 %i, i32 %stride

=>  <4 x i32 *><0+%i*%stride*bitwidth, 0+(%i+1)*%stride*bitwidth, 0+(%i+2)*%stride*bitwidth, 0+(%i+3)*%stride*bitwidth>

This avoids the need to generate a vector of indices that are a regular
strided pattern.  Similar to the above, it simplifies matching on vector
architectures that directly support strided accesses.

I would really like to see these two versions of GEP added at some
point.  They would simplify isel quite a bit.

Some other ideas:

An alternative is to introduce a cidx (compressed index) instruction:

%IV = cidx i32 %stride, i32 %vector_length

(if %vector_length is 4) => <4 x i32><%stride*0, %stride*1, %stride*2, %stride*3>

This generates the strided index pattern which could be fed into the GEP
as an index vector.

Or:

%IV = cidx i32 %i, i32 %stride, i32 %vector_length

(if %vector_length is 4) => <4 x i32><%stride*%i, %stride*(%i+1), %stride*(%i+2), %stride*(%i+3)>

You can see examples of these and other instructions from the Cray X1
here:

http://docs.cray.com/books/S-2314-51/S-2314-51-manual.pdf

2.5.4 is the section of interest for index generation.

In particular, the scan() and cmprss() (compress) instructions are also
very useful to generate index sets.

Note that the X1 index generation instructions also take a mask
(predicate) operand.  The difference between cidx() and scan() is how
the mask and index are interpreted.  scan() in particular is useful for
vector compress/expand operations.  This is something else to think
about if you plan to tackle predication in the future.

These are just some additional ideas to think about as you continue to
design this.

                            -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
In reply to this post by Jose Fonseca
Jose Fonseca <[hidden email]> writes:

> ----- Original Message -----
>> "Rotem, Nadav" <[hidden email]> writes:
>>
>> > David,
>> >
>> > Thanks for the support! I sent a detailed email with the overall
>> > plan. But just to reiterate, the GEP would look like this:
>> >
>> > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2, i32
>> > 3, i32 4>
>> >
>> > Where the index of the GEP is a vector of indices. I am not against
>> > having multiple indices. I just want to start with a basic set of
>> > features.
>>
>> Ah, I see.  I actually think multiple indices as in multiple vectors
>> of
>> indices to the GEP above would be pretty rare.
>
> Nadav, David,
>
> I'd like to understand a bit better the final role of these pointer
> vector types in 64bit architectures, where the pointers are often
> bigger than the elements stored/fetch (e.g, 32bits floats/ints).

The pointers are addresses.  On a 64-bit address machine they will be 64
bits.  On a 32-bit address machine they will be 32 bits.

For a situation like PTX that has multiple addresses sizes, we will need
additional LLVM support.  Right now a pointer can only have one size per
target.

> Will 64bits backends be forced to actually operate with 64bit pointer
> vectors all the time? Or will they be able to retain operations on
> base + 32bit offsets as such?

Are you talking about 32-bit pointers?  If so, Nadav has talked about
vector inttoptr and ptrtoint instructions which I think can address the
need you're getting at.  But I'm a little unclear on what you want.

> In particular, an important use case for 3D software rendering is to
> be able to gather <4 x i32> values, from a i32* scalar base pointer in
> a 64bit address space, indexed by <N x i32> offsets. [1] And it is
> important that the intermediate <N x i32*> pointer vectors is actually
> never instanced, as it wouldn't fit in the hardware SIMD registers,
> and therefore would require two gather operations.

By "fit" are you worried about vector length?  If so, legalize would
have to break up the <N x i32*> vector into two or more smaller vectors.

If you are worried about element size (there are only 32-bit elements)
then inttoptr/ptrtoint should handle it, I think.

                              -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
In reply to this post by Rotem, Nadav
"Rotem, Nadav" <[hidden email]> writes:

> The scenario you described is probably the most important/common
> case. Supporting GEPs with a scalar base pointer and multiple indices
> can indeed assist IR-level optimizations in detecting these patterns
> and replace them with intrinsics. But even without a single scalar
> base pointers, optimizations can detect that the base pointer is
> broadcasted from a scalar.  

I just wrote an extensive reply that covers this.  I think we do want a
scalar base GEP to make isel easier and the IR more target-independent.
We should also consider a strided GEP (also covered in my reply) for the
same reason.

> Having said that, I am still not sure how to add codegen support for
> AVX2 scatter/gather of base + 32bit-indices. The problem is that the
> GEP would return a vector of pointers, which need to be reversed back
> to the 'base+index' form. I think that replacing the GEP/LOAD sequence
> with an intrinsic if probably the best choice.

In the same reply I mentioned various index generation instructions like
cidx.  This allows you to retain the index set separate from the final
addresses.  You'd then match load (gep(base, <0>, index set)) as the
gather operation in isel, similar to how memops are done for X86 today
(i.e. manual lowering would put the GEP information in special address
match data structures).

The same can be done without cidx, but you need a more complex
instruction sequence to generate the index set and match that in the
address manual lowering code.  This does make the manual address
matching code more complicated.

There are lots of options here but I do think we need something beyond
the simple "all vector" GEP.

                             -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
In reply to this post by Rotem, Nadav
"Rotem, Nadav" <[hidden email]> writes:

> Looking onward, I definitely want to see LLVM support predicated
> architectures. Dan Gohman had some excellent ideas on different way
> this can be done. As soon as I finish stabilizing the pointer-vector
> patch, and implement scatter/gather intrinsics, I would like to start
> a discussion on the implementation of predicated instructions in LLVM.

Me too.  This is going to be very important going forward as we're going
to have to extract and represent parallelism in the new age of the clock
ceiling.

                           -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Jose Fonseca
In reply to this post by greened


----- Original Message -----

> Jose Fonseca <[hidden email]> writes:
>
> > ----- Original Message -----
> >> "Rotem, Nadav" <[hidden email]> writes:
> >>
> >> > David,
> >> >
> >> > Thanks for the support! I sent a detailed email with the overall
> >> > plan. But just to reiterate, the GEP would look like this:
> >> >
> >> > %PV = getelementptr <4 x i32*> %base, <4 x i32> <i32 1, i32 2,
> >> > i32
> >> > 3, i32 4>
> >> >
> >> > Where the index of the GEP is a vector of indices. I am not
> >> > against
> >> > having multiple indices. I just want to start with a basic set
> >> > of
> >> > features.
> >>
> >> Ah, I see.  I actually think multiple indices as in multiple
> >> vectors
> >> of
> >> indices to the GEP above would be pretty rare.
> >
> > Nadav, David,
> >
> > I'd like to understand a bit better the final role of these pointer
> > vector types in 64bit architectures, where the pointers are often
> > bigger than the elements stored/fetch (e.g, 32bits floats/ints).
>
> The pointers are addresses.  On a 64-bit address machine they will be
> 64
> bits.  On a 32-bit address machine they will be 32 bits.
>
> For a situation like PTX that has multiple addresses sizes, we will
> need
> additional LLVM support.  Right now a pointer can only have one size
> per
> target.
>
> > Will 64bits backends be forced to actually operate with 64bit
> > pointer
> > vectors all the time? Or will they be able to retain operations on
> > base + 32bit offsets as such?
>
> Are you talking about 32-bit pointers?  If so, Nadav has talked about
> vector inttoptr and ptrtoint instructions which I think can address
> the
> need you're getting at.  But I'm a little unclear on what you want.
>
> > In particular, an important use case for 3D software rendering is
> > to
> > be able to gather <4 x i32> values, from a i32* scalar base pointer
> > in
> > a 64bit address space, indexed by <N x i32> offsets. [1] And it is
> > important that the intermediate <N x i32*> pointer vectors is
> > actually
> > never instanced, as it wouldn't fit in the hardware SIMD registers,
> > and therefore would require two gather operations.
>
> By "fit" are you worried about vector length?  If so, legalize would
> have to break up the <N x i32*> vector into two or more smaller
> vectors.
>
> If you are worried about element size (there are only 32-bit
> elements)
> then inttoptr/ptrtoint should handle it, I think.

I was referring to gathering a vector of sparse 32bit words, all relative from a base scalar pointer in a 64bit address space, where the offsets are in a 32bit integer vector.  My other reply gave a more detailed and concrete example.

Anyway, from Nadav's and your other replies on this thread it is now clear to me that even if the IR doesn't express base scalar pointers w/ vector indices directly, the backend can always match and emit the most efficient machine instruction. This addresses my main concern.

Jose
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

greened
Jose Fonseca <[hidden email]> writes:

> I was referring to gathering a vector of sparse 32bit words, all
> relative from a base scalar pointer in a 64bit address space, where
> the offsets are in a 32bit integer vector.  My other reply gave a more
> detailed and concrete example.

Yep, I saw that.  I think LLVM IR should support it directly.

> Anyway, from Nadav's and your other replies on this thread it is now
> clear to me that even if the IR doesn't express base scalar pointers
> w/ vector indices directly, the backend can always match and emit the
> most efficient machine instruction. This addresses my main concern.

It can, but it would have to go through varying amounts of pain to do
so.  This is such a common operation that the IR should really support
it directly.

                           -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
Hi,

I just wanted to let you know that I committed the pointer-vector patch.

Thanks,
Nadav

-----Original Message-----
From: David A. Greene [mailto:[hidden email]]
Sent: Tuesday, December 06, 2011 00:10
To: Jose Fonseca
Cc: David A. Greene; Rotem, Nadav; LLVM Developers Mailing List
Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Jose Fonseca <[hidden email]> writes:

> I was referring to gathering a vector of sparse 32bit words, all
> relative from a base scalar pointer in a 64bit address space, where
> the offsets are in a 32bit integer vector.  My other reply gave a more
> detailed and concrete example.

Yep, I saw that.  I think LLVM IR should support it directly.

> Anyway, from Nadav's and your other replies on this thread it is now
> clear to me that even if the IR doesn't express base scalar pointers
> w/ vector indices directly, the backend can always match and emit the
> most efficient machine instruction. This addresses my main concern.

It can, but it would have to go through varying amounts of pain to do
so.  This is such a common operation that the IR should really support
it directly.

                           -Dave
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Jose Fonseca
In reply to this post by greened
----- Original Message -----

> Jose Fonseca <[hidden email]> writes:
> > Anyway, from Nadav's and your other replies on this thread it is
> > now
> > clear to me that even if the IR doesn't express base scalar
> > pointers
> > w/ vector indices directly, the backend can always match and emit
> > the
> > most efficient machine instruction. This addresses my main concern.
>
> It can, but it would have to go through varying amounts of pain to do
> so.  This is such a common operation that the IR should really
> support
> it directly.

I entirely agree FWIW.

Jose
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Hal Finkel
In reply to this post by Rotem, Nadav
On Tue, 6 Dec 2011 09:19:43 +0200
"Rotem, Nadav" <[hidden email]> wrote:

> Hi,
>
> I just wanted to let you know that I committed the pointer-vector
> patch.

Nadav,

I just committed a change to BBVectorizer to allow it to generate these
(and also vector selects) [>= r154735]. If you have cases where you
think these should be generated, and they're now not generated by the
vectorizer, please let me know.

Thanks again,
Hal

>
> Thanks,
> Nadav
>
> -----Original Message-----
> From: David A. Greene [mailto:[hidden email]]
> Sent: Tuesday, December 06, 2011 00:10
> To: Jose Fonseca
> Cc: David A. Greene; Rotem, Nadav; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
>
> Jose Fonseca <[hidden email]> writes:
>
> > I was referring to gathering a vector of sparse 32bit words, all
> > relative from a base scalar pointer in a 64bit address space, where
> > the offsets are in a 32bit integer vector.  My other reply gave a
> > more detailed and concrete example.
>
> Yep, I saw that.  I think LLVM IR should support it directly.
>
> > Anyway, from Nadav's and your other replies on this thread it is now
> > clear to me that even if the IR doesn't express base scalar pointers
> > w/ vector indices directly, the backend can always match and emit
> > the most efficient machine instruction. This addresses my main
> > concern.
>
> It can, but it would have to go through varying amounts of pain to do
> so.  This is such a common operation that the IR should really support
> it directly.
>
>                            -Dave
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-commits] Vectors of Pointers and Vector-GEP

Rotem, Nadav
Hi Hal!

This is great!   Vector-selects are always profitable compared to scalar selects. They are even emulated using a few Xor/And/Or instructions on platforms that don't have a native 'blend' support.

Vector-geps on the other hand are only useful in very specific cases.  Currently we do not support Load/Store instructions with a pointer-vector operand, so you need to extract each pointer element prior to loading and storing.
X86 supports a complex addressing modes which means that most GEPs are included in the load/store instruction and they cost nothing. So, performing vector-gep on these instructions would actually be slower. Vector-geps are only useful if your hardware supports scatter/gather using a vector of pointers/indices.  For example AVX2 has support for the 'gather' instruction which receives a vector of indices.  In many cases it is possible to lower vector-gep instructions into a gather instruction, but this is currently not supported by LLVM.

Thanks,
Nadav


 
-----Original Message-----
From: Hal Finkel [mailto:[hidden email]]
Sent: Saturday, April 14, 2012 11:02
To: Rotem, Nadav
Cc: David A. Greene; Jose Fonseca; LLVM Developers Mailing List
Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

On Tue, 6 Dec 2011 09:19:43 +0200
"Rotem, Nadav" <[hidden email]> wrote:

> Hi,
>
> I just wanted to let you know that I committed the pointer-vector
> patch.

Nadav,

I just committed a change to BBVectorizer to allow it to generate these (and also vector selects) [>= r154735]. If you have cases where you think these should be generated, and they're now not generated by the vectorizer, please let me know.

Thanks again,
Hal

>
> Thanks,
> Nadav
>
> -----Original Message-----
> From: David A. Greene [mailto:[hidden email]]
> Sent: Tuesday, December 06, 2011 00:10
> To: Jose Fonseca
> Cc: David A. Greene; Rotem, Nadav; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] [llvm-commits] Vectors of Pointers and
> Vector-GEP
>
> Jose Fonseca <[hidden email]> writes:
>
> > I was referring to gathering a vector of sparse 32bit words, all
> > relative from a base scalar pointer in a 64bit address space, where
> > the offsets are in a 32bit integer vector.  My other reply gave a
> > more detailed and concrete example.
>
> Yep, I saw that.  I think LLVM IR should support it directly.
>
> > Anyway, from Nadav's and your other replies on this thread it is now
> > clear to me that even if the IR doesn't express base scalar pointers
> > w/ vector indices directly, the backend can always match and emit
> > the most efficient machine instruction. This addresses my main
> > concern.
>
> It can, but it would have to go through varying amounts of pain to do
> so.  This is such a common operation that the IR should really support
> it directly.
>
>                            -Dave
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
12