[llvm-dev] [RFC] Matrix support (take 2)


Re: [llvm-dev] [RFC] Matrix support (take 2)

Sudhindra kulkarni via llvm-dev
Roman Lebedev via llvm-dev <[hidden email]> writes:

> Much as with native fixed-point type support, a whole new
> incompatible type is suggested to be added here.  I *suspect* *every*
> single transform in instcombine/instsimplify will need to be
> *duplicated*.

Only for operations that can be masked (and therefore use the new
type).  We may or may not care about masking integer adds and subtracts,
for example, or bitwise operations.  We may want to do so for power
reasons (and maybe others?), but it is generally not required for
correctness.

> That is a lot.  Intrinsics sound like a less intrusive solution, in both
> cases.

I don't see how intrinsics are any less work.  We'll need to duplicate
transforms for every intrinsic we add, which seems to be the moral
equivalent of duplicating transforms for every operation we care about
using the new type.  Intrinsics have the added disadvantage of being
semantically opaque.

                               -David
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] Matrix support (take 2)


On 12/20/18 6:43 PM, Roman Lebedev wrote:

> On Thu, Dec 20, 2018 at 7:40 PM Simon Moll via llvm-dev
> <[hidden email]> wrote:
>> On 12/20/18 4:43 PM, David Greene wrote:
>>> Simon Moll <[hidden email]> writes:
>>>
>>>>> How will existing passes be taught about the new intrinsics?  For
>>>>> example, what would have to be done to instcombine to teach it about
>>>>> these intrinsics?  Let's suppose every existing operation had an
>>>>> equivalent masked intrinsic.  Would it be easier to teach all of the
>>>>> passes about them or would it be easier to teach the passes about a mask
>>>>> operand on the existing Instructions?  Would it be easier to teach isel
>>>>> about all the intrinsics or would it be easier to teach isel about a
>>>>> mask operand?
>>>> Consider that overnight we introduce optional mask parameters to
>>>> vector instructions. Then, since you cannot safely ignore the mask,
>>>> every transformation and analysis that is somehow concerned with
>>>> vector instructions is potentially broken and needs to be fixed.
>>> True, but is there a way we could do this incrementally?  Even if we
>>> start with intrinsics and then migrate to first-class support, at some
>>> point passes are going to be broken with respect to masks on
>>> Instructions.
>> Here is an idea for an incremental transition:
>>
>> a) Create a new, distinct type. Let's say it's called the "predicated
>> vector type", written "{W x double}".
>>
>> b) Make the predicated vector type a legal operand type for all binary
>> operators and add an optional predicate parameter to them. Now, here is
>> the catch: the predicate parameter is only legal if the data type of the
>> operation is the predicated vector type. That is, "fadd <8 x double>" will
>> forever be unpredicated. However, "fadd {8 x double} %a, %b" may have
>> an optional predicate argument. Semantically, these two operations would
>> be identical:
>>
>> fadd <8 x double> %a, %b
>>
>> fadd {8 x double} %a, %b, predicate(<8 x i1> <i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1, i1 1>)
>>
>> In terms of the LLVM type hierarchy, PredicatedVectorType would be
>> distinct from VectorType and so no transformation can break it. While
>> you are in the transition (from unpredicated to predicated IR), you may
>> see code like this:
>>
>> %aP = bitcast <8 x double> %a to {8 x double}
>> %bP = bitcast <8 x double> %b to {8 x double}
>> %cP = fdiv {8 x double} %aP, %bP, mask(11101110) ; predicated fdiv
>> %c  = bitcast {8 x double} %cP to <8 x double>
>> %d  = fadd <8 x double> %a, %c   ; no predicated fadd yet
>>
>> Eventually, when all optimizations/instructions/analyses have been
>> migrated to run well with the new type: 1. deprecate the old vector
>> type, 2. promote it to PredicatedVectorType when parsing bitcode, and
>> 3. after a grace period, rename {8 x double} to <8 x double>.
> I'm likely missing things,
> but I strongly suspect that the amount of effort needed is underestimated.
>
> Vector support works because, with some exceptions,
> vector is simply interpreted as several scalars concatenated.
>
> Much as with native fixed-point type support,
> a whole new incompatible type is suggested to be added here.
> I *suspect* *every* single transform in instcombine/instsimplify
> will need to be *duplicated*.

Actually, there is no need for an entirely distinct type.
PredicatedVectorType could simply be the superclass of VectorType.

/// Operations of this type may have a mask or an explicit vector length.
class PredicatedVectorType : public SequentialType {
public:
   virtual bool allowsMasking() const { return true; }
   virtual bool allowsExplicitVectorLength() const { return true; }

   // [all contents of the current VectorType]
};

/// Operations of this type can have neither a mask nor an explicit
/// vector length.
class VectorType : public PredicatedVectorType {
public:
   bool allowsMasking() const override { return false; }
   bool allowsExplicitVectorLength() const override { return false; }

   // no other members
};

If we go down that path, there would be *no duplication* at all;
instead, transformations would be lifted from VectorType to
PredicatedVectorType one at a time. Eventually, the PredicatedVectorType
class could be deprecated, as masking and an explicit vector length would
depend only on the operation.

>
> That is a lot. Intrinsics sound like a less intrusive solution, in both cases.
>>>> If you go with masking intrinsics, and set the attributes right, it is
>>>> clear that transformations won't break your code, and you will need to
>>>> teach InstCombine, DAGCombiner, etc. that a `masked.fadd` is just an
>>>> `fadd` with a mask. However, this gives you the opportunity to
>>>> "re-enable" one optimization at a time, each time making sure that the
>>>> mask is handled correctly. In case of InstCombine, the vector
>>>> instruction patterns transfer to mask intrinsics: if all mask
>>>> intrinsics in the pattern have the same mask parameter you can apply
>>>> the transformation, and the resulting mask intrinsics will again take
>>>> the same mask parameter.
>>> Right.
>>>
>>>> Also, this need not be a hard transition from vector instructions to
>>>> masking intrinsics: you can add new types of masking intrinsics in
>>>> batches along with the required transformations. Masking intrinsics
>>>> and vector instructions can live side by side (as they do today,
>>>> anyway).
>>> Of course.
>>>
>>>
>>>>> I honestly don't know the answers to these questions.  But I think they
>>>>> are important to consider, especially if intrinsics are seen as a bridge
>>>>> to first-class IR support for masking.
>>>> I think it's sensible to use masking intrinsics (or EVL,
>>>> https://reviews.llvm.org/D53613) at the IR level and masked SD nodes in
>>>> the backend. However, I agree that intrinsics should just be a bridge
>>>> to native support mid-term.
>>> The biggest question I have is how such a transition would happen.
>>> Let's say we have a full set of masking intrinsics.  Now we want to take
>>> one IR-level operation, say fadd, and add mask support to it.  How do we
>>> do that?  Is it any easier because we have all of the intrinsics, or
>>> does all of the work on masking intrinsics get thrown away at some
>>> point?
>> The masking intrinsics are just a transitional thing. Eg, we could add
>> them now and let them mature. Once the intrinsics are stable and proven
>> start migrating for core IR support (eg as sketched above).
>>> I'm reminded of this now decade-old thread on gather/scatter and masking
>>> from Dan Gohman, which I also mentioned in an SVE thread earlier this
>>> year:
>>>
>>> https://lists.llvm.org/pipermail/llvm-dev/2008-August/016284.html
>>>
>>> The applymask idea got worked through a bit, and IIRC at some later point
>>> someone found issues with it that needed to be addressed, but it's an
>>> interesting idea to consider.  I wasn't too hot on it at the time, but it
>>> may be a way forward.
>>>
>>> In that thread, Tim Foley posted a summary of options for mask support,
>>> one of which was adding intrinsics:
>>>
>>> https://lists.llvm.org/pipermail/llvm-dev/2008-August/016371.html
>>>
>>>                                   -David
>> Thank you for the pointer! Is this documented somewhere? (Say, in a
>> wiki or some proposal doc.) Otherwise, we are bound to go through these
>> discussions again and again until a consensus is reached. By the way, unlike
>> then, we are also talking about an active vector length now (hence EVL).
>>
>> AFAIU, apply_mask was proposed to have fewer (redundant) predicate
>> arguments. Unless the apply_mask breaks a chain in a matcher pattern,
>> the approach is still prone to the issue of transformations breaking
>> code as well.
>>
>> Has something like the PredicatedVectorType approach above been proposed
>> before?
>>
>> - Simon
>>
>> --
>>
>> Simon Moll
>> Researcher / PhD Student
>>
>> Compiler Design Lab (Prof. Hack)
>> Saarland University, Computer Science
>> Building E1.3, Room 4.31
>>
>> Tel. +49 (0)681 302-57521 : [hidden email]
>> Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll
> Roman.
>

--

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : [hidden email]
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll


Re: [llvm-dev] [RFC] Matrix support (take 2)



On Dec 19, 2018, at 3:08 PM, Simon Moll <[hidden email]> wrote:

Hi,

On 12/19/18 11:21 PM, David Greene via llvm-dev wrote:
Adam Nemet via llvm-dev <[hidden email]> writes:

    I spent some time chatting with Adam about this and have a better
    understanding of his concerns here. It seems to me that if having
    masking intrinsics is the long-term solution we want, we should do
    that now (for add and sub) rather than building arbitrary matrix
    layout info into intrinsics, since a mask has all the information
    that we actually need.

I think that sounds like a reasonable compromise. We already have
masked load/store intrinsics so adding add and sub just follows that
precedent. If the decision is made to move masking to the core
operations, the new intrinsics would just move as well.
How will existing passes be taught about the new intrinsics?  For
example, what would have to be done to instcombine to teach it about
these intrinsics?  Let's suppose every existing operation had an
equivalent masked intrinsic.  Would it be easier to teach all of the
passes about them or would it be easier to teach the passes about a mask
operand on the existing Instructions?  Would it be easier to teach isel
about all the intrinsics or would it be easier to teach isel about a
mask operand?

Consider that overnight we introduce optional mask parameters to vector instructions. Then, since you cannot safely ignore the mask, every transformation and analysis that is somehow concerned with vector instructions is potentially broken and needs to be fixed.

If you go with masking intrinsics, and set the attributes right, it is clear that transformations won't break your code, and you will need to teach InstCombine, DAGCombiner, etc. that a `masked.fadd` is just an `fadd` with a mask. However, this gives you the opportunity to "re-enable" one optimization at a time, each time making sure that the mask is handled correctly. In case of InstCombine, the vector instruction patterns transfer to mask intrinsics: if all mask intrinsics in the pattern have the same mask parameter you can apply the transformation, and the resulting mask intrinsics will again take the same mask parameter.

Also, this need not be a hard transition from vector instructions to masking intrinsics: you can add new types of masking intrinsics in batches along with the required transformations. Masking intrinsics and vector instructions can live side by side (as they do today, anyway).

+1

Also this thread is getting off-topic. It would probably be best to continue the discussion about the masked-intrinsic transition strategy under https://reviews.llvm.org/D53613.

Adam



Re: [llvm-dev] [RFC] Matrix support (take 2)



On Dec 21, 2018, at 12:07 AM, Adam Nemet via llvm-dev <[hidden email]> wrote:



Also this thread is getting off-topic. It would probably be best to continue the discussion about the masked-intrinsic transition strategy under https://reviews.llvm.org/D53613.

I agree. The thread started to lose focus. It seems all that is needed to get this going in open source is to get into the pragmatic mindset, like Chris said, and “just” agree on a decision on the sticking question, as you had suggested earlier.

> My sense is that this info is important for your lowering, and your approach of using dataflow analysis to recover this will fail in some cases.
>
> Since layout and padding information is important, it seems most logical to put this into the type.  Doing so would make it available in all these places.
>
> That said, I still don’t really understand why you *need* it.
>
> This seems like the main sticking point, so let’s close on this first and see if my answers above are satisfying.

The rest will follow once the learning and iteration starts.




Re: [llvm-dev] [RFC] Matrix support (take 2)

On Tue Dec 18 20:45:12 PST 2018, Chris wrote:

> Since layout and padding information is important, it seems most
> logical to put this into the type.  Doing so would make it available
> in all these places.

> That said, I still don’t really understand why you *need* it.

For large vectors and matrices that simply will not fit into the register
file, LD/ST and MV etc., in the form of gather/scatter or vectorised MVX [1],
are the clear and obvious requirement.

However, the penalty for use of LD/ST is the power-consumption hit of
going through the L1/L2 cache barrier.

For a low-power, cost-competitive 3D GPU, for example, a 100% increase in
power consumption due to the penalty of being forced to move data back
and forth multiple times through the L1/L2 cache would be completely
unacceptable.

Hence the natural solution, for small vectors and matrices, is to be able
to process them *in-place*.

That in turn means having, at the *architectural* level, a way to re-order
the sequence of an otherwise straight linear 1D array of elements.  With
the right re-ordering capability, it even becomes possible to do arbitrary
in-place transposition of the order of elements, such that matrix multiply
may be done *in-place*, without MV operations.

This practice is extremely common in 3D GPUs, as there tend to be a lot
of 3x4 matrices.  ARM MALI actually added a special hard-coded set of
operations just to deal with 3x4 matrix data.

l.

[1] regfile[regfile[rs]] = regfile[rd]
