[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

Hello,

I am the main author of SLEEF (https://sleef.org).

For resolution and mapping of vectorized math functions, an easy way
would be to allow users to choose a mapping through declarations of
vector math function prototypes. It would be nice if we could make a new
function attribute for this. Then, the compiler does not need to care
about accuracy, domain, and other things. Users can choose faster
functions regardless of fast-math options.


The header file would look like:

#ifdef SLEEF_LOWER_ACCURACY
__attribute__((map_vectormathfunc("sin"))) __m128d Sleef_sind2_u35(__m128d);
__attribute__((map_vectormathfunc("sin"))) __m256d Sleef_sind4_u35(__m256d);
#else
__attribute__((map_vectormathfunc("sin"))) __m128d Sleef_sind2_u10(__m128d);
__attribute__((map_vectormathfunc("sin"))) __m256d Sleef_sind4_u10(__m256d);
#endif


If usage of the GNU vector ABI is preferred, it would be:

#pragma omp declare simd simdlen(2) notinbranch
__attribute__((map_vectormathfunc("sin"))) double Sleef_sin_u35(double);


Regards,

Naoki Shibata


On 7/4/2018 4:47 PM, Simon Moll via llvm-dev wrote:

> Instead there is a lazy interface (PlatformInfo::getResolver) that takes
> in the scalar function name, the argument shapes and whether there is a
> non-uniform predicate at the call site. We currently return just one
> possible mapping per query but you could also generate a list of
> possible mappings and let the vectorizer decide for itself, from this
> tailored list, which mapping to use.
>
> This approach will scale not just to math functions.
> Behind the curtains, a call to ::getResolver works through a chain of
> ResolverServices that can raise their hand if they could provide a
> vector implementation for the scalar function.
>
> The first in the chain will check whether this is a math function and if
> it should use a VECLIB call (RV does this for SLEEF, the vectorized
> functions are actually linked in immediately). Since we are not tied to
> a static VECLIB table, we actually allow users to provide an ULP error
> bound on the math functions. The SLEEF resolver will only consider
> functions that are within that bound
> (https://github.com/cdl-saarland/rv/blob/develop/include/rv/sleefLibrary.h).
>
>
> Further down the chain, you have a resolver that checks whether the
> scalar callee is defined in the module and if so, whether it can invoke
> whole-function vectorization recursively on the callee (again, given the
> precise argument shapes, we will get a precise return value shape). Atm,
> we only do this to vectorize and inline scalar SLEEF functions but it is
> trivial to do that on the same module.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
[+CC Naoki Shibata (SLEEF), Xinmin Tian (Intel), Renato Golin (Linaro) ]

Hi All,

Apologies for jumping in so late in this thread.

The scalar-to-vector mapping mechanism works pretty well with the OpenMP directive `#pragma omp declare simd`. We have implemented it in Arm Compiler for HPC [1]. We didn't have to hack the TargetLibraryInfo lists of vector functions, and the functionality is independent of the choice of the target library.

This is a commercial compiler, so the actual implementation of the functionality doesn't conform 100% to the LLVM way of doing things [2], but Arm is working with Intel to provide a fully open source implementation of this mechanism that will work for all targets that specify a vector function ABI based on `#pragma omp declare simd`. The Intel and Arm work is available at [3]; we would like to hear your opinion on it, so feel free to join the review.

The functionality provided by the Vector Clone pass [3] goes as follows:

1. The vectorizer is informed, via an attribute, of the availability of a vector function associated with the scalar call in the scalar loop.
2. The name of the vector function carries the information generated from the `declare simd` directive associated with the scalar declaration/definition.
3. The vectorizer chooses which vector function to use based on the information generated by the vector-variant attribute associated with the original scalar function.
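As a concrete sketch of steps 1 and 2 (`myfunc` is a hypothetical function; the mangled variant name follows the Intel x86 vector function ABI, and the exact ISA letter depends on the target):

```c
/* Scalar declaration annotated for vectorization. */
#pragma omp declare simd simdlen(4) notinbranch
float myfunc(float);

/* Per the x86 vector function ABI, the directive above advertises a
   vector variant with a mangled name of the form
       _ZGV<isa><mask><vlen><parms>_<name>
   e.g., for SSE, no mask, 4 lanes, one vector parameter:
       _ZGVbN4v_myfunc
   The frontend records this name in the IR attribute, which the
   vectorizer (step 3) then picks up. */
```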

This mechanism is modular (clang and opt can be tested separately, as the vectorization information is stored in the IR via an attribute); it is therefore superior to the functionality in Arm Compiler for HPC, but equivalent in the case of function definitions, which is the case we need to interface external vector libraries, whether math libraries or any other kind of vector library.

As one can see from [1], the list of available vector functions is not coded in the TLI, but just provided via a header file in <clang>/lib/Headers/math.h, which is easy to maintain.

As it is, this mechanism cannot be used as a replacement for the VECLIB functionality, because external libraries like SVML or SLEEF have their own naming conventions. To this end, the new directive `declare variant` of the upcoming OpenMP 5.0 standard is, in my opinion, the way forward. This directive allows re-mapping of the name associated with a `declare simd` declaration/definition to a new name chosen by the user/library vendor.

For example, in case of __svml_sin4 on x86, the declaration could be the following [4]:

```
#pragma omp declare simd simdlen(4) notinbranch
double sin(double);

#ifdef USE_SVML
    #pragma omp declare variant(double sin(double)) match(construct = {simd(notinbranch, simdlen(4))}, device = {isa("avx")})
    __m256d __svml_sin4(__m256d);
#endif
```

With this construct the user would be able to choose the list of vector functions available in the library simply by tweaking the command line to select the correct portion of the header file shipped with the compiler [6], without the need to maintain lists in the TLI source code, completely splitting the functionality between frontend and backend, with no dependencies.

Finally, for those interested, I just wanted to point out the BoF [7] I will be running at the LLVM dev meeting in San Jose, where I would like to discuss these topics with anyone interested. Hopefully the meeting will help move these functionalities forward in clang/LLVM.

Kind regards,

Francesco

[1] https://developer.arm.com/products/software-development-tools/hpc/documentation/vector-math-routines
[2] Our implementation of declare simd requires clang and opt to be coupled (they cannot be tested separately, as there is no attribute in the IR that describes the availability of the vector function).
[3] VectorClone pass and related patches.
1. https://reviews.llvm.org/D40577 - clang patches to add the SIMD mangled names as a “vector-variants” attribute (lib/CodeGen/CGOpenMPRuntime.cpp)
2. https://reviews.llvm.org/D40575 - loop vectorizer pass that interfaces with the vector clone pass
3. https://reviews.llvm.org/D22792 - vector clone pass
4. https://reviews.llvm.org/D52579 - Additional tests
[4] https://www.openmp.org/wp-content/uploads/openmp-TR7.pdf
[5] Disclaimer: I am not an expert in the x86 vector extensions; the code in the example might be broken, and is there as an illustrative example.
[6] This wouldn’t work with a Fortran frontend, as there is no equivalent of C header files in Fortran. In any case, this OpenMP_5.0-based solution is in my opinion better than the TLI-list-based one, as it allows the frontend and backend to be split completely. Yes, the Fortran frontend will have to list the equivalent of the C header file somewhere in its sources, but it wouldn’t be touching any code in the mid-end/back-end.
[7] https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7

> On Jul 9, 2018, at 12:36 PM, Saito, Hideki via llvm-dev <[hidden email]> wrote:
>
>
> All,
>
> It looks like we are finally converging into
>
> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still end up using one call instead of two.
>
> I was hoping to collectively come up with a better solution, but not at all surprised to see us settling down to this known-to-work practical approach.
>
> We need a more elaborate VECLIB setting, taking per-target function availability into account. Also, much of the "legalized call" mechanism should work for OpenMP declare simd --- and we should make that easier for reuse in case other FE/optimizers want to emit legalized calls.
>
> Simon, is the RV "legalized call emission" code easily reusable outside of RV? If yes, would you be able to restructure it so that it can reside, say, under Transforms/Utils?
>
> I think this RFC is ready to close at the end of this week. Thank you very much for all the lively discussions. If anybody has more input, please speak up soon.
>
> Thanks,
> Hideki
>
> ===========================================
> From: Hal Finkel [mailto:[hidden email]]
> Sent: Wednesday, July 04, 2018 9:59 AM
> To: Robert Lougher <[hidden email]>; Nema, Ashutosh <[hidden email]>
> Cc: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]; [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/04/2018 07:50 AM, Robert Lougher wrote:
> Hi,
>
> On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <[hidden email]> wrote:
> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <[hidden email]>; Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: [hidden email]; Masten, Matt <[hidden email]>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
>> __svml_sin8 (plus whatever shuffles are necessary).
>> The vectorizer should do this.
>> It should not generate calls to functions that don't exist.
>
> I'm not sure how vectorizer will do this, consider the case where "-vectorizer-maximize-bandwidth" option is enabled and vectorizer is forced to generate the wider VF, and hence it may generate a call to __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls i.e. __svml_sin_8 to two __svml_sin_4 calls ?
>
> Regards,
> Ashutosh
>
> If an accurate cost model was in place (which there isn't), then an "unsupported" vectorization factor should only be selected if it was forced.  However, in this case __svml_sin_8 is the same cost as __svml_sin_4, so the loop vectorizer will select a VF of 8, and generate a call to a function which effectively doesn't exist.
>
> Would it actually be the same, or would there be extra shuffle costs associated with the calls to __svml_sin_4?
>
>
>
> The simplest way to fix it, is to simply only populate the SVML vector library table with __svml_sin_8 when the target is AVX-512.
>
> I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.
>
>  -Hal
>
>
>   Alternatively, TLI.isFunctionVectorizable() should check that the entry is available on the target (this is more difficult as the type is not encoded).
> I'm guessing that the cost model would then make VF=4 cheaper, so generating calls to __svml_sin_4 (I'm not in work so can't check).   If the vectorization factor was forced to 8, we'll either get a call to the intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer will scalarize the call.  The vectorizer would not generate two calls to __svml_sin_4 although this would be cheaper.
>
> While this problem probably doesn't require the loop vectorizer to have knowledge of the target ABI, others may do.  I'm thinking specifically of D48193:
>
> https://reviews.llvm.org/D48193
> In this case we have poor code generation due to the interleave count selected by the loop vectorizer.  I can't see how this can be fixed later, so we will need to expose details of the ABI to the loop vectorizer (see my latest comment D48193#1149705).
> Thanks,
> Rob.
>
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Hi Francesco,

Thanks for copying me, I missed this thread.

On Tue, 9 Oct 2018 at 19:45, Francesco Petrogalli
<[hidden email]> wrote:
> The functionality provided by the Vector Clone pass [3] goes as follows:
>
> 1. The vectorizer is informed via an attribute of the availability of a vector function associated to the scalar call in the scalar loop

I assume this is OMP's pragma SIMD's job, for now. We may want to work
that out automatically if we see vector functions being defined in
some header, for example.

> 2. The name of the vector function carries the info generated from the `declare simd` directive associated to the scalar declaration/definition.

Headers for C, some text file? for Fortran, OMP 5 for the rest, right?

> 3. The vectorizer chooses which vector function to use based on the information generated by the vector-variant attribute associated with the original scalar function.

And, I assume, make sure that the types are compatible (args and
return) according to the language's own promotions and conversions.

> This mechanism is modular (clang and opt can be tested separately as the vectorization information are stored in the IR via an attribute), therefore it is superior  to the functionality in Arm compiler for HPC, but it is equivalent in the case of function definition, which is the case we need to interface external vector libraries, whether math libraries or any other kind of vector library.

I assume Clang would just emit the defines / metadata for the vector
functions so that LLVM can reason with them. If that's the case, then
any Fortran front-end would have to do the same with whatever is the
mechanism there.

> As it is, this mechanism cannot be used as a replacement for the VECLIB functionality, because external libraries like SVML, or SLEEF, have their own naming conventions. To this extend the new directive `declare variant` of the upcoming OpenMP 5.0 standard is, in my opinion, the way forward.  This directive allows to re-map the name associated to a `declare simd` declaration/definition to  a new name chosen by the user/library vendor.

I was going to ask about SLEEF. :)

Is it possible that we write shims between VECLIB and SLEEF? So that
we can already use them before OMP 5 is settled?

> With this construct it would be able to choose the list of vector functions available in the library by simply tweaking the command line to select the correct portion of the header file shipped with the compiler [6], without the need to maintain lists in the TLI source code, and completely splitting the functionality between frontend and backend, with no dependencies.

What about the costs? An initial estimate would be "whatever the cost
of the scalar / VF", but that is perhaps too naive, and we could get
it wrong either way by more than just a bit.

Eventually, I foresee people trying to add heuristics somewhere, and
that could pollute the IR. It would be good to at least have an idea
where that would live, so that we can make the best informed decision
now.

cheers,
--renato
Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

> On Oct 9, 2018, at 3:43 PM, Renato Golin <[hidden email]> wrote:
>
> Hi Francesco,
>
> Thanks for copying me, I missed this thread.
>
> On Tue, 9 Oct 2018 at 19:45, Francesco Petrogalli
> <[hidden email]> wrote:
>> The functionality provided by the Vector Clone pass [3] goes as follows:
>>
>> 1. The vectorizer is informed via an attribute of the availability of a vector function associated to the scalar call in the scalar loop
>
> I assume this is OMP's pragma SIMD's job, for now. We may want to work
> that out automatically if we see vector functions being defined in
> some header, for example.
>

No, I meant an IR attribute. There is an RFC submitted by Intel that describes such an attribute: http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html


>> 2. The name of the vector function carries the info generated from the `declare simd` directive associated to the scalar declaration/definition.
>
> Headers for C, some text file? for Fortran, OMP 5 for the rest, right?
>

Headers for C.
Another mechanism (text files?) for Fortran.

Both could use OMP 5 when we decide to go along that route.

What do you exactly mean with “OpenMP 5 for the rest”?


>> 3. The vectorizer chooses which vector function to use based on the information generated by the vector-variant attribute associated with the original scalar function.
>
> And, I assume, make sure that the types are compatible (args and
> return) according to the language's own promotions and conversions.
>

Yes.

>> This mechanism is modular (clang and opt can be tested separately as the vectorization information are stored in the IR via an attribute), therefore it is superior  to the functionality in Arm compiler for HPC, but it is equivalent in the case of function definition, which is the case we need to interface external vector libraries, whether math libraries or any other kind of vector library.
>
> I assume Clang would just emit the defines / metadata for the vector
> functions so that LLVM can reason with them. If that's the case, then
> any Fortran front-end would have to do the same with whatever is the
> mechanism there.
>

Yes, the information is stored as metadata in the IR.

>> As it is, this mechanism cannot be used as a replacement for the VECLIB functionality, because external libraries like SVML, or SLEEF, have their own naming conventions. To this extend the new directive `declare variant` of the upcoming OpenMP 5.0 standard is, in my opinion, the way forward.  This directive allows to re-map the name associated to a `declare simd` declaration/definition to  a new name chosen by the user/library vendor.
>
> I was going to ask about SLEEF. :)
>
> Is it possible that we write shims between VECLIB and SLEEF? So that
> we can already use them before OMP 5 is settled?
>

If I got your question correctly, you are asking whether we can start using SLEEF before we set up this mechanism with OpenMP 5.0. If that is the question, the answer is yes. We could use SLEEF by adding a VECLIB option like it is done now for SVML in the TLI, or we could use SLEEF and support Intel and Arm by using the libmvec-compatible version of the library, libsleefgnuabi.so - this is my favorite solution, as it is based on the vector function ABI standards of Intel and Arm.
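For illustration, the libmvec-compatible entry points that libsleefgnuabi.so exports follow the x86 vector function ABI mangling, so the compiler can refer to them by name alone. A sketch of what such declarations look like (names follow the documented `_ZGV<isa><mask><vlen><parms>_<scalar>` scheme; exact availability depends on the SLEEF build):

```c
#include <immintrin.h>

__m128d _ZGVbN2v_sin(__m128d);   /* SSE:  2 x double, unmasked */
__m256d _ZGVdN4v_sin(__m256d);   /* AVX2: 4 x double, unmasked */
```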

>> With this construct it would be able to choose the list of vector functions available in the library by simply tweaking the command line to select the correct portion of the header file shipped with the compiler [6], without the need to maintain lists in the TLI source code, and completely splitting the functionality between frontend and backend, with no dependencies.
>
> What about the costs? An initial estimate would be "whatever the cost
> of the scalar / VF", but that is perhaps too naive, and we could get
> it wrong either way by more than just a bit.
>

There is no way to get the cost of the vector function, other than the wrong assumption that cost(vector version) = cost(scalar version), which is not the case. By the way, why do you think that the cost of the vector version is scalar cost / VF?

We could argue that vectorizing a math function is always beneficial?

> Eventually, I foresee people trying to add heuristics somewhere, and
> that could pollute the IR. It would be good to at least have an idea
> where that would live, so that we can make the best informed decision
> now.
>

Why do you say “pollute the IR”? The heuristics would not be added to the IR, they would be added in the code of the cost model. I am not sure I understand what you mean here.

> cheers,
> --renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev

I'm all for discussing the overall vectorized function call mechanism issue. Looking forward to having a fruitful discussion next week. While we are discussing,
we can try listing the interesting cases (like longer-than-target full vectors) and see what we need to add on top of https://reviews.llvm.org/D40575,
which my colleague worked on.

In the meantime, another colleague of mine created https://reviews.llvm.org/D53035 for SVML legalization. This is probably as much as we can do w/o touching
the VECLIB/TLI and LV's dependency on using one common VF in many places. It would also be good ground for the discussion.
>Eventually, I foresee people trying to add heuristics somewhere, and that could pollute the IR. It would be good to at least have an idea where that would live, so that we can make the best informed decision now.

It would be really nice if we could let LLVM read a table provided by the library implementations. In addition to per-target availability for each VF, IMF attributes (see https://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html) would be good candidates for the table entries. Cost could be as well.
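A minimal sketch of what such a library-provided table and its query could look like (struct layout, field names, and entries are illustrative; the names and ULP tiers are loosely modeled on SLEEF's documented naming scheme):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per vector entry point the library ships. */
struct VecFuncEntry {
  const char *scalar_name;   /* e.g. "sin" */
  const char *vector_name;   /* library entry point */
  unsigned    vf;            /* vectorization factor */
  double      max_ulp;       /* worst-case error bound */
  const char *required_isa;  /* target feature the entry needs */
};

static const struct VecFuncEntry table[] = {
  { "sin", "Sleef_sind2_u10", 2, 1.0, "sse2" },
  { "sin", "Sleef_sind2_u35", 2, 3.5, "sse2" },
  { "sin", "Sleef_sind4_u10", 4, 1.0, "avx2" },
};

/* Return the first entry matching the scalar name, VF, and ULP budget,
   or NULL. A real implementation would also check required_isa against
   the target's features, which is the per-target availability point. */
const struct VecFuncEntry *
lookup(const char *scalar, unsigned vf, double ulp_budget) {
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp(table[i].scalar_name, scalar) == 0 &&
        table[i].vf == vf && table[i].max_ulp <= ulp_budget)
      return &table[i];
  return NULL;
}
```

With this shape, an entry for an unavailable VF simply never appears in the table for that target, which also addresses the __svml_sin_8-without-AVX-512 problem discussed earlier in the thread.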

Thanks,
Hideki

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
On Tue, 9 Oct 2018 at 22:45, Francesco Petrogalli
<[hidden email]> wrote:
> > I assume this is OMP's pragma SIMD's job, for now. We may want to work
> > that out automatically if we see vector functions being defined in
> > some header, for example.
>
> No, I meant IR attribute. There is an RFC submitted by Intel that describe such attribute: http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html

Oh, the mangled names, right! I remember that RFC.


> What do you exactly mean with “OpenMP 5 for the rest”?

For the libraries that don't follow the mangling pattern above.

> If I got your question correctly, you are asking whether we can start using SLEEF before we set up this mechanism with OpenMP 5.0. If that is the question, the answer is yes.

That was the question, yes. Thanks! :)

> We could use SLEEF by adding VECLIB option like it is done now for SVML in the TLI

I don't like any approach that needs specialised compiler support in
such a low level library.

> or we could use SLEEF and support Intel and Arm by using the libmvec compatible version of the library, libsleefgnuabi.so - this is my favorite solution as it is based on the vector function ABI standards of Intel and Arm.

Agreed.

> There is no way to get the cost of the vector function, other than the wrong assumption that cost(vector version) = cost(scalar version), which is not the case. By the way, why do you think that the cost of the vector version is scalar cost / VF?

Sorry, cost after taking VF into account. The cost itself would be
(naively and wrongly) the same as scalar.

> We could argue that vectorizing a math function is always beneficial?

I'm (possibly wrongly) worried about two things:

1. Cost of prologue/epilogue/shuffles

Different architectures have different ABIs, and some are more
efficient than others when calling vector functions.

Also, some vector extensions have features other do not, for example,
scatter/gather. Some libraries try to emulate that in vector code,
which is not always obviously beneficial.

If you're calling them explicitly in the code, by programmers that
know what they're doing, this is fine (as you expect them to have
benchmarked). But if this is the compiler deciding on its own and
taking that choice, we risk upsetting users.

One thing is to produce slow code because you didn't do enough,
another is because you did too much. People often understand the
former, not usually the latter. :)

2. Skipping direct codegen

This is a minor issue, but depending on how early in the pipeline this
transformation runs, it may hinder specialised codegen, particular to
the architecture, that could have been more efficient.

I don't have an example to hand, but imagine a machine has a scalar
sincos implementation that is faster than a 2-lane library call. The
compiler will assume, unwittingly, that the library call is better.

> Why do you say “pollute the IR”? The heuristics would not be added to the IR, they would be added in the code of the cost model. I am not sure I understand what you mean here.

Different libraries may have different "costs" for different
architectures. We can't possibly hold them in the cost model for all
known library implementations, past, present and future.

--
cheers,
--renato
Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
Reading back on the whole thread, I agree with Hideki that we're
converging into a common solution.

On Wed, 4 Jul 2018 at 17:59, Hal Finkel via llvm-dev
<[hidden email]> wrote:
> > The simplest way to fix it, is to simply only populate the SVML vector library table with __svml_sin_8 when the target is AVX-512.
>
> I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.

I agree, too. We already do that with RTLIB calls. A good example is
DIVREM, which has wildly different legal patterns on common arches.
AEABI doesn't have REM, so we emit DIVREM, discard the DIV and move
the REM to the output register.

Having a table of VECLIB calls for various implementations, however,
is a little harder than GNU vs AEABI. The latter has standard
documents, the former can change any time, especially if they're not
maintained by vendors or GNU.

Hideki's proposal to have a "config file" in the implementation (a
header file with some mappings/attributes would be enough, I think)
can work for all non-standardised implementations, as well as
proprietary ones.

The other new thing would be to look for functions with VF/2, VF/4,
VF/8 until one is found, and to add the cost of the additional function
calls to the basic block when the width found does not equal VF. That
should work as long as all types are valid, but may backfire if we add
too many shuffles with zero cost.

But that seems trivial compared to maintaining VECLIB.

cheers,
--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

> On Oct 10, 2018, at 11:52 AM, Renato Golin via llvm-dev <[hidden email]> wrote:
>
> Reading back on the whole thread, I agree with Hideki that we're
> converging into a common solution.
>
> On Wed, 4 Jul 2018 at 17:59, Hal Finkel via llvm-dev
> <[hidden email]> wrote:
>>> The simplest way to fix it, is to simply only populate the SVML vector library table with __svml_sin_8 when the target is AVX-512.
>>
>> I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.
>
> I agree, too. We already do that with RTLIB calls. A good example is
> DIVREM, which has wildly different legal patterns on common arches.
> AEABI doesn't have REM, so we emit DIVREM, discard the DIV and move
> the REM to the output register.
>
> Having a table of VECLIB calls for various implementations, however,
> is a little harder than GNU vs AEABI. The latter has standard
> documents, the former can change any time, especially if they're not
> maintained by vendors or GNU.
>
> Hideki's proposal to have a "config file" in the implementation (a
> header file with some mappings/attributes would be enough, I think)
> can work for all non-standardised implementations, as well as
> proprietary ones.
>

I am not sure I understand this. Are you saying that the signature of __svml_sin_8 might not conform to the signature that the Intel vector function ABI mandates for an 8-lane version of sin operating on doubles?
Or is it just a difference in names that raises concerns for SVML? If the latter, I believe the problem is easily solvable with OpenMP 5.0. If the former, it gets indeed more complicated.

> The other new thing would be to look for functions with VF/2... /4...
> /8 until one is found, and add the costs of addition function calls to
> the BB if not equals VF. That should work as long as all types are
> valid, but may back-fire if we add too many shuffles with zero cost.
>
> But that seems trivial compared to maintaining VECLIB.
>
> cheers,
> --renato


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev

>
>> or we could use SLEEF and support Intel and Arm by using the libmvec compatible version of the library, libsleefgnuabi.so - this is my favorite solution as it is based on the vector function ABI standards of Intel and Arm.
>
> Agreed.
>

With libsleefgnuabi.so it would indeed be easy to bring up the infrastructure needed to cover Arm, Intel and Power on a variety of OSs (Windows, Linux, OS X). That could be used as a starting point for tuning the functionality.

>
> 2. Skipping direct codegen
>
> This is a minor issue, but depending on how early this transformation
> passes, it may hinder specialised codegen, particular to the
> architecture, that could have been more efficient.
>
> I don't have any example to hand, but imagine a machine has a sincos
> implementation in scalar that is faster than 2-lane library call. The
> compiler will assume, unwittingly, that the library call is better.
>

There is no way for the compiler to solve this problem without accessing the code of the function, scalar or vector.

The only two possible solutions I see are:

1. Provide runtime libraries as bitcode libraries - then the bitcode of the callee would be visible to the compiler. (SLEEF has already been compiled successfully into a bitcode library and used in a couple of compiler projects.)
2. LTO. If the vectorizer generates vector calls that follow the Vector Function ABI naming conventions, a smart linker would be able to associate the scalar function with the vector one, and decide whether to replace such a vector call with a loop that uses the faster scalar function.

Francesco



Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
On Wed, 10 Oct 2018 at 21:30, Francesco Petrogalli
<[hidden email]> wrote:
> There is no way for the compiler to solve this problem without accessing the code of the function, scalar or vector.

One small thing would be to never touch arch-specific intrinsics (like
@aeabi_divmod), but that also backfires frequently (Android likes to
play with library calls).

> 1. Provide runtime libraries as bitcode libraries - then, the bitcode of the call would be visible to the compiler. (SLEEF has already been compiled successfully into a bitcode library and used in a couple of compiler projects).
> 2. LTO. If the vectorizer generates vector calls that follow Vector Function ABI naming and conventions, a smart linker would be able to associate a scalar function to the vector one, and decide whether to replace such vector call with a loop that uses the faster scalar function.

Bitcode is safer, but requires work from third parties. LTO
(especially inlining) is less of an exact science but, in theory,
requires no changes elsewhere.

As usual, pursuing short- and long-term paths at the same time gives
us the most benefits in the long run, but also costs more. :)

--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
On Wed, 10 Oct 2018 at 21:18, Francesco Petrogalli
<[hidden email]> wrote:
> I am not sure I understand this. Are you saying that the signature of __svml_sin_8 might not conform to the signature that the intel vector function ABI mandates for a 8 lanes version of sin operating on double?
> Or is it just a difference in names that raises concerns for SVML? If the latter, I believe that the problem is easily solvable with OpenMP5.0. If the former, it gets indeed more complicated.

Neither. :)

My point is that GNU, Arm, Intel, IBM (usually) publish documents
outlining their ABIs and they tend to stick to them. Not all library
vendors do, or need to.

Tracking changes in stable ABIs is a long process and everyone pays
the cost, but tracking (perhaps more frequent and unpredictable)
changes in other libraries could not only mean that trunk (or maybe
even a release) has broken support for some library we officially
support, but could also increase the cost of maintaining (and
identifying) all past variants and applying the correct ones.

I don't personally know how stable SVML, SLEEF and the others are, or
how much of a standardisation process they have, so the comment may be
moot. But it's a point to consider for all libraries once we introduce
a more "generic" way of replacing functions that still needs compiler
support (i.e. not OMP5).

--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
On 10/11/2018 8:11 PM, Renato Golin via llvm-dev wrote:

> On Wed, 10 Oct 2018 at 21:18, Francesco Petrogalli
> <[hidden email]> wrote:
>> I am not sure I understand this. Are you saying that the signature of __svml_sin_8 might not conform to the signature that the intel vector function ABI mandates for a 8 lanes version of sin operating on double?
>> Or is it just a difference in names that raises concerns for SVML? If the latter, I believe that the problem is easily solvable with OpenMP5.0. If the former, it gets indeed more complicated.
>
> Neither. :)
>
> My point is that GNU, Arm, Intel, IBM (usually) publish documents
> outlining their ABIs and they tend to stick to them. Not all library
> vendors do, or need to.
>
> Tracking changes in the stable ABIs is a long process and everyone
> pays the cost, but tracking (perhaps more frequent and unpredictable)
> changes in some other libraries could not only mean trunk (or maybe
> even a release) will have broken support for some library (that we
> officially support), but also it could increase the cost of
> maintaining (and identifying) all past variants and applying the
> correct ones.
>
> I don't personally know how stable SVML, SLEEF and others are, and how
> much of a standardisation process they have, so the comment may be
> moot. But it's a point to consider for all libraries once we introduce
> a more "generic" way of replacing functions that still needs compiler
> support (ie. not OMP5).


What kind of standardization process are you talking about? As a
developer of SLEEF, I am trying to find out what is actually needed by
compiler developers. I am also trying to come up with a new feature
about which I can write a paper.

It would be easier if the LLVM community listed its requirements; the
SLEEF project would then try to make the library comply with them.
This is really a matter of who will undertake the cumbersome task of
setting the direction of the implementation.


In my opinion, the first thing we need to consider is who actually
needs a vector math library.

1. As for users of a compiler who don't care much about how
optimization is done: if the main thing these people care about is
that their existing software works correctly, and they do not tolerate
performance degradation from overdone optimization, then I think a
vector math library is not needed for them, at least for some time.

Considering people in this category, I think the compiler should not
vectorize the code by default. It would also be better to document
that there is some risk of performance degradation if too much
vectorization is introduced. It is sometimes not easy to understand
the difference between out-of-order execution and SIMD.

2. As for users of a compiler who want ultimate optimization and are
trying to make the compiler generate the assembly code they want:
what they need is some way to fine-tune how the code will be
vectorized.

What these people need is function attributes with which the way code
is vectorized can be specified in detail.

3. As for developers of a compiler who care mainly about benchmark
scores: the compiler has the functionality to decide by itself how the
code is vectorized. I think LTO should be used when benchmark programs
are compiled.


The second thing we need to consider is the compiler's compliance with
the standard. The troublesome thing is that libraries may not be fully
compliant with the C standard. We need to think about accuracy, input
domain, whether functions produce consistent results, etc. The number
of items can grow, and developers of different libraries may be facing
different demands.

It would be better if some condition were specified under which a
vector math library can be treated with a default-ish setting. Is it
good enough if a function has 1-ULP accuracy and the full FP input
domain? With this default-ish setting, interchangeability between
vector math libraries should also be guaranteed, to some degree.

For non-default math functions that are faster but less accurate, or
that have a narrower input range, I think the developer of the code
that calls such functions should be fully aware of which particular
library/functions they are using. IMO, these functions should not be
introduced automatically. fast-math options are too simple and too
rough, and introducing many compiler options to make the compiler
decide which functions can be used is too troublesome. I think the
easiest way is to let users choose, via macros and headers, which
functions are substituted during vectorization, and such functionality
can probably be implemented with OMP5.
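In the spirit of the header sketched earlier in this thread, such a macro-and-header opt-in might look like the fragment below. This is hypothetical: the map_vectormathfunc attribute is the one proposed in this thread and is not implemented by any compiler, and SLEEF_LOWER_ACCURACY is the user-controlled switch.

```c
/* Hypothetical header: the user opts in to the faster, less accurate
 * (u35) variants explicitly; nothing is substituted behind their back.
 * The map_vectormathfunc attribute is the one proposed earlier in
 * this thread and does not exist in any compiler today. */
#ifdef SLEEF_LOWER_ACCURACY
#pragma omp declare simd simdlen(2) notinbranch
__attribute__((map_vectormathfunc("sin"))) double Sleef_sin_u35(double);
#else
#pragma omp declare simd simdlen(2) notinbranch
__attribute__((map_vectormathfunc("sin"))) double Sleef_sin_u10(double);
#endif
```

The point of the design is that accuracy/domain trade-offs never depend on fast-math flags: whichever declaration is visible is the one the vectorizer is allowed to use.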


Another thing I want to know is how much compliance with the Vector
Function ABI is needed. I know Arm is keen on supporting this ABI, but
how about Intel? Is there any possibility that SVML will comply with
the Vector Function ABI in the near future?


Naoki Shibata


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Hi Naoki,

I'll try to keep it short, as this is not the most important part of
this thread. If that's too short, I'll be glad to chat in private.

On Sat, 13 Oct 2018 at 09:20, Naoki Shibata <[hidden email]> wrote:
> What kind of a standardization process are you talking about? As a
> developer of SLEEF, I am rather trying to know what is actually needed
> by the developers of compilers. I am also trying to come up with a new
> feature with which I can write a paper.

I meant ABI standards. An official document, written by the authors of
the library (company, community, developer), specifying things like
function names and what they do, argument types and how they expand,
how errors are handled, special registers used (if any), macros that
control behaviour, macros that are defined by the library, etc.

This is the important part for compiler writers, not necessarily for
users. End users of the compiler do not care at all what the ABIs are,
they want their code compiled, correct results and fast execution.

Users of your library won't care much either: if you change the
names, mandatory arguments or even internal behaviour, they'll adapt
to the new model. Most users only use one version of each library
anyway.

But when embedding the behaviour of your library (alongside all other
similar libraries) in the compiler, and you change the behaviour in
the new version, the compiler now has to be compatible with two
versions.

Furthermore, if you don't follow the same behaviour (as you would if
there were an official ABI document), then we'd only notice you
changed it when our users start complaining that *we* are breaking
their code.

A good example of an official ABI document is what ARM publishes for
their architecture:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.swdev.abi/index.html

But there are a lot of documents in there, and that's not what I'm
asking. Something like the NEON intrinsics list would be a good start:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073b/IHI0073B_arm_neon_intrinsics_ref.pdf

But it would be better with a short explanation of what the function
does, what the arguments are, and what result is returned.


> The second thing we need to consider is compiler's compliance to the
> standard. The troublesome thing is that libraries may not be fully
> compliant to the C standard. We need to think of accuracy, input domain,
> whether it produces consistent results, etc. The number of items can
> increase, and developers of different libraries may be seeing different
> demands.

This is largely irrelevant to the topic of this thread. How you
compile your library is up to you; this thread is about the
expectation of what the entry points of the library are (functions,
arguments) and what values and types are returned, so that we can
replace scalar functions (already checked by the front end) with
vector alternatives (not checked by anyone).


> Another thing I want to know is how much compliance to the Vector
> Function ABI is needed. I know Arm is keen in supporting this ABI, but
> how about Intel? Is there possibility that SVML will comply to the
> Vector Function ABI in the near future?

That's a good question, and it is mostly up to all of us to make sure
that works in the future. If we all have clear expectations (and an
official document goes a long way in providing that), then we'll all
have a much easier job.

Hope this helps.

--
cheers,
--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

Hello Renato,

On 10/14/2018 12:05 AM, Renato Golin wrote:

> Hi Naoki,
>
> I'll try to keep it short, as this is not the most important part of
> this thread. If that's too short, I'll be glad to chat in private.
>
> On Sat, 13 Oct 2018 at 09:20, Naoki Shibata <[hidden email]> wrote:
>> What kind of a standardization process are you talking about? As a
>> developer of SLEEF, I am rather trying to know what is actually needed
>> by the developers of compilers. I am also trying to come up with a new
>> feature with which I can write a paper.
>
> I meant ABI standards. An official document, written by the authors of
> the library (company, community, developer), specifying things like
> function names and what they do, argument types and how they expand,
> how errors are handled, special registers used (if any), macros that
> control behaviour, macros that are defined by the library, etc.
>
> This is the important part for compiler writers, not necessarily for
> users. End users of the compiler do not care at all what the ABIs are,
> they want their code compiled, correct results and fast execution.
>
> Users of your library won't care much either, as if you change the
> names or mandatory arguments or even internal behaviour, they'll adapt
> to the new model. Most users only use one version of each library
> anyway.
>
> But when embedding the behaviour of your library (alongside all other
> similar libraries) in the compiler, and you change the behaviour in
> the new version, the compiler now has to be compatible with two
> versions.
>
> Furthermore, if you don't follow the same behaviour (as you would have
> if there's an official ABI document), then we'd only notice you
> changed when our users start complaining *we* are breaking their code.
>
> A good example of an official ABI document is what ARM publishes for
> their architecture:
>
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.swdev.abi/index.html
>
> But there are a lot of documents in there, and that's not what I'm
> asking. Something like the NEON intrinsics list would be a good start:
>
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073b/IHI0073B_arm_neon_intrinsics_ref.pdf
>
> But it would be better with a short explanation of what the function
> does, what the arguments are and what is the results returned.

Your stance seems to be that it is the compiler's responsibility to
adapt to changes in math libraries, but is that fair to say? I would
say there should be a kind of standard regarding how a vector math
library is implemented. Then the compiler can simply assume that the
library is implemented in conformance with that standard. It would be
even better if there were a conformance-testing tool.

In order to make the compiler compliant with the C standard, the
standard library also needs to be compliant with the C standard.
Documenting what is assumed by the compiler should not be too hard,
since there are already documents for the C standard library and for
the Vector ABI.


>> The second thing we need to consider is compiler's compliance to the
>> standard. The troublesome thing is that libraries may not be fully
>> compliant to the C standard. We need to think of accuracy, input domain,
>> whether it produces consistent results, etc. The number of items can
>> increase, and developers of different libraries may be seeing different
>> demands.
>
> This is largely irrelevant to the topic of this thread. How you
> compile your library is up to you, this thread is about the
> expectation of what are the entry points of the library (functions,
> arguments) and returned values and types, so that we can replace
> scalar functions (already checked by the front-end) with vector
> alternatives (not checked by anyone).

I brought this up because you said:

> One thing is to produce slow code because you didn't do enough,
> another is because you did too much. People often understand the
> former, not usually the latter.

The problem is that there are things that are not visible to the
current compiler: accuracy, input domain and performance. If the
compiled program does not give as much accuracy as it was designed
to, that would also upset the users.

If you want the compiler to choose the fastest functions in the math
libraries, we need at least a way to express how much performance each
function in a library provides. This is not a trivial thing, and the
compiler also needs additional code for processing these figures. The
merit of LTO is that we can avoid the problem of expressing
performance: since the compiler can see through everything in the
library, the existing optimization passes can be used without changes.

For accuracy, the only thing we can do is make assumptions. The
easiest assumption is that the functions in a vector math library
conform to the ANSI C standard and the Vector ABI.


>> Another thing I want to know is how much compliance to the Vector
>> Function ABI is needed. I know Arm is keen in supporting this ABI, but
>> how about Intel? Is there possibility that SVML will comply to the
>> Vector Function ABI in the near future?
>
> That's a good questions and is mostly up to all of us to make sure
> that works in the future. If we all have clear expectations (and an
> official document goes a long way in providing that), then we'll all
> have a much easier job.
>
> Hope this helps.


Naoki Shibata
