[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?


Tom Stellard via llvm-dev

 

Illustrative Example:

clang -fveclib=SVML -O3 svml.c -mavx

#include <math.h>

void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i=0;i<N;i++){
    a[i] = sin(i);
  }
}

 

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is the 8-element SVML sin() called with an 8-element argument. On the surface, this looks very good.

Later on, standard vector type legalization kicks in, but only the argument and return data are legalized; the callee is still __svml_sin8:

        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin8
        vmovups %ymm1, 32(%r15,%r12,8)
        vmovups %ymm0, (%r15,%r12,8)

Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes its argument in zmm0 and returns its result in zmm0, i.e., it is not legal to use for AVX.

What we need to see instead is two calls to __svml_sin4(), like below.

        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin4
        vmovups %ymm0, 32(%r15,%r12,8)
        vmovups %ymm1, %ymm0
        callq   __svml_sin4
        vmovups %ymm0, (%r15,%r12,8)
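
To make the desired splitting concrete, here is a minimal sketch in plain C++ (not LLVM code) of what "one VF=8 call becomes two VF=4 calls" means at the data level. The names sin4_model and legalized_sin are hypothetical; sin4_model merely stands in for a 4-lane library routine such as __svml_sin4 and is implemented with scalar std::sin for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed legal width for doubles on the target (4 lanes for 256-bit AVX).
constexpr std::size_t LegalVF = 4;

// Hypothetical stand-in for a 4-lane library entry such as __svml_sin4.
// It processes exactly LegalVF lanes per call.
void sin4_model(const double *in, double *out) {
  for (std::size_t i = 0; i < LegalVF; ++i)
    out[i] = std::sin(in[i]);
}

// Legalize a wide call by issuing one 4-lane call per legal-width chunk.
// For an 8-lane input this results in exactly two calls.
std::vector<double> legalized_sin(const std::vector<double> &wide) {
  assert(wide.size() % LegalVF == 0 && "VF must be a multiple of the legal width");
  std::vector<double> result(wide.size());
  for (std::size_t lane = 0; lane < wide.size(); lane += LegalVF)
    sin4_model(&wide[lane], &result[lane]);
  return result;
}
```

The point of the sketch is that both the data and the callee have to change together: splitting only the operands, as type legalization does today, leaves a call to a routine whose calling convention no longer matches.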

 

What would be the most acceptable way to make this happen? Has anybody had a similar need previously?

 

The easiest workaround is to serialize the call above the “type legal” vectorization factor. This can be done with a few lines of code,
plus the code to recognize that the call is “SVML” (currently a string match against the “__svml” prefix in my local workspace).
If a higher VF is not forced, the cost model will likely favor a lower VF. This is functionally correct, but obviously not an ideal solution.
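
A rough sketch of the check this workaround needs, written as standalone C++ rather than against LLVM's APIs. The function names and the max_vector_bits parameter are assumptions made for illustration; in a real pass the register width would come from target information.

```cpp
#include <string>

// Recognize an SVML entry point by its name prefix, mirroring the
// string match described above.
bool is_svml_call(const std::string &callee) {
  static const std::string prefix = "__svml";
  return callee.compare(0, prefix.size(), prefix) == 0;
}

// Decide whether a vector library call must be serialized: it must be
// when the call is an SVML call and its vector type (vf lanes of
// elem_bits each) is wider than the widest legal vector register.
bool must_serialize(const std::string &callee, unsigned vf,
                    unsigned elem_bits, unsigned max_vector_bits) {
  return is_svml_call(callee) && vf * elem_bits > max_vector_bits;
}
```

With 256-bit AVX registers, must_serialize("__svml_sin8", 8, 64, 256) is true while must_serialize("__svml_sin4", 4, 64, 256) is false, which matches the example above.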

 

Here are a few ideas I thought about:

1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
and try to split it into two or more VECLIB nodes, but at that point we have lost the information about which function to call.
We can’t define an ISD opcode per function; there would be too many libm entries to deal with. We need a scalable solution.

2)      We could write an IR to IR pass to perform IR level legalization. This essentially duplicates the functionality of LegalizeVectorType(),
but we can make it available for other similar things that can’t use ISD level vector type legalization. This looks attractive enough
from that perspective.

3)      We have implemented something similar to 2), but the legalization code is specialized for SVML. This was much quicker than
trying to generalize the legalization scheme, but I’d imagine the community won’t like it.

4)      The vectorizer emits legalized VECLIB calls. Since it can already emit instructions in scalarized form, adding legalized call functionality is in some sense
similar to that. The vectorizer can’t simply choose a type-legal function name with an illegal vector, since LegalizeVectorType() will still
end up using one call instead of two.

 

Anything else?

 

Also, doing any of this requires a reverse mapping from the VECLIB name to the scalar function name. What’s the most recommended way to do so?

Can we use TableGen to create a reverse map?
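
For illustration, the reverse mapping could look roughly like the hand-written table below. This is a sketch, not the existing TargetLibraryInfo data structure: the struct, the function names, and the cos entries are hypothetical, and a generated table (from TableGen or otherwise) would replace the static array.

```cpp
#include <string>

// One row per (scalar function, vector variant, VF) triple.
struct VecMapping {
  const char *scalar_name;
  const char *vector_name;
  unsigned vf;
};

// Illustrative entries only; a real table would be generated.
static const VecMapping kMappings[] = {
    {"sin", "__svml_sin4", 4},
    {"sin", "__svml_sin8", 8},
    {"cos", "__svml_cos4", 4},
    {"cos", "__svml_cos8", 8},
};

// Reverse lookup: vector library name -> table row (or nullptr).
const VecMapping *lookup_by_vector_name(const std::string &name) {
  for (const VecMapping &m : kMappings)
    if (name == m.vector_name)
      return &m;
  return nullptr;
}

// Forward lookup: scalar name plus VF -> table row (or nullptr).
const VecMapping *lookup_variant(const std::string &scalar, unsigned vf) {
  for (const VecMapping &m : kMappings)
    if (scalar == m.scalar_name && m.vf == vf)
      return &m;
  return nullptr;
}

// Map a too-wide vector call to the variant of the same scalar function
// at the legal VF. Returns an empty string when no such variant exists.
std::string narrower_variant(const std::string &vector_name, unsigned legal_vf) {
  const VecMapping *wide = lookup_by_vector_name(vector_name);
  if (!wide)
    return "";
  const VecMapping *narrow = lookup_variant(wide->scalar_name, legal_vf);
  return narrow ? narrow->vector_name : "";
}
```

Legalization then needs only the composition of the two lookups: narrower_variant("__svml_sin8", 4) goes through the scalar name "sin" to reach "__svml_sin4".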

 

Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?

 

Thanks,

Hideki Saito

Intel Corporation

 

 


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

Hi Saito,

 

At AMD we have our own vector library and faced similar problems. We followed the SVML path and generated the respective vector calls from the vectorizer. Once the vectorizer has generated the respective calls, e.g. __svml_sin_4 or __amdlibm_sin_4, later passes can only identify the vector library call by string matching. I’m not sure that is the proper way; maybe instead of generating the library-specific calls it would be better to generate some standard call (maybe intrinsics) and lower it later. A late IR pass can be introduced to perform the lowering; it would lower the intrinsic calls to the specific library calls (__svml_sin_4 or __amdlibm_sin_4 or …). This can be table driven, deciding the action based on the vector library, function name, VF and target information. The action can be full-serialize, partial-serialize (VF8 to two VF4 calls), or generating the library call with the same VF.
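
The three actions named above can be sketched as a small decision function. This is standalone C++ for illustration only; the enum, the struct, and the choose_action signature are hypothetical, and the list of available VFs stands in for whatever the per-library table would provide.

```cpp
#include <vector>

// The three outcomes described above.
enum class Action { KeepCall, PartialSerialize, FullSerialize };

struct Decision {
  Action action;
  unsigned call_vf;   // lanes handled by each emitted call (1 means scalar)
  unsigned num_calls; // how many calls replace the original one
};

// Pick the widest library variant that is legal on the target and evenly
// divides the requested VF. available_vfs lists the VFs for which the
// selected vector library has an entry for this function.
Decision choose_action(unsigned vf, unsigned max_legal_vf,
                       const std::vector<unsigned> &available_vfs) {
  unsigned best = 0;
  for (unsigned w : available_vfs)
    if (w > best && w <= max_legal_vf && w <= vf && vf % w == 0)
      best = w;
  if (best == 0)
    return {Action::FullSerialize, 1, vf}; // fall back to scalar calls
  if (best == vf)
    return {Action::KeepCall, vf, 1};      // already legal as emitted
  return {Action::PartialSerialize, best, vf / best};
}
```

For the AVX example in this thread, choose_action(8, 4, {2, 4, 8}) selects partial serialization into two VF=4 calls.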

 

Thanks,

Ashutosh

 

From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

 


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

 

Ashutosh,

 

Thanks for the reply.

 

A related earlier discussion appears in the review of the SVML patch (@mmasten). Adding a few names from there.

https://reviews.llvm.org/D19544

There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now
in the trunk is “not legal enough”. I’ll work on a patch to stop the bleeding while we continue to discuss the legalization topic.

 

I suppose

1)      An LV-only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping the VF low enough to prevent the legalization problem is only a workaround,
not a solution.

2)      Assuming that we have to go the IR to IR pass route, there are 3 ways to think about it:

a.       Go with a very generic IR to IR legalization pass comparable to ISD level legalization. This is the most general,
but I’d think it has the highest development cost.

b.      Go with intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with a VECLIB mapping to be added as intrinsics.

c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).

 

I think the costs of 2.b) and 2.c) are similar, and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this
discussion to “letting LV emit widened math calls instead of VECLIB”, even though I strongly favor that over LV emitting
VECLIB calls.

@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about
LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?

If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point
to get to 2.b)/2.c).

I’ll continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.

 

Thanks,

Hideki

 

From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?

 


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Hi Hideki -

I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.

I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux, for example, when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));




On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:

 


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Adding to Ashutosh's comments, we are also interested in making LLVM generate calls to the vector math library that is available with glibc (version > 2.22).

Using the example case given in the reference, we found there are 2 vector versions of "sin" (4 x double) with the same VF, namely the _ZGVcN4v_sin (AVX) and _ZGVdN4v_sin (AVX2) versions. Following the SVML path and adding a new entry to the VecDesc structure in TargetLibraryInfo.cpp, we can generate the vector version.

However, we are unable to decide in the vectorizer which version to expand to; we need the TTI information (ISA) for that. It looks better to legalize or generate them later.
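
The two names above differ only in the ISA letter of the x86 vector function ABI mangling (_ZGV, ISA letter, mask letter, vector length, one letter per parameter, underscore, scalar name). A minimal sketch of building such a name once the ISA is known is shown below; the Isa enum and the function names are hypothetical, and only unmasked variants with vector parameters are covered.

```cpp
#include <string>

// Target ISA levels distinguished by the x86 vector function ABI.
enum class Isa { SSE, AVX, AVX2, AVX512 };

// ISA letter used in the mangled name: b = SSE, c = AVX, d = AVX2,
// e = AVX-512.
char isa_letter(Isa isa) {
  switch (isa) {
  case Isa::SSE:    return 'b';
  case Isa::AVX:    return 'c';
  case Isa::AVX2:   return 'd';
  case Isa::AVX512: return 'e';
  }
  return '?'; // unreachable for valid enum values
}

// Build the name of an unmasked ('N') variant whose parameters are all
// plain vector parameters ('v'), e.g. _ZGVcN4v_sin for AVX sin at VF=4.
std::string mangled_name(Isa isa, unsigned vf, const std::string &scalar,
                         unsigned num_vector_args = 1) {
  std::string name = "_ZGV";
  name += isa_letter(isa);
  name += 'N';
  name += std::to_string(vf);
  name.append(num_vector_args, 'v');
  name += '_';
  name += scalar;
  return name;
}
```

Because the ISA letter is the only difference, deferring the name choice to a point where target information is available reduces the decision to a single parameter.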
 
regards,
Venkat.


On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
Hi Hideki -

I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.

I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));




On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:

 

Ashutosh,

 

Thanks for the repy.

 

Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.

https://reviews.llvm.org/D19544

There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now

in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.

 

I suppose

1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
not a solution.

2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:

a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
but I’d think this is the highest cost for development.

b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with VECLIB mapping to be added to intrinsic.

c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).

 

I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this

discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting

VECLIB calls.

 

@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about

LibCallSimiplifer to tell whether it can be extended to deal with 2.b) or 2.c)?

 

If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point

to get to 2.b)/2.c).

 

Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.

 

Thanks,

Hideki

 

From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?

 

Hi Saito,

 

At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.

 

Thanks,

Ashutosh

 

From: llvm-dev [[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

 

Illustrative Example:

 

clang -fveclib=SVML -O3 svml.c -mavx

 

#include <math.h>

void foo(double *a, int N){

  int i;

#pragma clang loop vectorize_width(8)

  for (i=0;i<N;i++){

    a[i] = sin(i);

  }

}

 

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.

This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.

Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin8

        vmovups %ymm1, 32(%r15,%r12,8)

        vmovups %ymm0, (%r15,%r12,8)

Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.

i.e., not legal to use for AVX.

 

What we need to see instead is two calls to __svml_sin4(), like below.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin4

        vmovups %ymm0, 32(%r15,%r12,8)

        vmovups %ymm1, ymm0

        callq   __svml_sin4

        vmovups %ymm0, (%r15,%r12,8)

 

What would be the most acceptable way to make this happen? Anybody having had a similar need previously?

 

Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,

plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).

If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.

 

Here are a few ideas I thought about:

1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB node and try to split it into two or more VECLIB nodes, but at that point we have lost the information about which function to call. We can’t define an ISD opcode per function: there would be too many libm entries to deal with. We need a scalable solution.

2)      We could write an IR-to-IR pass to perform IR-level legalization. This essentially duplicates the functionality of LegalizeVectorType(), but we could make it available for other similar things that can’t use ISD-level vector type legalization. From that perspective, this looks attractive enough.

3)      We have implemented something similar to 2), but the legalization code is specialized for SVML. This was much quicker than trying to generalize the legalization scheme, but I’d imagine the community won’t like it.

4)      The vectorizer emits legalized VECLIB calls. Since it can already emit instructions in scalarized form, adding legalized-call functionality is in some sense similar. The vectorizer can’t simply pair a type-legal function name with an illegal vector type, since LegalizeVectorType() would still end up using one call instead of two.

 

Anything else?

 

Also, doing any of this requires a reverse mapping from the VECLIB name to the scalar function name. What’s the recommended way to do so? Can we use TableGen to create a reverse map?
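
As a rough illustration of what such a reverse map could look like, here is a standalone C++ sketch. It assumes the forward mapping is available as a flat table of (scalar name, vector name, VF) rows; VecLibEntry and buildReverseMap are hypothetical names for this example, not existing LLVM types or functions, and nothing here is generated by TableGen.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One row of a hypothetical scalar-to-vector mapping table.
struct VecLibEntry {
  std::string ScalarName; // e.g. "sin"
  std::string VectorName; // e.g. "__svml_sin4"
  unsigned VF;            // number of lanes of the vector variant
};

// Build the reverse map: vector name -> (scalar name, VF).
std::map<std::string, std::pair<std::string, unsigned>>
buildReverseMap(const std::vector<VecLibEntry> &Table) {
  std::map<std::string, std::pair<std::string, unsigned>> Reverse;
  for (const VecLibEntry &E : Table) {
    assert(E.VF > 0 && "VF must be positive");
    Reverse[E.VectorName] = {E.ScalarName, E.VF};
  }
  return Reverse;
}
```

With such a map, a legalizer that sees a call to __svml_sin8 can recover "sin" and VF 8, then look up the forward table again for a narrower variant.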

 

Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?

 

Thanks,

Hideki Saito

Intel Corporation

 

 



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

 

Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib calls to SVML.

Our preference for “vectorized sin()” is just a widened sin() that is lowered to a specific library call at a later point (either IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.

Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from the legalization issue.

 

Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB::VECLIB_ enums: one for each math function, multiplied by each vectorization factor, for each VecLib. That’s way too many. If it’s one per math function, I’d guess it’s 100+: still a lot, but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping in the vectorizer: those math functions don’t have to be added to the intrinsics.

 

I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions, so we have no way to list them as intrinsics or to predefine RTLIB:: enums for them. For each target, the vector function ABI defines how the parameters need to be passed, and the legalizer should be implemented based on that ABI, without knowing the details of what the user function does. A math-lib-only solution doesn’t help with legalization of OpenMP declare simd.

 

Thanks,

Hideki

 

--------------------------------

(**)

#pragma omp declare simd uniform(a), linear(i)

void foo(float *a, int i);

 

 

#pragma omp simd

for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
    …
    foo(a, i)
    …

}

 

From: Venkataramanan Kumar [mailto:[hidden email]]
Sent: Sunday, July 01, 2018 11:38 PM
To: Sanjay Patel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

Adding to Ashutosh's comments: we are also interested in making LLVM generate calls to the vector math library that is available with glibc (version > 2.22).

 

 

Using the example case given in the reference, we found there are two vector versions of "sin" (4 x double) with the same VF, namely _ZGVcN4v_sin (AVX) and _ZGVdN4v_sin (AVX2). Following the SVML path and adding new entries to the VecDesc table in TargetLibraryInfo.cpp, we can generate the vector version.

 

But we were unable to decide which version to expand in the vectorizer, since that needs the TTI information (ISA). It looks better to legalize or generate them later.
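
For readers unfamiliar with these names, the following C++ sketch shows how the two variants differ only in the ISA letter. It assumes the x86 vector function ABI naming scheme (_ZGV, ISA letter, mask letter, lane count, parameter kinds, then the scalar name) and covers only the unmasked, single-vector-parameter case. The Isa enum and vectorVariantName are hypothetical helpers for this example, not LLVM or glibc APIs.

```cpp
#include <cassert>
#include <string>

// Target ISA levels distinguished by the x86 vector function ABI.
enum class Isa { SSE, AVX, AVX2, AVX512 };

// ISA letter used in the mangled name: b = SSE, c = AVX, d = AVX2, e = AVX-512.
char isaLetter(Isa I) {
  switch (I) {
  case Isa::SSE:    return 'b';
  case Isa::AVX:    return 'c';
  case Isa::AVX2:   return 'd';
  case Isa::AVX512: return 'e';
  }
  return '?'; // not reached for valid enumerators
}

// Name of the unmasked ('N') variant taking one vector ('v') parameter.
std::string vectorVariantName(const std::string &Scalar, Isa I, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be positive");
  std::string Name = "_ZGV";
  Name += isaLetter(I);
  Name += 'N';
  Name += std::to_string(VF);
  Name += "v_";
  Name += Scalar;
  return Name;
}
```

Because the ISA is an input here, the choice between the two VF-4 variants can only be made where target information is available, which is the reason given above for doing it later than the vectorizer.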

 

regards,

Venkat.

 

 

On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:

Hi Hideki -

 

I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:

Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.

 

I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():

 

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));

 

 

 

 

On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:

 

Ashutosh,

 

Thanks for the reply.

 

A related earlier discussion appears in the review of the SVML patch (@mmasten). Adding a few names from there.

https://reviews.llvm.org/D19544

There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now in the trunk is “not legal enough”. I’ll work on a patch to stop the bleeding while we continue to discuss the legalization topic.

 

I suppose

1)      An LV-only solution (letting LV emit already-legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls are generated elsewhere. Also, keeping the VF low enough to prevent the legalization problem is only a workaround, not a solution.

2)      Assuming that we have to go the IR-to-IR pass route, there are three ways to think about it:

a.       Go with a very generic IR-to-IR legalization pass comparable to ISD-level legalization. This is the most general, but I’d think it has the highest development cost.

b.      Go with intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions with a VECLIB mapping to be added as intrinsics.

c.       Go with generic-enough function call legalization, with the ability to add custom legalization for each VECLIB (and, if needed, for each VECLIB or non-VECLIB entry).

 

I think the costs of 2.b) and 2.c) are similar, and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this discussion to “letting LV emit widened math calls instead of VECLIB”, even though I strongly favor that over LV emitting VECLIB calls.

 

@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?

 

If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point

to get to 2.b)/2.c).

 

I’ll continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.

 

Thanks,

Hideki

 

From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?

 

Hi Saito,

 

At AMD we have our own vector library and faced similar problems. We followed the SVML path and generated the respective vector calls from the vectorizer. Once the vectorizer has generated those calls (i.e., __svml_sin_4 or __amdlibm_sin_4), later passes can identify a vector lib call only by string matching. I’m not sure that’s the proper way; maybe instead of generating the library-specific calls it’s better to generate some standard call (maybe an intrinsic) and lower it later. A late IR pass can be introduced to perform the lowering: it would lower the intrinsic calls to specific lib calls (__svml_sin_4 or __amdlibm_sin_4 or …). This can be table-driven, deciding the action based on the vector library, function name, VF, and target information. The action can be full serialization, partial serialization (VF8 to 2 x VF4), or generating the lib call with the same VF.
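
The three actions described above can be sketched as a small decision function. This is only an illustration in standalone C++, under the assumption that the table lookup has already been reduced to the set of VFs for which the chosen library has an entry callable on the current target. Action, Plan, and chooseAction are hypothetical names, not part of LLVM.

```cpp
#include <cassert>
#include <set>

// The three outcomes named in the text.
enum class Action { SameVF, PartialSerialize, FullSerialize };

struct Plan {
  Action Kind;
  unsigned CallVF;   // lanes handled by each emitted call (1 = scalar call)
  unsigned NumCalls; // number of calls needed to cover the requested VF
};

// AvailableVFs: vector widths for which a library entry exists and is legal
// to call on the target (assumed to come from a table lookup).
Plan chooseAction(unsigned VF, const std::set<unsigned> &AvailableVFs) {
  assert(VF > 0 && "vectorization factor must be positive");
  if (AvailableVFs.count(VF))
    return {Action::SameVF, VF, 1};
  // Prefer the widest available variant that evenly divides the requested VF.
  for (auto It = AvailableVFs.rbegin(); It != AvailableVFs.rend(); ++It)
    if (*It > 1 && *It < VF && VF % *It == 0)
      return {Action::PartialSerialize, *It, VF / *It};
  // No usable vector variant: fall back to one scalar call per lane.
  return {Action::FullSerialize, 1, VF};
}
```

For the example in this thread, a request for VF 8 on a target where only the 2-lane and 4-lane entries are usable yields two 4-lane calls.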

 

Thanks,

Ashutosh

 

From: llvm-dev [[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

 


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.

So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?

On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:

 


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

 

>It may not be a full solution for the problems you're trying to solve

 

If we are inventing a new solution, I’d like it to also solve the OpenMP declare simd legalization issue. If a small extension of the existing scheme works for mathlib only, I’m happy to take that and discuss the OpenMP declare simd issue separately.

 

>Or is there some reason that the vectorizer needs to be aware of those libcalls?

 

I’m a strong believer in CodeGen mapping (scalar and widened) mathlib calls to the actual library (or an inlined sequence). So, that question needs to be answered by someone else.

 

Adding Michael and Hal.

 

 

From: Sanjay Patel [mailto:[hidden email]]
Sent: Monday, July 02, 2018 11:49 AM
To: Saito, Hideki <[hidden email]>
Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.

 

So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?

 

On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:

 

Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.

Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.

Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.

 

Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.

 

I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.

 

Thanks,

Hideki

 

--------------------------------

(**)

#pragma omp declare simd uniform(a), linear(i)

void foo(float *a, int i);

 

 

#pragma omp simd

for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
    …
    foo(a, i)
    …

}

 

From: Venkataramanan Kumar [mailto:[hidden email]]
Sent: Sunday, July 01, 2018 11:38 PM
To: Sanjay Patel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).

 

 

Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.

 

But unable to decide which version to expand in the vectorizer. We needed the  TTI information (ISA ).  It looks like better to legalize or generate them later.

 

regards,

Venkat.

 

 

On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:

Hi Hideki -

 

I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:

Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.

 

I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():

 

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));

 

 

 

 

On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:

 

Ashutosh,

 

Thanks for the repy.

 

Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.

https://reviews.llvm.org/D19544

There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now

in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.

 

I suppose

1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
not a solution.

2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:

a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
but I’d think this is the highest cost for development.

b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with VECLIB mapping to be added to intrinsic.

c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).

 

I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this

discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting

VECLIB calls.

 

@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about

LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?

 

If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point

to get to 2.b)/2.c).

 

Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.

 

Thanks,

Hideki

 

From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?

 

Hi Saito,

 

At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
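A minimal sketch of the table-driven decision described above, in plain C++ with no LLVM dependencies. Everything here is an assumption made for illustration: the `VecLibEntry` table, the `choose_lowering` helper and the `Action` enum are hypothetical names, and the two rows only reuse the `__svml_sin_4` / `__amdlibm_sin_4` spellings from the message. `legal_vf` stands in for whatever the target information would report as the widest legal vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The three actions named in the message.
enum class Action { CallSameVF, PartialSerialize, FullSerialize };

// Result of the decision: what to call, at which VF, how many times.
struct Lowering {
  Action action;
  std::string callee;     // scalar name when fully serialized
  std::size_t call_vf;    // VF of each emitted call (1 when scalar)
  std::size_t num_calls;  // number of calls replacing the original one
};

// Hypothetical table row: one vector entry of one vector library.
struct VecLibEntry {
  const char *veclib;
  const char *scalar;
  std::size_t vf;
  const char *callee;
};

// Illustrative rows only; a real table would come from the library vendor.
static const VecLibEntry kTable[] = {
    {"SVML", "sin", 4, "__svml_sin_4"},
    {"AMDLIBM", "sin", 4, "__amdlibm_sin_4"},
};

// Pick the widest entry that is type-legal and evenly divides the request.
Lowering choose_lowering(const std::string &veclib, const std::string &scalar,
                         std::size_t requested_vf, std::size_t legal_vf) {
  assert(requested_vf > 0 && legal_vf > 0);
  const VecLibEntry *best = nullptr;
  for (const VecLibEntry &e : kTable) {
    if (veclib != e.veclib || scalar != e.scalar)
      continue;
    if (e.vf > requested_vf || e.vf > legal_vf)
      continue;  // wider than requested, or not type-legal on this target
    if (requested_vf % e.vf != 0)
      continue;  // cannot tile the request with whole calls
    if (!best || e.vf > best->vf)
      best = &e;
  }
  if (!best)  // no usable vector entry: fall back to scalar calls
    return {Action::FullSerialize, scalar, 1, requested_vf};
  if (best->vf == requested_vf)
    return {Action::CallSameVF, best->callee, best->vf, 1};
  return {Action::PartialSerialize, best->callee, best->vf,
          requested_vf / best->vf};
}
```

With these rows, a VF8 request on a target whose legal width is 4 comes back as a partial serialization into two VF4 calls, and a function with no table entry falls back to full serialization.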

 

Thanks,

Ashutosh

 

From: llvm-dev [[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

 

Illustrative Example:

 

clang -fveclib=SVML -O3 svml.c -mavx

 

#include <math.h>

void foo(double *a, int N){

  int i;

#pragma clang loop vectorize_width(8)

  for (i=0;i<N;i++){

    a[i] = sin(i);

  }

}

 

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.

This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.

Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin8

        vmovups %ymm1, 32(%r15,%r12,8)

        vmovups %ymm0, (%r15,%r12,8)

Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.

i.e., not legal to use for AVX.

 

What we need to see instead is two calls to __svml_sin4(), like below.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin4

        vmovups %ymm0, 32(%r15,%r12,8)

        vmovups %ymm1, %ymm0

        callq   __svml_sin4

        vmovups %ymm0, (%r15,%r12,8)

 

What would be the most acceptable way to make this happen? Anybody having had a similar need previously?

 

Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,

plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).

If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.

 

Here are a few ideas I thought about:

1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.

2)      We could write an IR to IR pass to perform IR level legalization. This is essentially duplicating the functionality of LegalizeVectorType()
but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough
from that perspective.

3)      We have implemented something similar to 2), but legalization code is specialized for SVML legalization. This was much quicker than
trying to generalize the legalization scheme, but I’d imagine community won’t like it.

4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense
similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still
end up using one call instead of two.

 

Anything else?

 

Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?

Can we use TableGen to create a reverse map?

 

Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?

 

Thanks,

Hideki Saito

Intel Corporation

 

 

 


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

 

 



Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev


On 07/02/2018 04:33 PM, Saito, Hideki wrote:

 

>It may not be a full solution for the problems you're trying to solve

 

If we are inventing a new solution, I’d like it also to solve OpenMP declare simd legalization issue. If a small extension of existing scheme

works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.


I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?

 -Hal

 

>Or is there some reason that the vectorizer needs to be aware of those libcalls?

 

I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).

So, that question needs to be answered by someone else.

 

Adding Michael and Hal.

 

 

From: Sanjay Patel [[hidden email]]
Sent: Monday, July 02, 2018 11:49 AM
To: Saito, Hideki [hidden email]
Cc: Venkataramanan Kumar [hidden email]; [hidden email]; Masten, Matt [hidden email]; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.

 

So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?

 

On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:

 

Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.

Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.

Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.

 

Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.

 

I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.

 

Thanks,

Hideki

 

--------------------------------

(**)

#pragma omp declare simd uniform(a), linear(i)

void foo(float *a, int i);

 

 

#pragma omp simd

for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
    …
    foo(a, i)
    …

}

 

From: Venkataramanan Kumar [mailto:[hidden email]]
Sent: Sunday, July 01, 2018 11:38 PM
To: Sanjay Patel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).

 

 

Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.

 

But we were unable to decide which version to expand in the vectorizer, because that needs the TTI information (ISA). It looks better to legalize or generate them later.
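To make the ISA dependence concrete, here is a small sketch that builds the two names quoted above from an ISA choice. It assumes the x86 vector function ABI naming scheme (`_ZGV`, an ISA letter, `N` for unmasked, the vector length, one `v` per vector parameter, then `_` and the scalar name). `Isa`, `isa_letter` and `mangle_vector_variant` are hypothetical helper names, not LLVM or glibc APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ISA classes as encoded by the x86 vector function ABI.
enum class Isa { SSE, AVX, AVX2, AVX512 };

// Map an ISA class to its single-letter code in the mangled name.
char isa_letter(Isa isa) {
  switch (isa) {
  case Isa::SSE:
    return 'b';
  case Isa::AVX:
    return 'c';
  case Isa::AVX2:
    return 'd';
  case Isa::AVX512:
    return 'e';
  }
  assert(false && "unknown ISA");
  return '?';
}

// Build the name of an unmasked variant whose parameters are all vectors.
std::string mangle_vector_variant(Isa isa, std::size_t vf,
                                  std::size_t num_vector_params,
                                  const std::string &scalar_name) {
  assert(vf > 0);
  std::string s = "_ZGV";
  s += isa_letter(isa);
  s += 'N';                         // 'N' = unmasked variant
  s += std::to_string(vf);          // vector length
  s.append(num_vector_params, 'v'); // one 'v' per vector parameter
  s += '_';
  s += scalar_name;
  return s;
}
```

The same scalar function and the same VF give different callee names for AVX and AVX2, which is why the name cannot be chosen without knowing the target ISA.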

 

regards,

Venkat.

 

 

On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:

Hi Hideki -

 

I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:

Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.

 

I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():

 

    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));

 

 

 

 

On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:

 

Ashutosh,

 

Thanks for the reply.

 

Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.

https://reviews.llvm.org/D19544

There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now

in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.

 

I suppose

1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
not a solution.

2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:

a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
but I’d think this is the highest cost for development.

b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with VECLIB mapping to be added to intrinsic.

c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).

 

I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this

discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting

VECLIB calls.

 

@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about

LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?

 

If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point

to get to 2.b)/2.c).

 

Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.

 

Thanks,

Hideki

 

From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?

 

Hi Saito,

 

At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.

 

Thanks,

Ashutosh

 

From: llvm-dev [[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

 

 

Illustrative Example:

 

clang -fveclib=SVML -O3 svml.c -mavx

 

#include <math.h>

void foo(double *a, int N){

  int i;

#pragma clang loop vectorize_width(8)

  for (i=0;i<N;i++){

    a[i] = sin(i);

  }

}

 

Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.

This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.

Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin8

        vmovups %ymm1, 32(%r15,%r12,8)

        vmovups %ymm0, (%r15,%r12,8)

Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.

i.e., not legal to use for AVX.

 

What we need to see instead is two calls to __svml_sin4(), like below.

        vmovaps %ymm0, %ymm1

        vcvtdq2pd       %xmm1, %ymm0

        vextractf128    $1, %ymm1, %xmm1

        vcvtdq2pd       %xmm1, %ymm1

        callq   __svml_sin4

        vmovups %ymm0, 32(%r15,%r12,8)

        vmovups %ymm1, %ymm0

        callq   __svml_sin4

        vmovups %ymm0, (%r15,%r12,8)

 

What would be the most acceptable way to make this happen? Anybody having had a similar need previously?

 

Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,

plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).

If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.

 

Here are a few ideas I thought about:

1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.

2)      We could write an IR to IR pass to perform IR level legalization. This is essentially duplicating the functionality of LegalizeVectorType()
but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough
from that perspective.

3)      We have implemented something similar to 2), but legalization code is specialized for SVML legalization. This was much quicker than
trying to generalize the legalization scheme, but I’d imagine community won’t like it.

4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense
similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still
end up using one call instead of two.

 

Anything else?

 

Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?

Can we use TableGen to create a reverse map?

 

Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?

 

Thanks,

Hideki Saito

Intel Corporation

 

 

 



 

 


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

Hal>To me, this really looks like an ABI issue.

Being a vectorizer guy, I never thought of it that way, but I can see why you put it that way.

Hal >Will https://reviews.llvm.org/D47188 fix this?

I know the answer to this one: it does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
If you consider the VecLib function name, and also the TTI-based availability of the entry, as part of its ABI, you can think of the issue as an ABI
conformance transformation:
        <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>)
==>
        <4 x double> __svml_sin4(<4 x double>)
        <4 x double> __svml_sin4(<4 x double>)

And the same could also be true for OpenMP declare simd. Do you think the equivalent of this ugly thing is also okay?
        <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
==>
        <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>)
==>
        <4 x double> __svml_sin4(<4 x double>)
        <4 x double> __svml_sin4(<4 x double>)
This is essentially what has to happen if declare simd says a 4-way vector function is available, but LV wants to vectorize the caller loop 8-way.
The alternative is to bump up the availability at LV (only for the name, not the cost) and then later let the ABI say "no, only 4-way is available" and fix it up there.
A little convoluted, but it may still work.
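
For illustration only, the effect of that transformation can be modeled in plain C++ without any LLVM types. This is a sketch under stated assumptions, not the proposed implementation: `veclib_sin` is a hypothetical stand-in for a type-legal entry such as __svml_sin4, and `split_call` is a hypothetical helper that turns one call at an illegal width into several calls at the legal width.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a type-legal vector library entry. It models
// only the "VF elements in, VF elements out" contract of such a routine.
std::vector<double> veclib_sin(const std::vector<double> &x) {
  std::vector<double> r;
  r.reserve(x.size());
  for (double v : x)
    r.push_back(std::sin(v));
  return r;
}

// Split one call at an illegal width (e.g. VF=8 under AVX) into
// wide.size() / legal_vf calls at the legal width, then concatenate.
// calls_made reports how many library calls were issued.
std::vector<double> split_call(const std::vector<double> &wide,
                               std::size_t legal_vf,
                               std::size_t &calls_made) {
  assert(legal_vf != 0 && wide.size() % legal_vf == 0);
  std::vector<double> result;
  result.reserve(wide.size());
  calls_made = 0;
  for (std::size_t off = 0; off < wide.size(); off += legal_vf) {
    // Extract one legal-width slice of the argument.
    std::vector<double> part(wide.begin() + off,
                             wide.begin() + off + legal_vf);
    std::vector<double> r = veclib_sin(part);
    result.insert(result.end(), r.begin(), r.end());
    ++calls_made;
  }
  return result;
}
```

Calling `split_call` on 8 elements with `legal_vf` of 4 issues two 4-wide calls, which is the shape of the two-call sequence shown above.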

Is everyone reasonably comfortable with this "deal with the issue as an ABI resolution" direction? We won't know whether this direction
really works until we dig in deeper, but I think it should be explored before IR to IR legalization and also before trying to
add a bunch of math libs to the intrinsic table.

Any other ideas?

Thanks,
Hideki
-------------------------------------
From: Hal Finkel [mailto:[hidden email]]
Sent: Monday, July 02, 2018 3:59 PM
To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?


On 07/02/2018 04:33 PM, Saito, Hideki wrote:
 
>It may not be a full solution for the problems you're trying to solve
 
If we are inventing a new solution, I’d like it also to solve OpenMP declare simd legalization issue. If a small extension of existing scheme
works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.

I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?

 -Hal


 
>Or is there some reason that the vectorizer needs to be aware of those libcalls?
 
I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
So, that question needs to be answered by someone else.
 
Adding Michael and Hal.
 
 
From: Sanjay Patel [mailto:[hidden email]]
Sent: Monday, July 02, 2018 11:49 AM
To: Saito, Hideki <[hidden email]>
Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
 
So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
 
On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
 
Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
 
Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
 
I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
 
Thanks,
Hideki
 
--------------------------------
(**)
#pragma omp declare simd uniform(a), linear(i)
void foo(float *a, int i);
 

 
#pragma omp simd
for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
    …
    foo(a, i)
    …
}
 
From: Venkataramanan Kumar [mailto:[hidden email]]
Sent: Sunday, July 01, 2018 11:38 PM
To: Sanjay Patel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).
 
reference: https://sourceware.org/glibc/wiki/libmvec
 
Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.
 
But we were unable to decide which version to expand in the vectorizer, because that needs the TTI information (ISA). It looks better to legalize or generate them later.
 
regards,
Venkat.
 
 
On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
Hi Hideki -
 
I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.
 
I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():
 
    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));
 
 
 
 
On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:
 
Ashutosh,
 
Thanks for the reply.
 
Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
      https://reviews.llvm.org/D19544
There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now
in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.
 
I suppose
1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
not a solution.
2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:
a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
but I’d think this is the highest cost for development.
b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with VECLIB mapping to be added to intrinsic.
c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).
 
I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this
discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting
VECLIB calls.
 
@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about
LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?
 
If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point
to get to 2.b)/2.c).
 
Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.
 
Thanks,
Hideki
 
From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?
 
Hi Saito,
 
At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
 
Thanks,
Ashutosh
 
From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
 
Illustrative Example:
 
clang -fveclib=SVML -O3 svml.c -mavx
 
#include <math.h>
void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i=0;i<N;i++){
    a[i] = sin(i);
  }
}
 
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.
Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.
        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin8
        vmovups %ymm1, 32(%r15,%r12,8)
        vmovups %ymm0, (%r15,%r12,8)
Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
i.e., not legal to use for AVX.
 
What we need to see instead is two calls to __svml_sin4(), like below.
        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin4
        vmovups %ymm0, 32(%r15,%r12,8)
        vmovups %ymm1, %ymm0
        callq   __svml_sin4
        vmovups %ymm0, (%r15,%r12,8)
 
What would be the most acceptable way to make this happen? Anybody having had a similar need previously?
 
Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,
plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).
If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.
 
Here are a few ideas I thought about:
1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
and try to split it into two or more VECLIB nodes, but at that point we have lost the information about which function to call.
We can’t define an ISD opcode per function. There would be too many libm entries to deal with. We need a scalable solution.
2)      We could write an IR to IR pass to perform IR level legalization. This essentially duplicates the functionality of LegalizeVectorType(),
but we can make it available for other similar things that can’t use ISD level vector type legalization. This looks attractive enough
from that perspective.
3)      We have implemented something similar to 2), but the legalization code is specialized for SVML. This was much quicker than
trying to generalize the legalization scheme, but I’d imagine the community won’t like it.
4)      The vectorizer emits legalized VECLIB calls. Since it can already emit instructions in scalarized form, adding legalized call functionality is in some sense
similar to that. The vectorizer can’t simply choose a type-legal function name with an illegal vector type, since LegalizeVectorType() will still
end up using one call instead of two.
 
Anything else?
 
Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?
Can we use TableGen to create a reverse map?
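 
As a rough illustration of the reverse mapping asked about here, the sketch below derives it from a forward (scalar name, vector name, VF) table at startup rather than from TableGen. VecMapping, ReverseMap and buildReverseMap are hypothetical names chosen for this example and are not part of LLVM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical forward-table row: scalar function -> vector library entry.
struct VecMapping {
  std::string ScalarName;  // e.g. "sin"
  std::string VectorName;  // e.g. "__svml_sin4"
  unsigned VF;             // lanes handled by the vector entry
};

// Vector entry name -> (scalar name, VF).
using ReverseMap = std::map<std::string, std::pair<std::string, unsigned>>;

// Invert the forward table. Each vector entry is expected to implement
// exactly one scalar function, so a duplicate key indicates a table bug.
ReverseMap buildReverseMap(const std::vector<VecMapping> &Forward) {
  ReverseMap Reverse;
  for (const VecMapping &M : Forward) {
    bool Inserted =
        Reverse.emplace(M.VectorName, std::make_pair(M.ScalarName, M.VF))
            .second;
    assert(Inserted && "vector name maps to more than one scalar function");
    (void)Inserted;  // silence unused-variable warnings when NDEBUG is set
  }
  return Reverse;
}
```

A lookup of "__svml_sin8" in the resulting map would yield ("sin", 8), which is the information a legalizer needs in order to pick a narrower entry for the same scalar function.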
 
Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?
 
Thanks,
Hideki Saito
Intel Corporation
 
 
 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
 
 


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
I would like to know your thoughts on adding a VecLibInst class that captures the relevant details of a VecLib call, such as:
1. Name of the function called
2. The alignment info
3. Vectorization factor
4. Other parameters.

We already do this for MemInst, e.g. memcpy and memset. Can't we take a similar approach for VecLib calls as well?

-----Original Message-----
From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Tuesday, July 3, 2018 6:03 AM
To: Hal Finkel <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
Cc: [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?


Hal>To me, this really looks like an ABI issue.

Being a vectorizer guy, I never thought of it in that way, but I can see why you say it in that way.

Hal >Will https://reviews.llvm.org/D47188 fix this?

This, I know the answer. It does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
If you consider VecLib function name and also the TTI based availability of the entry as part of its ABI, you can think of the issue as ABI conformance transformation
        <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>) ==>
        <4 x double> __svml_sin4(<4 x double>)
        <4 x double> __svml_sin4(<4 x double>)

And the same could also be true for OpenMP declare SIMD. Do you think equivalent of this ugly thing is also okay?
        <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
==>
        <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>) ==>
        <4 x double> __svml_sin4(<4 x double>)
        <4 x double> __svml_sin4(<4 x double>)
This is essentially what has to happen if declare simd says a 4-way vector function is available, but LV wants to vectorize the caller loop 8-way.
The alternative is to bump up the availability at LV (only for the name, not the cost) and then later let the ABI say "no, only 4-way is available" and fix it up there.
A little convoluted, but it may still work.

Everyone reasonably comfortable enough with this "deal with the issue as an ABI resolution" direction? We won't know whether this direction really works or not until we dig in deeper, but I think this direction should be explored before IR to IR legalization and also before trying to add bunch of math libs in the intrinsic table.

Any other ideas?

Thanks,
Hideki
-------------------------------------
From: Hal Finkel [mailto:[hidden email]]
Sent: Monday, July 02, 2018 3:59 PM
To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?


On 07/02/2018 04:33 PM, Saito, Hideki wrote:
 
>It may not be a full solution for the problems you're trying to solve
 
If we are inventing a new solution, I’d like it also to solve OpenMP declare simd legalization issue. If a small extension of existing scheme
works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.

I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?

 -Hal


 
>Or is there some reason that the vectorizer needs to be aware of those libcalls?
 
I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
So, that question needs to be answered by someone else.
 
Adding Michael and Hal.
 
 
From: Sanjay Patel [mailto:[hidden email]]
Sent: Monday, July 02, 2018 11:49 AM
To: Saito, Hideki <[hidden email]>
Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
 
So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
 
On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
 
Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
 
Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
 
I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
 
Thanks,
Hideki
 
--------------------------------
(**)
#pragma omp declare simd uniform(a), linear(i)
void foo(float *a, int i);
 

 
#pragma omp simd
for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
    …
    foo(a, i)
    …
}
 
From: Venkataramanan Kumar [mailto:[hidden email]]
Sent: Sunday, July 01, 2018 11:38 PM
To: Sanjay Patel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).
 
reference: https://sourceware.org/glibc/wiki/libmvec
 
Using the example case given in the reference, we found there are two vector versions of "sin" (4 x double) with the same VF, namely _ZGVcN4v_sin (AVX) and _ZGVdN4v_sin (AVX2). Following the SVML path and adding new entries to the VecDesc structure in TargetLibraryInfo.cpp, we can generate the vector version.
 
However, we were unable to decide in the vectorizer which version to expand to, because that needs the TTI information (ISA). It looks better to legalize or generate these calls later.
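 
To make the ISA dependence concrete, here is a small sketch that builds the libmvec-style names once the ISA is known. The 'c' (AVX) and 'd' (AVX2) letters come from the names quoted above; the 'b' (SSE) and 'e' (AVX-512) letters are taken from the x86 vector function ABI as I understand it. X86ISA, isaLetter and mangleVectorVariant are hypothetical helper names, and only unmasked variants with all-vector arguments are covered.

```cpp
#include <cassert>
#include <string>

// Hypothetical ISA selector; in LLVM this information would come from TTI.
enum class X86ISA { SSE, AVX, AVX2, AVX512 };

// ISA letter used in the _ZGV name prefix.
char isaLetter(X86ISA ISA) {
  switch (ISA) {
  case X86ISA::SSE:
    return 'b';
  case X86ISA::AVX:
    return 'c';
  case X86ISA::AVX2:
    return 'd';
  case X86ISA::AVX512:
    return 'e';
  }
  assert(false && "unknown ISA");
  return '?';
}

// Build the name of an unmasked ('N') variant whose arguments are all
// vectors ('v'), e.g. _ZGVcN4v_sin for AVX, VF 4, one argument.
std::string mangleVectorVariant(X86ISA ISA, unsigned VF, unsigned NumArgs,
                                const std::string &Scalar) {
  assert(VF > 0 && NumArgs > 0 && "need at least one lane and one argument");
  return "_ZGV" + std::string(1, isaLetter(ISA)) + "N" + std::to_string(VF) +
         std::string(NumArgs, 'v') + "_" + Scalar;
}
```

The point of the sketch is that the same (scalar name, VF) pair yields different callee names for AVX and AVX2, so the choice cannot be made from the VF alone.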
 
regards,
Venkat.
 
 
On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
Hi Hideki -
 
I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.
 
I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():
 
    if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
                                        RTLIB::LOG_FINITE_F64,
                                        RTLIB::LOG_FINITE_F80,
                                        RTLIB::LOG_FINITE_F128,
                                        RTLIB::LOG_FINITE_PPCF128));
    else
      Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
                                        RTLIB::LOG_F80, RTLIB::LOG_F128,
                                        RTLIB::LOG_PPCF128));
 
 
 
 
On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:
 
Ashutosh,
 
Thanks for the reply.
 
A related earlier discussion appears in the review of the SVML patch (@mmasten). Adding a few names from there.
      https://reviews.llvm.org/D19544
There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now
in the trunk is “not legal enough”. I’ll work on a patch to stop the bleeding while we continue to discuss the legalization topic.
 
I suppose
1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
not a solution.
2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:
a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
but I’d think this is the highest cost for development.
b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
with VECLIB mapping to be added to intrinsic.
c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
(and if needed each VECLIB or non-VECLIB entry).
 
I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this
discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting
VECLIB calls.
 
@Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about
LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?
 
If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point
to get to 2.b)/2.c).
 
Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.
 
Thanks,
Hideki
 
From: Nema, Ashutosh [mailto:[hidden email]]
Sent: Thursday, June 28, 2018 11:37 PM
To: Saito, Hideki <[hidden email]>
Cc: [hidden email]
Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?
 
Hi Saito,
 
At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
 
Thanks,
Ashutosh
 
From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
Sent: Friday, June 29, 2018 7:41 AM
To: 'Saito, Hideki via llvm-dev' <[hidden email]>
Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
 
 
Illustrative Example:
 
clang -fveclib=SVML -O3 svml.c -mavx
 
#include <math.h>
void foo(double *a, int N){
  int i;
#pragma clang loop vectorize_width(8)
  for (i=0;i<N;i++){
    a[i] = sin(i);
  }
}
 
Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.
Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.
        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin8
        vmovups %ymm1, 32(%r15,%r12,8)
        vmovups %ymm0, (%r15,%r12,8)
Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
i.e., not legal to use for AVX.
 
What we need to see instead is two calls to __svml_sin4(), like below.
        vmovaps %ymm0, %ymm1
        vcvtdq2pd       %xmm1, %ymm0
        vextractf128    $1, %ymm1, %xmm1
        vcvtdq2pd       %xmm1, %ymm1
        callq   __svml_sin4
        vmovups %ymm0, 32(%r15,%r12,8)
        vmovups %ymm1, %ymm0
        callq   __svml_sin4
        vmovups %ymm0, (%r15,%r12,8)
 
What would be the most acceptable way to make this happen? Anybody having had a similar need previously?
 
Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,
plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).
If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.
 
Here are a few ideas I thought about:
1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.
2)      We could write an IR to IR pass to perform IR level legalization. This is essentially duplicating the functionality of LegalizeVectorType()
but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough
from that perspective.
3)      We have implemented something similar to 2), but legalization code is specialized for SVML legalization. This was much quicker than
trying to generalize the legalization scheme, but I’d imagine community won’t like it.
4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense
similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still
end up using one call instead of two.
 
Anything else?
 
Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?
Can we use TableGen to create a reverse map?
 
Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?
 
Thanks,
Hideki Saito
Intel Corporation
 
 
 

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
 
 


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
On Tue, 3 Jul 2018 at 00:35, Hal Finkel via llvm-dev
<[hidden email]> wrote:
> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?

FYI, Arm has published a draft of a similar proposal:

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

They directly mention OpenMP declare simd, but there's no reason we
can't apply something similar to Clang (and GCC) pragmas as well.

cheers,
--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

For reference, Intel's vector function ABI is located here.
 https://software.intel.com/sites/default/files/managed/b4/c8/Intel-Vector-Function-ABI.pdf

GCC (for x86) has a very similar but slightly different vector function ABI. Intel compiler has a flag to choose between the two (actually does more than two if someone cares about such details).

I see ARM inherited a lot from those, which makes implementers' life easier.


-----Original Message-----
From: Renato Golin [mailto:[hidden email]]
Sent: Tuesday, July 03, 2018 4:43 AM
To: Hal Finkel <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; Michael Zolotukhin <[hidden email]>; LLVM Dev <[hidden email]>; Davide Italiano <[hidden email]>; Masten, Matt <[hidden email]>
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

On Tue, 3 Jul 2018 at 00:35, Hal Finkel via llvm-dev <[hidden email]> wrote:
> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?

FYI, Arm has published a draft of a similar proposal:

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

They directly mention OpenMP declare simd, but there's no reason we can't apply something similar to Clang (and GCC) pragmas as well.

cheers,
--renato

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

On 07/02/2018 07:32 PM, Saito, Hideki wrote:

> Hal>To me, this really looks like an ABI issue.
>
> Being a vectorizer guy, I never thought of it in that way, but I can see why you say it in that way.
>
> Hal >Will https://reviews.llvm.org/D47188 fix this?
>
> This, I know the answer. It does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
> If you consider VecLib function name and also the TTI based availability of the entry as part of its ABI, you can think of the issue as ABI
> conformance transformation
> <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>)
> ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
>
> And the same could also be true for OpenMP declare SIMD. Do you think equivalent of this ugly thing is also okay?
> <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
> ==>
> <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>)
> ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
> This is essentially what has to happen if declare simd says 4-way vector function is available, but LV wants to vectorize the caller loop in 8-way.
> Alternative is bump up the availability at LV (only for the name, not the cost) and then later let ABI say "no, only 4-way is available", fix this ABI.
> A little convoluted but it may still work.

No. I reread your original message, and I take back what I said. This is
not a proper ABI issue. Looking at this:

>         callq   __svml_sin8
>         vmovups %ymm1, 32(%r15,%r12,8)
>         vmovups %ymm0, (%r15,%r12,8)
> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
> i.e., not legal to use for AVX.
>  
> What we need to see instead is two calls to __svml_sin4(), like below.

So __svml_sin8 is only for AVX-512 (with 512-bit vectors). For the
purpose of generating code for AVX[-2], it essentially doesn't exist.
Thus, it's not an ABI issue. We'll have the same problem if someone is
targeting AVX-512 and requests a VF of 16. There's no __svml_sin16 (I
presume), so we need to break this down into two calls to __svml_sin8
(plus whatever shuffles are necessary). The vectorizer should do this.
It should not generate calls to functions that don't exist.

Can't we just make the tables used by the vectorizer, where it knows
about available math-library calls, aware of the legal vector widths
based on enabled target features?
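
The "break this down into two calls plus shuffles" step can be sketched as follows. This is only an illustration of the data movement, not vectorizer code: libSin is a stand-in for a fixed-width library entry such as __svml_sin4 or __svml_sin8, and callSplit is a hypothetical helper that extracts legal-width pieces, calls the entry on each piece, and concatenates the results.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Stand-in for a legal-width vector library entry; computes lane-wise sin.
Vec libSin(const Vec &X) {
  Vec R(X.size());
  for (std::size_t I = 0; I < X.size(); ++I)
    R[I] = std::sin(X[I]);
  return R;
}

// Apply libSin to a wide vector by splitting it into LegalVF-lane pieces.
// A requested VF of 8 with LegalVF of 4 therefore produces two calls.
Vec callSplit(const Vec &X, std::size_t LegalVF) {
  assert(LegalVF > 0 && X.size() % LegalVF == 0 &&
         "requested VF must be a multiple of the legal VF");
  Vec Result;
  Result.reserve(X.size());
  for (std::size_t Base = 0; Base < X.size(); Base += LegalVF) {
    // "Extract subvector" shuffle: lanes [Base, Base + LegalVF).
    Vec Part(X.begin() + Base, X.begin() + Base + LegalVF);
    Vec PartResult = libSin(Part);  // one legal-width library call
    assert(PartResult.size() == LegalVF);
    // "Concatenate" shuffle: append the partial result in lane order.
    Result.insert(Result.end(), PartResult.begin(), PartResult.end());
  }
  return Result;
}
```

The same helper covers both cases discussed in this message: VF 8 on AVX split into two 4-lane calls, and VF 16 on AVX-512 split into two 8-lane calls.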

Thanks again,
Hal

>
> Everyone reasonably comfortable enough with this "deal with the issue as an ABI resolution" direction? We won't know whether this direction
> really works or not until we dig in deeper, but I think this direction should be explored before IR to IR legalization and also before trying to
> add bunch of math libs in the intrinsic table.
>
> Any other ideas?
>
> Thanks,
> Hideki
> -------------------------------------
> From: Hal Finkel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 3:59 PM
> To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>  
>> It may not be a full solution for the problems you're trying to solve
>  
> If we are inventing a new solution, I’d like it also to solve OpenMP declare simd legalization issue. If a small extension of existing scheme
> works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.
>
> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?
>
>  -Hal
>
>
>  
>> Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
> So, that question needs to be answered by someone else.
>  
> Adding Michael and Hal.
>  
>  
> From: Sanjay Patel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 11:49 AM
> To: Saito, Hideki <[hidden email]>
> Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
>  
> So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
>  
> Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
> Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
> Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
>  
> Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
>  
> I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
>  
> Thanks,
> Hideki
>  
> --------------------------------
> (**)
> #pragma omp declare simd uniform(a), linear(i)
> void foo(float *a, int i);
>  
> …
>  
> #pragma omp simd
> for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
>     …
>     foo(a, i)
>     …
> }
>  
> From: Venkataramanan Kumar [mailto:[hidden email]]
> Sent: Sunday, July 01, 2018 11:38 PM
> To: Sanjay Patel <[hidden email]>
> Cc: Saito, Hideki <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).
>  
> reference: https://sourceware.org/glibc/wiki/libmvec
>  
> Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.
>  
> However, we were unable to decide in the vectorizer which version to expand to, because that needs the TTI information (ISA). It looks better to legalize or generate them later.
>  
> regards,
> Venkat.
>  
>  
> On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
> Hi Hideki -
>  
> I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
> Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.
>  
> I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():
>  
>     if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>                                         RTLIB::LOG_FINITE_F64,
>                                         RTLIB::LOG_FINITE_F80,
>                                         RTLIB::LOG_FINITE_F128,
>                                         RTLIB::LOG_FINITE_PPCF128));
>     else
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
>                                         RTLIB::LOG_F80, RTLIB::LOG_F128,
>                                         RTLIB::LOG_PPCF128));
>  
>  
>  
>  
> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:
>  
> Ashutosh,
>  
> Thanks for the reply.
>  
> Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
>       https://reviews.llvm.org/D19544
> There, I see Hal’s review comment “let’s start only with the directly-legal calls”. Apparently, what we have right now
> in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.
>  
> I suppose
> 1)      LV only solution (let LV emit already legalized VECLIB calls) is certainly not scalable. It won’t help if VECLIB calls
> are generated elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround,
> not a solution.
> 2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:
> a.       Go with very generic IR to IR legalization pass comparable to ISD level legalization. This is most general
> but I’d think this is the highest cost for development.
> b.      Go with Intrinsic-only legalization and then apply VECLIB afterwards. This requires all scalar functions
> with VECLIB mapping to be added to intrinsic.
> c.       Go with generic enough function call legalization, with the ability to add custom legalization for each VECLIB
> (and if needed each VECLIB or non-VECLIB entry).
>  
> I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be more flexible. So, I guess we don’t really have to tie this
> discussion with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting
> VECLIB calls.
>  
> @Davide, in D19544, @spatel thought LibCallSimplifier has relevance to this legalization topic. Do you know enough about
> LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?
>  
> If we think 2.b)/2.c) are right enough directions, I can clean up what we have and upload it to Phabricator as a starting point
> to get to 2.b)/2.c).
>  
> Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.
>  
> Thanks,
> Hideki
>  
> From: Nema, Ashutosh [mailto:[hidden email]]
> Sent: Thursday, June 28, 2018 11:37 PM
> To: Saito, Hideki <[hidden email]>
> Cc: [hidden email]
> Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> Hi Saito,
>  
> At AMD we have our own vector library and faced similar problems. We followed the SVML path and generated the respective vector calls from the vectorizer. Once the vectorizer has generated such calls (e.g. __svml_sin_4 or __amdlibm_sin_4), the only way to identify a vector lib call later is string matching. I’m not sure that is the proper way. Maybe, instead of generating the library-specific calls, it’s better to generate some standard call (maybe intrinsics) and lower it later. A late IR pass can be introduced to lower the intrinsic calls to specific lib calls (__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven, deciding the action based on the vector library, function name, VF and target information. The action can be full-serialize, partial-serialize (VF8 to 2 VF4), or generate the lib call with the same VF.
>  
> Thanks,
> Ashutosh
>  
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
> Sent: Friday, June 29, 2018 7:41 AM
> To: 'Saito, Hideki via llvm-dev' <[hidden email]>
> Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
>  
> Illustrative Example:
>  
> clang -fveclib=SVML -O3 svml.c -mavx
>  
> #include <math.h>
> void foo(double *a, int N){
>   int i;
> #pragma clang loop vectorize_width(8)
>   for (i=0;i<N;i++){
>     a[i] = sin(i);
>   }
> }
>  
> Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
> This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.
> Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin8
>         vmovups %ymm1, 32(%r15,%r12,8)
>         vmovups %ymm0, (%r15,%r12,8)
> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
> i.e., not legal to use for AVX.
>  
> What we need to see instead is two calls to __svml_sin4(), like below.
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin4
>         vmovups %ymm0, 32(%r15,%r12,8)
>         vmovups %ymm1, %ymm0
>         callq   __svml_sin4
>         vmovups %ymm0, (%r15,%r12,8)
>  
> What would be the most acceptable way to make this happen? Anybody having had a similar need previously?
>  
> Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code,
> plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).
> If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.
>  
> Here are a few ideas I thought about:
> 1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB
> and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
> We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.
> 2)      We could write an IR to IR pass to perform IR level legalization. This is essentially duplicating the functionality of LegalizeVectorType()
> but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough
> from that perspective.
> 3)      We have implemented something similar to 2), but legalization code is specialized for SVML legalization. This was much quicker than
> trying to generalize the legalization scheme, but I’d imagine community won’t like it.
> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense
> similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still
> end up using one call instead of two.
>  
> Anything else?
>  
> Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?
> Can we use TableGen to create a reverse map?
>  
> Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?
>  
> Thanks,
> Hideki Saito
> Intel Corporation
>  
>  
>  
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>  
>  
>
>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev

On 07/03/2018 01:38 AM, Gopalasubramanian, Ganesh wrote:
> Would like to know your thoughts of adding a VecLibInst Class whereby we capture the relevant details regarding a VecLib Call like
> 1. Name of the function called
> 2. The alignment info
> 3. Vector Factor
> 4. Other parameters.
>
> We do it for MemInst like memcpy and memset. Can't we take similar approach for VecLib calls as well?

How would we use this?

 -Hal
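
As a point of comparison for the proposal quoted above: earlier in this thread the only way mentioned to recover the scalar name and VF from an emitted call was string matching on the callee name. The sketch below shows what that recovery looks like and which fields a dedicated wrapper would carry instead. `VecLibCallInfo` and `parseSvmlName` are hypothetical names invented for illustration, not LLVM API, and the trailing-digits convention is an assumption based on the `__svml_sin4`/`__svml_sin8` names used here.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical record of what a VecLibInst-style wrapper could expose.
struct VecLibCallInfo {
  std::string Library;    // which vector library, e.g. "SVML"
  std::string ScalarName; // scalar function being vectorized, e.g. "sin"
  unsigned VF;            // number of lanes the callee processes
};

// Recover the fields by parsing a callee name of the assumed form
// "__svml_<scalar><VF>". Returns nullopt for anything else.
std::optional<VecLibCallInfo> parseSvmlName(std::string_view Callee) {
  constexpr std::string_view Prefix = "__svml_";
  if (Callee.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  std::string_view Rest = Callee.substr(Prefix.size());

  // Peel the trailing decimal digits off as the VF. This is ambiguous for
  // scalar names that themselves end in a digit (log2, exp2), which is one
  // weakness of relying on the name alone.
  std::size_t End = Rest.size();
  while (End > 0 && std::isdigit(static_cast<unsigned char>(Rest[End - 1])))
    --End;
  if (End == 0 || End == Rest.size())
    return std::nullopt; // missing scalar name or missing VF

  unsigned VF = 0;
  for (char C : Rest.substr(End))
    VF = VF * 10 + static_cast<unsigned>(C - '0');
  return VecLibCallInfo{"SVML", std::string(Rest.substr(0, End)), VF};
}
```

A wrapper class would presumably hand out the same three fields without parsing, so a later legalization step could ask for the scalar name and VF directly.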

>
> -----Original Message-----
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Saito, Hideki via llvm-dev
> Sent: Tuesday, July 3, 2018 6:03 AM
> To: Hal Finkel <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> Hal>To me, this really looks like an ABI issue.
>
> Being a vectorizer guy, I never thought of it in that way, but I can see why you say it in that way.
>
> Hal >Will https://reviews.llvm.org/D47188 fix this?
>
> This, I know the answer. It does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
> If you consider VecLib function name and also the TTI based availability of the entry as part of its ABI, you can think of the issue as ABI conformance transformation
> <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>) ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
>
> And the same could also be true for OpenMP declare SIMD. Do you think equivalent of this ugly thing is also okay?
> <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
> ==>
> <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>) ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
> This is essentially what has to happen if declare simd says 4-way vector function is available, but LV wants to vectorize the caller loop in 8-way.
> Alternative is bump up the availability at LV (only for the name, not the cost) and then later let ABI say "no, only 4-way is available", fix this ABI.
> A little convoluted but it may still work.
>
> Everyone reasonably comfortable enough with this "deal with the issue as an ABI resolution" direction? We won't know whether this direction really works or not until we dig in deeper, but I think this direction should be explored before IR to IR legalization and also before trying to add bunch of math libs in the intrinsic table.
>
> Any other ideas?
>
> Thanks,
> Hideki
> -------------------------------------
> From: Hal Finkel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 3:59 PM
> To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>  
>> It may not be a full solution for the problems you're trying to solve
>  
> If we are inventing a new solution, I’d like it also to solve OpenMP declare simd legalization issue. If a small extension of existing scheme
> works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.
>
> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?
>
>  -Hal
>
>
>  
>> Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
> So, that question needs to be answered by someone else.
>  
> Adding Michael and Hal.
>  
>  
> From: Sanjay Patel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 11:49 AM
> To: Saito, Hideki <[hidden email]>
> Cc: Venkataramanan Kumar <[hidden email]>; [hidden email]; Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
>  
> So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
>  
> Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
> Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
> Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
>  
> Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
>  
> I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
>  
> Thanks,
> Hideki
>  
> --------------------------------
> (**)
> #pragma omp declare simd uniform(a), linear(i)
> void foo(float *a, int i);
>  
> …
>  
> #pragma omp simd
> for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
>     …
>     foo(a, i)
>     …
> }
>  

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
In reply to this post by Tom Stellard via llvm-dev
+ llvm-dev

-----Original Message-----
From: Nema, Ashutosh
Sent: Wednesday, July 4, 2018 12:12 PM
To: Hal Finkel <[hidden email]>; Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
Cc: [hidden email]; Masten, Matt <[hidden email]>
Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Hi Hal,

> __svml_sin8 (plus whatever shuffles are necessary).
> The vectorizer should do this.
> It should not generate calls to functions that don't exist.

I'm not sure how the vectorizer will do this. Consider the case where the "-vectorizer-maximize-bandwidth" option is enabled and the vectorizer is forced to generate the wider VF; it may then generate a call to __svml_sin_* which may not exist.

Are you expecting the vectorizer to lower the calls, i.e. __svml_sin_8 to two __svml_sin_4 calls?

Regards,
Ashutosh

-----Original Message-----
From: llvm-dev [mailto:[hidden email]] On Behalf Of Hal Finkel via llvm-dev
Sent: Wednesday, July 4, 2018 6:40 AM
To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
Cc: [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?


On 07/02/2018 07:32 PM, Saito, Hideki wrote:

> Hal>To me, this really looks like an ABI issue.
>
> Being a vectorizer guy, I never thought of it in that way, but I can see why you say it in that way.
>
> Hal >Will https://reviews.llvm.org/D47188 fix this?
>
> This, I know the answer. It does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
> If you consider VecLib function name and also the TTI based
> availability of the entry as part of its ABI, you can think of the issue as ABI conformance transformation
> <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>) ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
>
> And the same could also be true for OpenMP declare SIMD. Do you think equivalent of this ugly thing is also okay?
> <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
> ==>
> <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>) ==>
> <4 x double> __svml_sin4(<4 x double>)
> <4 x double> __svml_sin4(<4 x double>)
> This is essentially what has to happen if declare simd says 4-way vector function is available, but LV wants to vectorize the caller loop in 8-way.
> Alternative is bump up the availability at LV (only for the name, not the cost) and then later let ABI say "no, only 4-way is available", fix this ABI.
> A little convoluted but it may still work.

No. I reread your original message, and I take back what I said. This is not a proper ABI issue. Looking at this:

>         callq   __svml_sin8
>         vmovups %ymm1, 32(%r15,%r12,8)
>         vmovups %ymm0, (%r15,%r12,8)
> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
> i.e., not legal to use for AVX.
>  
> What we need to see instead is two calls to __svml_sin4(), like below.

So __svml_sin8 is only for AVX-512 (with 512-bit vectors). For the purpose of generating code for AVX[-2], it essentially doesn't exist.
Thus, it's not an ABI issue. We'll have the same problem if someone is targeting AVX-512 and requests a VF of 16. There's no __svml_sin16 (I presume), so we need to break this down into two calls to __svml_sin8 (plus whatever shuffles are necessary). The vectorizer should do this.
It should not generate calls to functions that don't exist.
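
To make the "two narrower calls plus shuffles" shape concrete, here is a minimal scalar-C++ sketch of the data flow. It is not vectorizer code: `sin4_standin` is a hypothetical stand-in for a 4-lane routine such as __svml_sin4 (implemented here with a plain loop), and the half extraction and concatenation play the role of the shuffles.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Stand-in for a 4-lane library routine such as __svml_sin4. The real
// routine takes and returns a vector register; this one just loops.
std::array<double, 4> sin4_standin(const std::array<double, 4> &X) {
  std::array<double, 4> R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = std::sin(X[I]);
  return R;
}

// Express one 8-lane call as two 4-lane calls: extract the low and high
// halves, call the narrower routine on each, then concatenate the results
// back into lane order.
std::array<double, 8> sin8_via_two_sin4(const std::array<double, 8> &X) {
  std::array<double, 4> Lo{}, Hi{};
  for (std::size_t I = 0; I < 4; ++I) {
    Lo[I] = X[I];
    Hi[I] = X[I + 4];
  }
  const std::array<double, 4> RLo = sin4_standin(Lo);
  const std::array<double, 4> RHi = sin4_standin(Hi);

  std::array<double, 8> R{};
  for (std::size_t I = 0; I < 4; ++I) {
    R[I] = RLo[I];
    R[I + 4] = RHi[I];
  }
  return R;
}
```

The same pattern applies one level up: a VF of 16 on AVX-512 would split into two 8-lane calls.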

Can't we just make the tables used by the vectorizer, where it knows about available math-library calls, aware of the legal vector widths based on enabled target features?
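
A rough sketch of what such a width-aware table could look like, under stated assumptions: `VecLibEntry`, its `MinVectorBits` column, `Choice`, and `pick` are all hypothetical names invented here, and the per-entry register widths are illustrative values rather than anything taken from LLVM's actual tables. The lookup returns the widest entry that fits the target's vector registers and evenly divides the requested VF, together with the number of calls needed.

```cpp
#include <optional>
#include <string_view>

// Hypothetical table row: scalar name, vector routine name, lanes, and the
// vector register width (in bits) the routine requires.
struct VecLibEntry {
  std::string_view ScalarName;
  std::string_view VectorName;
  unsigned VF;
  unsigned MinVectorBits;
};

// Illustrative entries for double-precision sin.
constexpr VecLibEntry Table[] = {
    {"sin", "__svml_sin2", 2, 128},
    {"sin", "__svml_sin4", 4, 256},
    {"sin", "__svml_sin8", 8, 512},
};

struct Choice {
  std::string_view VectorName; // routine to call
  unsigned NumCalls;           // how many calls cover the requested VF
};

// Pick the widest routine that is usable on the target and evenly divides
// the requested VF. Returns nullopt if no entry qualifies.
std::optional<Choice> pick(std::string_view Scalar, unsigned RequestedVF,
                           unsigned TargetVectorBits) {
  const VecLibEntry *Best = nullptr;
  for (const VecLibEntry &E : Table) {
    if (E.ScalarName != Scalar)
      continue;
    if (E.MinVectorBits > TargetVectorBits)
      continue; // routine needs wider registers than the target has
    if (E.VF > RequestedVF || RequestedVF % E.VF != 0)
      continue; // cannot tile the requested VF with this routine
    if (!Best || E.VF > Best->VF)
      Best = &E;
  }
  if (!Best)
    return std::nullopt;
  return Choice{Best->VectorName, RequestedVF / Best->VF};
}
```

With these entries, a request for VF 8 on a 256-bit target selects __svml_sin4 with two calls, and a request for VF 16 on a 512-bit target selects __svml_sin8 with two calls, so __svml_sin8 is never chosen for a target where it "doesn't exist".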

Thanks again,
Hal

>
> Everyone reasonably comfortable enough with this "deal with the issue
> as an ABI resolution" direction? We won't know whether this direction
> really works or not until we dig in deeper, but I think this direction should be explored before IR to IR legalization and also before trying to add bunch of math libs in the intrinsic table.
>
> Any other ideas?
>
> Thanks,
> Hideki
> -------------------------------------
> From: Hal Finkel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 3:59 PM
> To: Saito, Hideki <[hidden email]>; Sanjay Patel
> <[hidden email]>; [hidden email]
> Cc: Venkataramanan Kumar <[hidden email]>;
> [hidden email]; Masten, Matt <[hidden email]>;
> [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>  
>> It may not be a full solution for the problems you're trying to solve
>  
> If we are inventing a new solution, I’d like it also to solve OpenMP
> declare simd legalization issue. If a small extension of existing scheme works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.
>
> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?
>
>  -Hal
>
>
>  
>> Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
> So, that question needs to be answered by someone else.
>  
> Adding Michael and Hal.
>  
>  
> From: Sanjay Patel [mailto:[hidden email]]
> Sent: Monday, July 02, 2018 11:49 AM
> To: Saito, Hideki <[hidden email]>
> Cc: Venkataramanan Kumar <[hidden email]>;
> [hidden email]; Masten, Matt <[hidden email]>;
> [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
>  
> So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
>  
> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
>  
> Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
> Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
> Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
>  
> Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
>  
> I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
>  
> Thanks,
> Hideki
>  
> --------------------------------
> (**)
> #pragma omp declare simd uniform(a), linear(i)
> void foo(float *a, int i);
>  
> …
>  
> #pragma omp simd
> for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
>     …
>     foo(a, i)
>     …
> }
>  
> From: Venkataramanan Kumar
> [mailto:[hidden email]]
> Sent: Sunday, July 01, 2018 11:38 PM
> To: Sanjay Patel <[hidden email]>
> Cc: Saito, Hideki <[hidden email]>; [hidden email];
> Masten, Matt <[hidden email]>; [hidden email]
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).
>  
> reference: https://sourceware.org/glibc/wiki/libmvec
>  
> Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.
>  
> But unable to decide which version to expand in the vectorizer. We needed the  TTI information (ISA ).  It looks like better to legalize or generate them later.
>  
> regards,
> Venkat.
>  
>  
> On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
> Hi Hideki -
>  
> I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
> Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.
>  
> I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():
>  
>     if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>                                         RTLIB::LOG_FINITE_F64,
>                                         RTLIB::LOG_FINITE_F80,
>                                         RTLIB::LOG_FINITE_F128,
>                                         RTLIB::LOG_FINITE_PPCF128));
>     else
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32,
>                                         RTLIB::LOG_F64,
>                                         RTLIB::LOG_F80,
>                                         RTLIB::LOG_F128,
>                                         RTLIB::LOG_PPCF128));
>  
>  
>  
>  
> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:
>  
> Ashutosh,
>  
> Thanks for the reply.
>  
> Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
>       https://reviews.llvm.org/D19544
> There, I see Hal’s review comment “let’s start only with the
> directly-legal calls”. Apparently, what we have right now in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.
>  
> I suppose
> 1)      LV only solution (let LV emit already legalized VECLIB calls)
> is certainly not scalable. It won’t help if VECLIB calls are generated
> elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround, not a solution.
> 2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:
> a.       Go with very generic IR to IR legalization pass comparable to
> ISD level legalization. This is most general but I’d think this is the highest cost for development.
> b.      Go with Intrinsic-only legalization and then apply VECLIB
> afterwards. This requires all scalar functions with VECLIB mapping to be added to intrinsic.
> c.       Go with generic enough function call legalization, with the
> ability to add custom legalization for each VECLIB (and if needed each VECLIB or non-VECLIB entry).
>  
> I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be
> more flexible. So, I guess we don’t really have to tie this discussion
> with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting VECLIB calls.
>  
> @Davide, in D19544, @spatel thought LibCallSimplifier has relevance to
> this legalization topic. Do you know enough about LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?
>  
> If we think 2.b)/2.c) are right enough directions, I can clean up what
> we have and upload it to Phabricator as a starting point to get to 2.b)/2.c).
>  
> Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.
>  
> Thanks,
> Hideki
>  
> From: Nema, Ashutosh [mailto:[hidden email]]
> Sent: Thursday, June 28, 2018 11:37 PM
> To: Saito, Hideki <[hidden email]>
> Cc: [hidden email]
> Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>  
> Hi Saito,
>  
> At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
>  
> Thanks,
> Ashutosh
>  
> From: llvm-dev [mailto:[hidden email]] On Behalf Of
> Saito, Hideki via llvm-dev
> Sent: Friday, June 29, 2018 7:41 AM
> To: 'Saito, Hideki via llvm-dev' <[hidden email]>
> Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>  
>  
> Illustrative Example:
>  
> clang -fveclib=SVML -O3 svml.c -mavx
>  
> #include <math.h>
> void foo(double *a, int N){
>   int i;
> #pragma clang loop vectorize_width(8)
>   for (i=0;i<N;i++){
>     a[i] = sin(i);
>   }
> }
>  
> Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
> This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.
> Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin8
>         vmovups %ymm1, 32(%r15,%r12,8)
>         vmovups %ymm0, (%r15,%r12,8)
> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
> i.e., not legal to use for AVX.
>  
> What we need to see instead is two calls to __svml_sin4(), like below.
>         vmovaps %ymm0, %ymm1
>         vcvtdq2pd       %xmm1, %ymm0
>         vextractf128    $1, %ymm1, %xmm1
>         vcvtdq2pd       %xmm1, %ymm1
>         callq   __svml_sin4
>         vmovups %ymm0, 32(%r15,%r12,8)
>         vmovups %ymm1, %ymm0
>         callq   __svml_sin4
>         vmovups %ymm0, (%r15,%r12,8)
>  
> What would be the most acceptable way to make this happen? Anybody having had a similar need previously?
>  
> Easiest workaround is to serialize the call above “type legal”
> vectorization factor. This can be done with a few lines of code, plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).
> If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.
>  
> Here are a few ideas I thought about:
> 1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t
> seem to work. We could define a generic ISD::VECLIB and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
> We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.
> 2)      We could write an IR to IR pass to perform IR level
> legalization. This is essentially duplicating the functionality of
> LegalizeVectorType() but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough from that perspective.
> 3)      We have implemented something similar to 2), but legalization
> code is specialized for SVML legalization. This was much quicker than trying to generalize the legalization scheme, but I’d imagine community won’t like it.
> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit
> instructions in scalarized form, adding legalized call functionality
> is in some sense similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still end up using one call instead of two.
>  
> Anything else?
>  
> Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?
> Can we use TableGen to create a reverse map?
>  
> Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?
>  
> Thanks,
> Hideki Saito
> Intel Corporation
>  
>  
>  
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>  
>  
>
>

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Hi,


On 07/04/2018 08:42 AM, Nema, Ashutosh via llvm-dev wrote:

> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <[hidden email]>; Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: [hidden email]; Masten, Matt <[hidden email]>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
>> __svml_sin8 (plus whatever shuffles are necessary).
>> The vectorizer should do this.
>> It should not generate calls to functions that don't exist.
> I'm not sure how vectorizer will do this, consider the case where "-vectorizer-maximize-bandwidth" option is enabled and vectorizer is forced to generate the wider VF, and hence it may generate a call to __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls i.e. __svml_sin_8 to two __svml_sin_4 calls ?
>
> Regards,
> Ashutosh
If RV can't find <LIB>_sin16, it will start looking for <LIB>_sin8.
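
The fallback described here can be sketched as a halving search. This is an illustration only, not RV's code: `CallPlan`, `planSplit`, and the set of available widths are assumed names, and the sketch assumes power-of-two vectorization factors.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Result of the search: which width to call, and how many calls are needed
// to cover the requested vectorization factor.
struct CallPlan {
  unsigned CallVF;
  unsigned NumCalls;
};

// AvailableVFs holds the widths for which a library entry exists and is
// legal on the current target (hypothetical input, e.g. {2, 4} for AVX).
std::optional<CallPlan> planSplit(unsigned RequestedVF,
                                  const std::set<unsigned> &AvailableVFs) {
  assert(RequestedVF != 0 && (RequestedVF & (RequestedVF - 1)) == 0 &&
         "sketch assumes a power-of-two vectorization factor");
  // Try the requested width first, then keep halving.
  for (unsigned VF = RequestedVF; VF >= 2; VF /= 2)
    if (AvailableVFs.count(VF))
      return CallPlan{VF, RequestedVF / VF};
  // Nothing found: the caller would have to scalarize the call.
  return std::nullopt;
}
```

For the example that started this thread, a request for VF 8 with only the 2- and 4-wide entries available yields two 4-wide calls.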

>
> -----Original Message-----
> From: llvm-dev [mailto:[hidden email]] On Behalf Of Hal Finkel via llvm-dev
> Sent: Wednesday, July 4, 2018 6:40 AM
> To: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
>
> On 07/02/2018 07:32 PM, Saito, Hideki wrote:
>> Hal>To me, this really looks like an ABI issue.
>>
>> Being a vectorizer guy, I never thought of it in that way, but I can see why you say it in that way.
>>
>> Hal >Will https://reviews.llvm.org/D47188 fix this?
>>
>> This, I know the answer. It does not. Denis, the author of the patch, is one of those who asked us to resolve the SVML legalization issue.
>> If you consider VecLib function name and also the TTI based
>> availability of the entry as part of its ABI, you can think of the issue as ABI conformance transformation
>> <4 x double> <4 x double> __svml_sin8(<4 x double> <4 x double>) ==>
>> <4 x double> __svml_sin4(<4 x double>)
>> <4 x double> __svml_sin4(<4 x double>)
>>
>> And the same could also be true for OpenMP declare SIMD. Do you think equivalent of this ugly thing is also okay?
>> <8 x double> __svml_sin4(<8 x double>) <<< note the use of 4-element sin () over 8-elements
>> ==>
>> <4 x double> <4 x double> __svml_sin4(<4 x double> <4 x double>)
>> ==>
>> <4 x double> __svml_sin4(<4 x double>)
>> <4 x double> __svml_sin4(<4 x double>)
>> This is essentially what has to happen if declare simd says 4-way vector function is available, but LV wants to vectorize the caller loop in 8-way.
>> Alternative is bump up the availability at LV (only for the name, not the cost) and then later let ABI say "no, only 4-way is available", fix this ABI.
>> A little convoluted but it may still work.
> No. I reread your original message, and I take back what I said. This is not a proper ABI issue. Looking at this:
>
>>          callq   __svml_sin8
>>          vmovups %ymm1, 32(%r15,%r12,8)
>>          vmovups %ymm0, (%r15,%r12,8)
>> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
>> i.e., not legal to use for AVX.
>>  
>> What we need to see instead is two calls to __svml_sin4(), like below.
> So __svml_sin8 is only for AVX-512 (with 512-bit vectors). For the purpose of generating code for AVX[-2], it essentially doesn't exist.
> Thus, it's not an ABI issue. We'll have the same problem if someone is targeting AVX-512 and requests as VF of 16. There's no __svml_sin16 (I presume), so we need to break this down into two calls to __svml_sin8 (plus whatever shuffles are necessary). The vectorizer should do this.
> It should not generate calls to functions that don't exist.
>
> Can't we just make the tables used by the vectorizer, where it knows about available math-library calls, aware of the legal vector widths based on enabled target features?
> Thanks again,
> Hal
>
>> Everyone reasonably comfortable enough with this "deal with the issue
>> as an ABI resolution" direction? We won't know whether this direction
>> really works or not until we dig in deeper, but I think this direction should be explored before IR to IR legalization and also before trying to add bunch of math libs in the intrinsic table.
>>
>> Any other ideas?
Function mappings in RV look as follows
(https://github.com/cdl-saarland/rv/blob/develop/include/rv/PlatformInfo.h):
1. argument shapes (per argument, whether it is uniform/varying/"linear"
and the argument's alignment)
2. position of the mask argument (if any)
3. the shape of the returned value (same as for the arguments)
4. the vector width
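
The four items above could be modelled roughly as follows. This is a simplified sketch and not RV's actual declarations (see the PlatformInfo.h link for those). The type names, the `canPass` compatibility rule, and the `foo_simd4` entry point are assumptions made for illustration; `foo` is the declare simd example from earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class ShapeKind { Uniform, Linear, Varying };

struct ArgShape {
  ShapeKind Kind;
  int Stride;         // only meaningful for Linear
  unsigned Alignment; // known alignment in bytes
};

struct VectorMapping {
  std::string ScalarName;
  std::string VectorName;
  std::vector<ArgShape> ArgShapes; // 1. one shape per argument
  int MaskPos;                     // 2. index of the mask argument, -1 if none
  ArgShape ResultShape;            // 3. shape of the returned value
  unsigned VectorWidth;            // 4. the vector width
};

// Assumed rule: a varying parameter accepts anything, a uniform parameter
// needs a uniform value, a linear parameter needs the same stride.
bool canPass(const ArgShape &Actual, const ArgShape &Formal) {
  if (Actual.Alignment < Formal.Alignment)
    return false;
  switch (Formal.Kind) {
  case ShapeKind::Varying:
    return true;
  case ShapeKind::Uniform:
    return Actual.Kind == ShapeKind::Uniform;
  case ShapeKind::Linear:
    return Actual.Kind == ShapeKind::Linear && Actual.Stride == Formal.Stride;
  }
  return false;
}

bool matches(const VectorMapping &M, const std::vector<ArgShape> &CallShapes,
             unsigned Width) {
  if (M.VectorWidth != Width || M.ArgShapes.size() != CallShapes.size())
    return false;
  for (std::size_t I = 0; I < CallShapes.size(); ++I)
    if (!canPass(CallShapes[I], M.ArgShapes[I]))
      return false;
  return true;
}

// foo(float *a, int i) with uniform(a), linear(i); the vector name is made up.
// foo returns void, so the result shape is only a placeholder here.
VectorMapping exampleFooMapping() {
  return {"foo",
          "foo_simd4",
          {{ShapeKind::Uniform, 0, 4}, {ShapeKind::Linear, 1, 4}},
          -1,
          {ShapeKind::Uniform, 0, 1},
          4};
}
```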

You will benefit from the shape information as soon as LLVM/VPlan gets a
proper divergence analysis (what is the result shape given the parameter
shapes at a call site?). The shape-based mapping also abstracts away
from target-specific mangling schemes for vectorized functions (OpenMP
4.x omp declare simd).

However, we have moved away from letting RV handle a complete list of
mappings directly.

RV vectorizes scalar math functions on the fly if no target-specific
mapping is available. This means there would have to be one table entry for
every combination of argument shapes/mask positions, which does not scale.

Instead there is a lazy interface (PlatformInfo::getResolver) that takes
in the scalar function name, the argument shapes and whether there is a
non-uniform predicate at the call site. We currently return just one
possible mapping per query but you could also generate a list of
possible mappings and let the vectorizer decide for itself, from this
tailored list, which mapping to use.

This approach will scale not just to math functions.
Behind the curtains, a call to ::getResolver works through a chain of
ResolverServices that can raise their hand if they could provide a
vector implementation for the scalar function.
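
A rough sketch of such a chain, to make the control flow concrete. It is not RV's implementation: `Query`, `Resolution`, `ResolverService`, `resolve`, and the two example services (including the `veclib_sin4` name) are all hypothetical, and argument shapes are left out of the query to keep it short.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Query {
  std::string ScalarName;
  unsigned VectorWidth;
  bool HasVaryingPredicate; // non-uniform predicate at the call site
};

struct Resolution {
  std::string VectorName;
  std::string Provider; // which service answered
};

// A service either provides a vector implementation or passes.
using ResolverService = std::function<std::optional<Resolution>(const Query &)>;

// Walk the chain in order; the first service that answers wins.
std::optional<Resolution> resolve(const std::vector<ResolverService> &Chain,
                                  const Query &Q) {
  assert(Q.VectorWidth > 0 && "vector width must be positive");
  for (const ResolverService &Service : Chain)
    if (std::optional<Resolution> R = Service(Q))
      return R;
  return std::nullopt;
}

// Example service 1: a math library that only has an unpredicated 4-wide sin.
std::optional<Resolution> mathLibService(const Query &Q) {
  if (Q.ScalarName == "sin" && Q.VectorWidth == 4 && !Q.HasVaryingPredicate)
    return Resolution{"veclib_sin4", "mathlib"};
  return std::nullopt;
}

// Example service 2: stands in for recursive vectorization of callees that
// are defined in the module.
ResolverService makeModuleService(std::set<std::string> Defined) {
  return [Defined = std::move(Defined)](const Query &Q)
             -> std::optional<Resolution> {
    if (Defined.count(Q.ScalarName))
      return Resolution{Q.ScalarName + "_v" + std::to_string(Q.VectorWidth),
                        "recursive"};
    return std::nullopt;
  };
}

std::vector<ResolverService> defaultChain() {
  return {mathLibService, makeModuleService({"foo"})};
}
```

Because each service is queried lazily, no table has to enumerate every shape and mask combination up front.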

The first in the chain will check whether this is a math function and if
it should use a VECLIB call (RV does this for SLEEF; the vectorized
functions are actually linked in immediately). Since we are not tied to
a static VECLIB table, we actually allow users to provide a ULP error
bound on the math functions. The SLEEF resolver will only consider
functions that are within that bound
(https://github.com/cdl-saarland/rv/blob/develop/include/rv/sleefLibrary.h).
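
The ULP filtering can be pictured like this. The table contents and function names are made up for illustration and do not come from sleefLibrary.h. Preferring the loosest implementation that still meets the bound is an assumption of this sketch (on the guess that looser variants are cheaper), not something stated above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct MathImpl {
  std::string_view ScalarName;
  std::string_view VectorName;
  unsigned VF;
  double MaxUlpError; // guaranteed worst-case error of this variant
};

// Hypothetical variants of a 4-wide sin with different accuracy guarantees.
constexpr MathImpl MathImpls[] = {
    {"sin", "sin4_u10", 4, 1.0},
    {"sin", "sin4_u35", 4, 3.5},
};

// Consider only variants whose error is within the user's bound; among
// those, pick the one with the largest permitted error.
std::optional<std::string_view>
pickWithinBound(std::string_view Scalar, unsigned VF, double UlpBound) {
  assert(UlpBound >= 0.0 && "error bound must be non-negative");
  const MathImpl *Best = nullptr;
  for (const MathImpl &I : MathImpls) {
    if (I.ScalarName != Scalar || I.VF != VF || I.MaxUlpError > UlpBound)
      continue;
    if (!Best || I.MaxUlpError > Best->MaxUlpError)
      Best = &I;
  }
  if (!Best)
    return std::nullopt;
  return Best->VectorName;
}
```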

Further down the chain, you have a resolver that checks whether the
scalar callee is defined in the module and if so, whether it can invoke
whole-function vectorization recursively on the callee (again, given the
precise argument shapes, we will get a precise return value shape). At the
moment, we only do this to vectorize and inline scalar SLEEF functions, but it
is trivial to do that on the same module.

Thanks,
Simon

>> Thanks,
>> Hideki
>> -------------------------------------
>> From: Hal Finkel [mailto:[hidden email]]
>> Sent: Monday, July 02, 2018 3:59 PM
>> To: Saito, Hideki <[hidden email]>; Sanjay Patel
>> <[hidden email]>; [hidden email]
>> Cc: Venkataramanan Kumar <[hidden email]>;
>> [hidden email]; Masten, Matt <[hidden email]>;
>> [hidden email]
>> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>>
>>
>> On 07/02/2018 04:33 PM, Saito, Hideki wrote:
>>  
>>> It may not be a full solution for the problems you're trying to solve
>>  
>> If we are inventing a new solution, I’d like it also to solve OpenMP
>> declare simd legalization issue. If a small extension of existing scheme works for mathlib only, I’m happy to take that and discuss OpenMP declare simd issue separately.
>>
>> I completely agree. We need a solution to handle 'declare simd' calls, or to put it another way, arbitrary user-defined functions. To me, this really looks like an ABI issue. If we have a function, __foo__computeit8(<8 x float> %x), then if our lowering of <8 x float> doesn't match the required register assignments, then we have the wrong ABI. Will https://reviews.llvm.org/D47188 fix this?
>>
>>   -Hal
>>
>>
>>  
>>> Or is there some reason that the vectorizer needs to be aware of those libcalls?
>>  
>> I’m a strong believer of CodeGen mapping (scalar and widened) mathlib calls to actual library (or inlined sequence).
>> So, that question needs to be answered by someone else.
>>  
>> Adding Michael and Hal.
>>  
>>  
>> From: Sanjay Patel [mailto:[hidden email]]
>> Sent: Monday, July 02, 2018 11:49 AM
>> To: Saito, Hideki <[hidden email]>
>> Cc: Venkataramanan Kumar <[hidden email]>;
>> [hidden email]; Masten, Matt <[hidden email]>;
>> [hidden email]
>> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>>  
>> It may not be a full solution for the problems you're trying to solve, but I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a problem in itself. Certainly, it's a mess that could be organized, especially so we're not repeating everything for each data type as we do right now.
>>  
>> So yes, I think that would allow us to remove the VecLib mappings because we are always waiting until codegen to make the translation from generic IR to target-specific libcall. Or is there some reason that the vectorizer needs to be aware of those libcalls?
>>  
>> On Mon, Jul 2, 2018 at 11:52 AM, Saito, Hideki <[hidden email]> wrote:
>>  
>> Venkat, we did not invent LLVM’s VecLib functionality. The original version of D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to convert widened math lib to SVML.
>> Our preference for “vectorized sin()” is just widened sin(), that is to be lowered to a specific library call at a later point (either as IR to IR or in CodeGen). Matt tried to sell that idea and it didn’t go through.
>> Anyone else willing to work with us to try it again? In my opinion, however, this is a related but different topic from legalization issue.
>>  
>> Sanjay, I think what you are suggesting would work better if we don’t map math lib calls to VecLib. Otherwise, we’ll have too many RTLIB:VECLIB_ enums, one from each different math function multiplied by each vectorization factor --- for each different VecLib. That’s way too many. If that’s one per different math functions, I’d guess it’s 100+. Still a lot but manageable. This requires those functions to be listed in the intrinsics, right? That’s another reason some people favor VecLib mapping at vectorizer. Those math functions don’t have to be added to the intrinsics.
>>  
>> I don’t insist on IR to IR legalization. However, I’m also interested in being able to legalize OpenMP declare simd function calls (**). These are user functions and as such we have no ways to list them as intrinsics or have RTLIB: enums predefined. For each Target, vector function ABI defines how the parameters need to be passed and Legalizer should be implemented based on the ABI, w/o knowing the details of what the user function does. Math lib only solution doesn’t help legalization of OpenMP declare simd.
>>  
>> Thanks,
>> Hideki
>>  
>> --------------------------------
>> (**)
>> #pragma omp declare simd uniform(a), linear(i)
>> void foo(float *a, int i);
>>  
>> …
>>  
>> #pragma omp simd
>> for(i) {                   // this loop could be vectorized with VF that’s wider than widest available vector function for foo().
>>      …
>>      foo(a, i)
>>      …
>> }
>>  
>> From: Venkataramanan Kumar
>> [mailto:[hidden email]]
>> Sent: Sunday, July 01, 2018 11:38 PM
>> To: Sanjay Patel <[hidden email]>
>> Cc: Saito, Hideki <[hidden email]>; [hidden email];
>> Masten, Matt <[hidden email]>; [hidden email]
>> Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>>  
>> Adding to Ashutosh's comments,  We are also interested in making LLVM generate vector math library calls that are available with glibc (version > 2.22).
>>  
>> reference: https://sourceware.org/glibc/wiki/libmvec
>>  
>> Using the example case given in the reference, we found there are  2 vector versions for "sin" (4 X double) with same VF namely _ZGVcN4v_sin (avx) version and _ZGVdN4v_sin (avx2) versions.  Following the SVML path adding new entry in VecDesc structure in TargetLibraryInfo.cpp,  we can generate the vector version.
>>  
>> But unable to decide which version to expand in the vectorizer. We needed the  TTI information (ISA ).  It looks like better to legalize or generate them later.
>>  
>> regards,
>> Venkat.
>>  
>>  
>> On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <[hidden email]> wrote:
>> Hi Hideki -
>>  
>> I hinted at this problem in the summary text of https://reviews.llvm.org/D47610:
>> Why are we transforming from LLVM intrinsics to platform-specific intrinsics in IR? I don't see the benefit.
>>  
>> I don't know if it solves all of the problems you're seeing, but it should be a small change to transform to the platform-specific SVML or other intrinsics in the DAG. We already do this for mathlib calls on Linux for example when we can use the finite versions of the calls. Have a look in SelectionDAGLegalize::ConvertNodeToLibcall():
>>  
>>      if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
>>        Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>>                                          RTLIB::LOG_FINITE_F64,
>>                                          RTLIB::LOG_FINITE_F80,
>>                                          RTLIB::LOG_FINITE_F128,
>>                                          RTLIB::LOG_FINITE_PPCF128));
>>      else
>>        Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32,
>>                                          RTLIB::LOG_F64,
>>                                          RTLIB::LOG_F80,
>>                                          RTLIB::LOG_F128,
>>                                          RTLIB::LOG_PPCF128));
>>  
>>  
>>  
>>  
>> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <[hidden email]> wrote:
>>  
>> Ashutosh,
>>  
>> Thanks for the reply.
>>  
>> Related earlier topic on this appears in the review of the SVML patch (@mmasten). Adding few names from there.
>>        https://reviews.llvm.org/D19544
>> There, I see Hal’s review comment “let’s start only with the
>> directly-legal calls”. Apparently, what we have right now in the trunk is “not legal enough”. I’ll work on the patch to stop bleeding while we continue to discuss legalization topic.
>>  
>> I suppose
>> 1)      LV only solution (let LV emit already legalized VECLIB calls)
>> is certainly not scalable. It won’t help if VECLIB calls are generated
>> elsewhere. Also, keeping VF low enough to prevent the legalization problem is only a workaround, not a solution.
>> 2)      Assuming that we have to go to IR to IR pass route, there are 3 ways to think:
>> a.       Go with very generic IR to IR legalization pass comparable to
>> ISD level legalization. This is most general but I’d think this is the highest cost for development.
>> b.      Go with Intrinsic-only legalization and then apply VECLIB
>> afterwards. This requires all scalar functions with VECLIB mapping to be added to intrinsic.
>> c.       Go with generic enough function call legalization, with the
>> ability to add custom legalization for each VECLIB (and if needed each VECLIB or non-VECLIB entry).
>>  
>> I think the cost of 2.b) and 2.c) are similar and 2.c) seems to be
>> more flexible. So, I guess we don’t really have to tie this discussion
>> with “letting LV emit widened math call instead of VECLIB”, even though I strongly favor that than LV emitting VECLIB calls.
>>  
>> @Davide, in D19544, @spatel thought LibCallSimplifier has relevance to
>> this legalization topic. Do you know enough about LibCallSimplifier to tell whether it can be extended to deal with 2.b) or 2.c)?
>>  
>> If we think 2.b)/2.c) are right enough directions, I can clean up what
>> we have and upload it to Phabricator as a starting point to get to 2.b)/2.c).
>>  
>> Continue waiting for more feedback. I guess I shouldn’t expect a lot this week and next due to the big holiday in the U.S.
>>  
>> Thanks,
>> Hideki
>>  
>> From: Nema, Ashutosh [mailto:[hidden email]]
>> Sent: Thursday, June 28, 2018 11:37 PM
>> To: Saito, Hideki <[hidden email]>
>> Cc: [hidden email]
>> Subject: RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>>  
>> Hi Saito,
>>  
>> At AMD we have our own version of vector library and faced similar problems, we followed the SVML path and from vectorizer generated the respective vector calls. When vectorizer generates the respective calls i.e __svml_sin_4 or __amdlibm_sin_4, later one can perform only string matching to identify the vector lib call. I’m not sure it’s the proper way, may be instead of generating respective calls it’s better to generate some standard call (may be intrinsics) and lower it later. A late IR pass can be introduced to perform lowering, this will lower the intrinsic calls to specific lib calls(__svml_sin_4 or __amdlibm_sin_4 or … ). This can be table driven to decide the action based on the vector library, function name, VF and target information, the action can be full-serialize, partial-serialize(VF8 to 2 VF4) or generate the lib call with same VF.
>>  
>> Thanks,
>> Ashutosh
>>  
>> From: llvm-dev [mailto:[hidden email]] On Behalf Of
>> Saito, Hideki via llvm-dev
>> Sent: Friday, June 29, 2018 7:41 AM
>> To: 'Saito, Hideki via llvm-dev' <[hidden email]>
>> Subject: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>>  
>>  
>> Illustrative Example:
>>  
>> clang -fveclib=SVML -O3 svml.c -mavx
>>  
>> #include <math.h>
>> void foo(double *a, int N){
>>    int i;
>> #pragma clang loop vectorize_width(8)
>>    for (i=0;i<N;i++){
>>      a[i] = sin(i);
>>    }
>> }
>>  
>> Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>) after the vectorizer.
>> This is 8-element SVML sin() called with 8-element argument. On the surface, this looks very good.
>> Later on, standard vector type legalization kicks-in but only the argument and return data are legalized.
>>          vmovaps %ymm0, %ymm1
>>          vcvtdq2pd       %xmm1, %ymm0
>>          vextractf128    $1, %ymm1, %xmm1
>>          vcvtdq2pd       %xmm1, %ymm1
>>          callq   __svml_sin8
>>          vmovups %ymm1, 32(%r15,%r12,8)
>>          vmovups %ymm0, (%r15,%r12,8)
>> Unfortunately, __svml_sin8() doesn’t use this form of input/output. It takes zmm0 and returns zmm0.
>> i.e., not legal to use for AVX.
>>  
>> What we need to see instead is two calls to __svml_sin4(), like below.
>>          vmovaps %ymm0, %ymm1
>>          vcvtdq2pd       %xmm1, %ymm0
>>          vextractf128    $1, %ymm1, %xmm1
>>          vcvtdq2pd       %xmm1, %ymm1
>>          callq   __svml_sin4
>>          vmovups %ymm0, 32(%r15,%r12,8)
>>          vmovups %ymm1, %ymm0
>>          callq   __svml_sin4
>>          vmovups %ymm0, (%r15,%r12,8)
>>  
>> What would be the most acceptable way to make this happen? Anybody having had a similar need previously?
>>  
>> Easiest workaround is to serialize the call above “type legal” vectorization factor. This can be done with a few lines of code, plus the code to recognize that the call is “SVML” (which is currently string match against “__svml” prefix in my local workspace).
>> If higher VF is not forced, cost model will likely favor lower VF. Functionally correct, but obviously not an ideal solution.
>>  
>> Here are a few ideas I thought about:
>> 1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t seem to work. We could define a generic ISD::VECLIB and try to split into two or more VECLIB nodes, but at that moment we lost the information about which function to call.
>> We can’t define ISD opcode per function. There will be too many libm entries to deal with. We need a scalable solution.
>> 2)      We could write an IR to IR pass to perform IR level legalization. This is essentially duplicating the functionality of LegalizeVectorType() but we can make this available for other similar things that can’t use ISD level vector type legalization. This looks to be attractive enough from that perspective.
>> 3)      We have implemented something similar to 2), but legalization code is specialized for SVML legalization. This was much quicker than trying to generalize the legalization scheme, but I’d imagine community won’t like it.
>> 4)      Vectorizer emit legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense similar to that. Vectorizer can’t simply choose type legal function name with illegal vector ---- since LegalizeVectorType() will still end up using one call instead of two.
>>  
>> Anything else?
>>  
>> Also, doing any of this requires reverse mapping from VECLIB name to scalar function name. What’s the most recommended way to do so?
>> Can we use TableGen to create a reverse map?
>>  
>> Your input is greatly appreciated. Is there a real need/desire for 2) outside of VECLIB (or outside of SVML)?
>>  
>> Thanks,
>> Hideki Saito
>> Intel Corporation
>>  
>>  
>>  
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> [hidden email]
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>  
>>  
>>
>>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory

--

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : [hidden email]
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev
Hi,

On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <[hidden email]> wrote:
> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <[hidden email]>; Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]
> Cc: [hidden email]; Masten, Matt <[hidden email]>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
>> __svml_sin8 (plus whatever shuffles are necessary).
>> The vectorizer should do this.
>> It should not generate calls to functions that don't exist.
>
> I'm not sure how the vectorizer will do this. Consider the case where the "-vectorizer-maximize-bandwidth" option is enabled and the vectorizer is forced to generate the wider VF; it may then generate a call to __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls, i.e. __svml_sin_8 to two __svml_sin_4 calls?
>
> Regards,
> Ashutosh

If an accurate cost model was in place (which there isn't), then an "unsupported" vectorization factor should only be selected if it was forced.  However, in this case __svml_sin_8 is the same cost as __svml_sin_4, so the loop vectorizer will select a VF of 8, and generate a call to a function which effectively doesn't exist.

The simplest fix is to populate the SVML vector library table with __svml_sin_8 only when the target is AVX-512.  Alternatively, TLI.isFunctionVectorizable() should check that the entry is available on the target (this is more difficult as the type is not encoded).
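
As a rough illustration of the first suggestion (this is not LLVM's actual TargetLibraryInfo code), the C sketch below adds a hypothetical min_vector_bits column to a small vector-library table and filters on it at lookup time. The struct, the function lookup_vector_fn and the width values (lanes times 64 bits for double) are assumptions made for the example; the entry names follow the __svml_sinN spelling used in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical vector-library table. The min_vector_bits
   column is an assumption of this sketch, not an existing LLVM field. */
struct veclib_entry {
    const char *scalar_name;   /* scalar libm name, e.g. "sin" */
    unsigned vf;               /* number of lanes the vector variant handles */
    const char *vector_name;   /* vector library symbol */
    unsigned min_vector_bits;  /* register width the variant's ABI needs */
};

static const struct veclib_entry svml_table[] = {
    { "sin", 2, "__svml_sin2", 128 },
    { "sin", 4, "__svml_sin4", 256 },
    { "sin", 8, "__svml_sin8", 512 },
};

/* Return the vector symbol for (scalar_name, vf) if the target's widest
   vector register can hold its arguments, otherwise NULL. */
const char *lookup_vector_fn(const char *scalar_name, unsigned vf,
                             unsigned target_vector_bits)
{
    size_t n = sizeof(svml_table) / sizeof(svml_table[0]);
    assert(scalar_name != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct veclib_entry *e = &svml_table[i];
        if (e->vf == vf && strcmp(e->scalar_name, scalar_name) == 0 &&
            e->min_vector_bits <= target_vector_bits)
            return e->vector_name;
    }
    return NULL;
}
```

With a 256-bit target (AVX), the VF=8 entry is simply not visible, so a vectorizer consulting such a table could not pick it.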

I'm guessing that the cost model would then make VF=4 cheaper, so it would generate calls to __svml_sin_4 (I'm not at work so can't check).   If the vectorization factor was forced to 8, we'll either get a call to the intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer will scalarize the call.  The vectorizer would not generate two calls to __svml_sin_4, although this would be cheaper.
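
To make the options for a forced VF concrete, here is a small C sketch of the decision only. It is not what the loop vectorizer does today (as noted above, it would not split the call). The names plan_veclib_call, call_plan and the enum values are hypothetical, and the sketch assumes library entries exist for every power-of-two VF from 2 up to widest_legal_vf.

```c
#include <assert.h>

/* Possible ways to emit a vector math call for a requested VF. */
enum call_action { CALL_SAME_VF, CALL_SPLIT, CALL_SCALARIZE };

struct call_plan {
    enum call_action action;
    unsigned callee_vf;   /* lanes handled by each emitted call (1 if scalar) */
    unsigned num_calls;   /* how many calls cover the requested VF */
};

/* widest_legal_vf: widest library variant callable on the target,
   or 0 when no vector variant is available (assumption of this sketch). */
struct call_plan plan_veclib_call(unsigned requested_vf,
                                  unsigned widest_legal_vf)
{
    struct call_plan plan;
    assert(requested_vf > 0);

    if (widest_legal_vf >= 2 && requested_vf >= 2 &&
        requested_vf <= widest_legal_vf) {
        /* A variant with exactly the requested VF is callable. */
        plan.action = CALL_SAME_VF;
        plan.callee_vf = requested_vf;
        plan.num_calls = 1;
    } else if (widest_legal_vf >= 2 && requested_vf > widest_legal_vf &&
               requested_vf % widest_legal_vf == 0) {
        /* Partial serialization: e.g. VF 8 becomes two VF 4 calls. */
        plan.action = CALL_SPLIT;
        plan.callee_vf = widest_legal_vf;
        plan.num_calls = requested_vf / widest_legal_vf;
    } else {
        /* Full serialization: one scalar call per lane. */
        plan.action = CALL_SCALARIZE;
        plan.callee_vf = 1;
        plan.num_calls = requested_vf;
    }
    return plan;
}
```

For the AVX case in this thread, plan_veclib_call(8, 4) yields a split into two 4-lane calls.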

While this problem probably doesn't require the loop vectorizer to have knowledge of the target ABI, others may do.  I'm thinking specifically of D48193:

https://reviews.llvm.org/D48193

In this case we have poor code generation due to the interleave count selected by the loop vectorizer.  I can't see how this can be fixed later, so we will need to expose details of the ABI to the loop vectorizer (see my latest comment D48193#1149705).

Thanks,
Rob.




Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

On 07/04/2018 07:50 AM, Robert Lougher wrote:
> If an accurate cost model was in place (which there isn't), then an "unsupported" vectorization factor should only be selected if it was forced.  However, in this case __svml_sin_8 is the same cost as __svml_sin_4, so the loop vectorizer will select a VF of 8, and generate a call to a function which effectively doesn't exist.

Would it actually be the same, or would there be extra shuffle costs associated with the calls to __svml_sin_4?
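
The arithmetic behind this question can be sketched in C as follows. The cost units are arbitrary and the helper names split_cost and cheaper_per_lane are made up for illustration; nothing here reflects real cost-model numbers. If one __svml_sin_8 call is charged the same as one __svml_sin_4 call, VF 8 looks twice as cheap per lane. Once VF 8 is modelled as two 4-lane calls, it ties with VF 4 when shuffles are free and loses as soon as any shuffle cost is added.

```c
#include <assert.h>

/* Total cost of covering a wide VF with several narrower calls plus
   the shuffles needed to split arguments and join results. */
unsigned split_cost(unsigned call_cost, unsigned num_calls,
                    unsigned shuffle_cost)
{
    return call_cost * num_calls + shuffle_cost;
}

/* Nonzero if option A is strictly cheaper per lane than option B.
   Cross-multiplying avoids floating-point division. */
int cheaper_per_lane(unsigned cost_a, unsigned vf_a,
                     unsigned cost_b, unsigned vf_b)
{
    assert(vf_a > 0 && vf_b > 0);
    return (unsigned long long)cost_a * vf_b <
           (unsigned long long)cost_b * vf_a;
}
```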


> The simplest fix is to populate the SVML vector library table with __svml_sin_8 only when the target is AVX-512.

I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.

 -Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Tom Stellard via llvm-dev

All,

It looks like we are finally converging on

4)      Vectorizer emits legalized VECLIB calls. Since it can emit instructions in scalarized form, adding legalized call functionality is in some sense similar to that. The vectorizer can’t simply choose a type-legal function name with an illegal vector type, since LegalizeVectorType() will still end up using one call instead of two.

I was hoping we could collectively come up with a better solution, but I am not at all surprised to see us settling on this known-to-work, practical approach.

We need a more elaborate VECLIB setting, taking per-target function availability into account. Also, much of the "legalized call" mechanism should work for OpenMP declare simd --- and we should make that easier for reuse in case other FE/optimizers want to emit legalized calls.
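
A reusable "legalized call" helper could, in spirit, look like the C sketch below: it covers a wide request by calling a narrower kernel on consecutive chunks. This only models the data flow on plain arrays, not the IR the vectorizer would emit. call_in_chunks and vec_kernel are hypothetical names, and double4 is a trivial stand-in for a 4-lane library routine such as __svml_sin4, chosen so the example does not depend on any vector library.

```c
#include <assert.h>
#include <stddef.h>

/* A kernel that reads kernel_vf lanes from `in` and writes kernel_vf
   lanes to `out`; kernel_vf is fixed per kernel. */
typedef void (*vec_kernel)(const double *in, double *out);

/* Cover `vf` lanes by calling a `kernel_vf`-lane kernel vf/kernel_vf
   times, e.g. VF 8 as two VF 4 calls. */
void call_in_chunks(vec_kernel kernel, size_t kernel_vf,
                    const double *in, double *out, size_t vf)
{
    assert(kernel_vf > 0 && vf % kernel_vf == 0);
    for (size_t lane = 0; lane < vf; lane += kernel_vf)
        kernel(in + lane, out + lane);
}

/* Stand-in for a 4-lane vector math routine: doubles each lane. */
void double4(const double *in, double *out)
{
    for (size_t i = 0; i < 4; i++)
        out[i] = 2.0 * in[i];
}
```

The same chunking would apply to any callee with a fixed lane count, which is why it could also serve OpenMP declare simd variants.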

Simon, is the RV "legalized call emission" code easily reusable outside of RV? If yes, would you be able to restructure it so that it can reside, say, under Transforms/Utils?

I think this RFC is ready to close at the end of this week. Thank you very much for all the lively discussions. If anybody has more input, please speak up soon.

Thanks,
Hideki

===========================================
From: Hal Finkel [mailto:[hidden email]]
Sent: Wednesday, July 04, 2018 9:59 AM
To: Robert Lougher <[hidden email]>; Nema, Ashutosh <[hidden email]>
Cc: Saito, Hideki <[hidden email]>; Sanjay Patel <[hidden email]>; [hidden email]; [hidden email]; [hidden email]; Masten, Matt <[hidden email]>
Subject: Re: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

