Hi,

The current loop vectorizer considers a range of vectorization factors (VFs) bounded by MaxVF. For each VF, it sets up uniform and scalar info before building the VPlan and making the final best-VF selection. The best VF is also selected from within that range.

  for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
    // Collect Uniform and Scalar instructions after vectorization with VF.
    CM.collectUniformsAndScalars(VF);

    // Collect the instructions (and their associated costs) that will be more
    // profitable to scalarize.
    if (VF > 1)
      CM.collectInstsToScalarize(VF);
  }

It looks like, when vectorization is not forced, it is not necessary to set up uniform and scalar info for every VF. For a given VF, we can check before calling collectUniformsAndScalars() and collectInstsToScalarize() whether the types used in the code can actually yield any vector types (after type legalization). If not, there is no point in this VF participating in VPlan construction and VF selection. As the (scalar) types can be collected once for all VFs, I guess the check is cheap enough. As neither collectUniformsAndScalars() nor collectInstsToScalarize() looks cheap, such a check can speed up vectorization, in particular for large MaxVFs.

Another minor thing: when vectorization is forced and MaxVF > 1, the expected cost of VF=2 is currently computed twice.

  bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
  // Ignore scalar width, because the user explicitly wants vectorization.
  if (ForceVectorization && MaxVF > 1) {
    Width = 2;
    Cost = expectedCost(Width).first / (float)Width;
  }

  for (unsigned i = 2; i <= MaxVF; i *= 2) {
    // Notice that the vector loop needs to be executed less times, so
    // we need to divide the cost of the vector loops by the width of
    // the vector elements.
    VectorizationCostTy C = expectedCost(i);
    float VectorCost = C.first / (float)i;

Cheers,
Shixiong (Jason) Xu

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
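The duplicated VF=2 evaluation can be reproduced with a small stand-alone model. This is only a sketch: `CountingCostModel`, `selectAsQuoted`, `selectWithStartVF`, and `StartVF` are hypothetical names, the cost values are dummies, and both the scalar baseline and the `VectorCost < Cost` comparison are assumptions, because the quoted excerpt stops before showing them.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the cost model. It only records how often each
// VF is queried; the returned cost is a dummy constant.
struct CountingCostModel {
  std::map<unsigned, unsigned> Calls;
  float expectedCost(unsigned VF) {
    ++Calls[VF];
    return 100.0f;
  }
};

// Mirrors the structure of the quoted excerpt: the forced path evaluates
// VF=2, and the loop then starts at 2 and evaluates it again.
inline unsigned selectAsQuoted(CountingCostModel &CM, unsigned MaxVF,
                               bool Force) {
  unsigned Width = 1;
  float Cost = CM.expectedCost(1); // Assumed scalar baseline.
  if (Force && MaxVF > 1) {
    Width = 2;
    Cost = CM.expectedCost(Width) / static_cast<float>(Width);
  }
  for (unsigned i = 2; i <= MaxVF; i *= 2) {
    float VectorCost = CM.expectedCost(i) / static_cast<float>(i);
    if (VectorCost < Cost) { // Assumed comparison, not shown in the excerpt.
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}

// One possible fix: start the loop at 4 when VF=2 was already evaluated.
inline unsigned selectWithStartVF(CountingCostModel &CM, unsigned MaxVF,
                                  bool Force) {
  unsigned Width = 1;
  float Cost = CM.expectedCost(1); // Assumed scalar baseline.
  unsigned StartVF = 2;            // Hypothetical loop start value.
  if (Force && MaxVF > 1) {
    Width = 2;
    Cost = CM.expectedCost(Width) / static_cast<float>(Width);
    StartVF = 4;
  }
  for (unsigned i = StartVF; i <= MaxVF; i *= 2) {
    float VectorCost = CM.expectedCost(i) / static_cast<float>(i);
    if (VectorCost < Cost) {
      Cost = VectorCost;
      Width = i;
    }
  }
  return Width;
}
```

With forced vectorization and MaxVF = 8, both variants pick the same width under this dummy model, but the first queries VF=2 twice and the second only once.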
Hi Xu,

Thanks for pointing this out and sorry for the delayed response.

> It looks like, when vectorization is not forced, it is not necessary to set up uniform and scalar info for every VF. For a given VF, we can check before calling collectUniformsAndScalars() and collectInstsToScalarize() whether the types used in the code can actually yield any vector types (after type legalization). If not, there is no point in this VF participating in VPlan construction and VF selection. As the (scalar) types can be collected once for all VFs, I guess the check is cheap enough. As neither collectUniformsAndScalars() nor collectInstsToScalarize() looks cheap, such a check can speed up vectorization, in particular for large MaxVFs.

I’m not sure I understand your proposal. Please note that uniform values may vary from VF to VF. For example, a branch condition can be uniform (and be kept scalar) for VF=4 but divergent for VF=8. For that reason, we have to compute this information for every potential VF.

> Another minor thing: when vectorization is forced and MaxVF > 1, the expected cost of VF=2 is currently computed twice.

Agreed. It would be great if you could submit a patch for that or open a bug. I guess we just need to introduce a MinVF that is set to 4 when vectorization is forced.

Thanks!
Diego

From: llvm-dev [mailto:[hidden email]]
On Behalf Of Shixiong Xu via llvm-dev
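Diego's remark that a branch condition can be uniform for VF=4 but divergent for VF=8 can be illustrated with a toy example. This is a brute-force sketch, not how the vectorizer's analysis works: `cond` is a hypothetical loop condition chosen for illustration, and `isUniformForVF` is a hypothetical helper that checks whether all lanes of each vector iteration agree. It makes no claim that LoopVectorize recognizes this particular pattern.

```cpp
#include <cassert>

// Hypothetical branch condition inside a loop over I: its value flips
// every 4 iterations.
inline bool cond(unsigned I) { return (I / 4) % 2 == 0; }

// Returns true if cond() has the same value in all VF lanes of every full
// vector iteration covering [0, TripCount). In that case a single scalar
// branch per vector iteration would be enough.
inline bool isUniformForVF(unsigned VF, unsigned TripCount) {
  for (unsigned Base = 0; Base + VF <= TripCount; Base += VF)
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      if (cond(Base + Lane) != cond(Base))
        return false;
  return true;
}
```

For a trip count of 64, the condition is uniform for VF=2 and VF=4, because each group of lanes falls inside one block of 4 iterations, but not for VF=8, where lanes 0-3 and 4-7 disagree. This is why per-VF information cannot simply be reused across VFs.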
Hi Diego,

Thanks for your reply. :)

Shixiong

From: Caballero, Diego <[hidden email]>