[llvm-dev] RFC: should CVP always narrow the width of lshr?

Previous Topic Next Topic
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

[llvm-dev] RFC: should CVP always narrow the width of lshr?

Chris Lattner via llvm-dev

In https://reviews.llvm.org/D46760,
Bixia Zheng notes that given the pattern like [1]
  %za = zext i32 %a to i64
  %udiv = udiv i64 %za, <C>
  %urem = urem i64 %za, <C>
  %uadd = add i64 %udiv, %urem
  ret i64 %uadd

the CVP will reduce the width of udiv/urem,
without considerations for the new instructions,
or whether the new width is legal for the target.
That was implemented in https://reviews.llvm.org/D44102

As Sanjay Patel notes in https://reviews.llvm.org/D47113#1106601,
> We overlooked these questions in the specific case of div/rem
> (D44102) because we assumed that narrower div/rem are always
> better for analysis and codegen <cut>

Now, there is a caveat.
If <C> is constant, and is a power-of-two, instcombine
will transform udiv to lshr, and urem to and.
And CVP pass does not narrow the width of those opcodes.
And instcombine won't narrow them either, because the
zext has multiple uses / that would create more instructions,
which is illegal for instcombine.

So [1] may or may not be handled properly...

There are multiple solutions, and more than one
solution can be implemented, i suppose.
1. https://reviews.llvm.org/D46760
    Somehow relax instcombine, to check for this very specific pattern.
    I don't think we want that.
2. Extend aggressive-instcombine to narrow a sequence of arbitrary
    binops (and maybe even more than binops). This is probably
    the most powerful, but i'm not quite sure how it would look..
3. Teach CVP to always narrow the width of lshr.
    As Sanjay Patel notes in https://reviews.llvm.org/D47113#1106601,
    > <cut> but I doubt that we can extend that reasoning to all binops
    > on all targets. This will fight with transforms in instcombine
    > (see canEvaluateSExtd / canEvaluateZExtd).
    But, this is already sufficient to handle the [1].

Thus, as suggested, i'm sending this RFC.
What do people think about "CVP should always narrow lshr" approach?
Or is that likely to be too bad, and some more concrete approach,
like the aggressive-instcombine extension is highly preferable?

LLVM Developers mailing list
[hidden email]