[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules


[llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi All,

I've been working on the GlobalISel combiner recently and I'd like to share the plan for how Combine Rules will be defined in GlobalISel and solicit feedback on it.

This email ended up rather long so:
TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.

Thanks to the following people for their help in playing with this syntax and discussing it with me over the last couple weeks:
  • Aditya Nandakumar
  • Adrian Prantl
  • Ahmed Bougacha
  • Amara Emerson
  • Jessica Paquette
  • Michael Berg
  • Justin Bogner
  • Roman Tereshin
  • Vedant Kumar
  • Volkan Keles
Their feedback has vastly improved this over the original version and it was through those discussions that the idea of re-using MIR as a declarative definition was born.

Also thanks to the person who raised the issue of debug info at the BoF. I'm afraid I didn't write your name in my notes on the day so hopefully you'll make yourself known so we can give you the credit you're due. The debug info part of the proposal is still a little hand-wavy but it wouldn't have been here at all without you :-)

Scope of this RFC

This RFC is intended to solicit feedback on the syntax, semantics, and overall appearance of the tablegen classes and definitions used to define Combine Rules for GlobalISel, and to see if there are any flaws in the usability of such definitions. I'd like to keep the mechanics of the Combine algorithm fairly abstract for this discussion, as a few things that are syntactically possible will require improvements to the combine algorithm before they can be used, and I'd like to leave my thoughts on the algorithm to a later discussion once I have something more concrete. I'll label these areas as they come up.

Overview

GlobalISel requires some passes that replace groups of MachineInstrs with simpler equivalents, just like DAGCombine does for SelectionDAG. DAGCombine as a whole is composed of multiple large monoliths of C++ code, with one covering the target-independent combines and one for each target. While this gives a lot of power and freedom to do whatever is needed, such an architecture also makes several things difficult:
  • It's difficult to test an individual combine as others can modify the test case before it reaches the relevant rule.
  • It's difficult to debug due to the quantity of transformations being applied, the iterative nature of the algorithm, and the lack of scope-reducing techniques (e.g. bisection) beyond the top-level opcode.
  • It's difficult for a target to opt-out of or specialize a common combine. In-tree targets must inject triple checks, while out-of-tree targets must also pay the price of the resulting merge conflicts.
  • It's difficult to write a target specific combine with higher priority than one common combine but lower priority than another.
  • It's difficult to write analysis tools for the combines such as those to detect unreachable combines, or to prove them correct, etc.
  • It's difficult to investigate improvements to the overall algorithm as there's no means to change the code for all rules.
  • and so on

For these reasons and more it would be useful to have at least partial generation of the combine rules used by GlobalISel. In addition to this, the bulk of matching a combine is boilerplate. It would be useful to defer the boilerplate code to a generator and focus more on the details that make each combine unique.

To that end (and also to eventually drop the SelectionDAG importer for ISel), I've been looking into how we would declare Combine Rules in tablegen, and with the help of the people credited above I think we've ended up somewhere that should be nice to work with and that opens up a lot of possibilities.

Individual Combine Rules

A Simple Example

Here's a simple example that eliminates a redundant G_TRUNC.

def : GICombineRule<(defs root:$D, operand:$S),
                    (match [{MIR %1(s32) = G_ZEXT %S(s8)
                                 %D(s16) = G_TRUNC %1(s32) }]),
                    (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;

There are three parts to this rule:
  • defs declares the interface for the rule. The definitions in the defs section are the glue between the Combine algorithm and the rule, as well as between the match and apply sections. In this case we only define the root of the match and that we have a register variable called $S.
  • match declares the conditions that must be true for the rule to fire. In this case, we require that we match a G_TRUNC whose result is of type s16 and whose operand is of type s32, as well as a G_ZEXT whose result is the same as the G_TRUNC's operand and whose own operand is of type s8. If all these are true, then we have the option of applying the rule using the apply section. Some algorithms may take every opportunity they see, but others might weigh up different possible choices more carefully.
  • apply declares the result of the rule firing. In this case, we'd build a G_ZEXT whose result is %D and whose operand is %S from the match.

Generally speaking, the Combine rule patterns declare what needs to happen and it's up to tablegen to choose how to achieve it. However, Combines can be rather complicated and a DSL cannot hope to cover everything that might be needed. As such there will be trapdoors into arbitrary C++.

The MIR match block matches by following the use-def edges and does not require that the MachineInstrs are in the exact order specified. It only requires that the instructions are connected to each other as described and that the input MIR is valid (e.g. no uses before defs). It can reach across basic blocks when it's safe to do so. We may need to introduce a means to limit the inter-basic-block reach as I can foresee situations where it may pull expensive operations into a loop if it's not careful about this. My current thinking on this is that it should be limited to moving to similar or cheaper blocks. I can potentially see some use in choosing the insertion point to select cheaper blocks; however, I think that's best left to passes like MachineLICM rather than having Combine try to do everything.
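
For illustration, the rule above could fire on MIR like the following sketch (the block structure is invented), where the G_ZEXT and G_TRUNC live in different blocks but are still connected by the use-def edge through %2:
    bb.0:
      %2(s32) = G_ZEXT %1(s8)
      G_BR %bb.1
    bb.1:
      %3(s16) = G_TRUNC %2(s32)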

Syntactically in tablegen, the '[{MIR ... }]' is just an ordinary code block that happens to start with the string 'MIR'. The 'MIR' string is there for editors to use to switch syntax highlighting modes to MIR instead of C++.

Generalizing the Simple Example

Now let's generalize the match section a bit so that it handles all intermediate scalar types. All we need to do is delete the type from %1 like so:
    def : GICombineRule<(defs root:$D, operand:$S),
                        (match [{MIR %1 = G_ZEXT %S(s8)
                                     %D(s16) = G_TRUNC %1 }]),
                        (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
What we've done here is removed the type check on %1 entirely. This is safe for this particular rule since, by definition, G_ZEXT will always produce a larger scalar result type than its scalar input type and G_TRUNC will always produce a smaller one. The only way these can both be the case is if %1 is a scalar of at least s17. We're still not covering every type combination that we ought to though, since %S and %D are still restricted to s8 and s16 respectively. To fix this, we'll need some custom predicates.

A custom predicate is declared like so:
    def isScalarType : GIMatchPredicate<(ins type:$A), bool, [{
      return ${A}.isScalar();
    }]>;
This declares that the predicate takes a type (LLT) named A and returns a bool. It also declares the code used to test the predicate and the place that the variable for A must be inserted. We can potentially use types other than bool, such as structures, but we'll leave that for later. Completing the generalized example using this (and another) predicate, we get:
    def isLargerType : GIMatchPredicate<(ins type:$A, type:$B), bool, [{
      return ${A}.getSizeInBits() > ${B}.getSizeInBits();
    }]>;
    def : GICombineRule<(defs root:$D, operand:$S, operand:$T),
                        (match [{MIR %T = G_ZEXT %S
                                     %D = G_TRUNC %T }],
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S),
                               (isLargerType type:$T, type:$S),
                               (isLargerType type:$T, type:$D)),
                        (apply [{MIR %D = G_ZEXT %S }])>;
We've declared a C++ predicate that checks whether a given type is a scalar. We've also declared another which checks that one type has more bits than another. We've then extended the defs for the rule with an additional operand which is used to name the intermediate operand to allow it to be referenced from C++ predicates. Lastly, we've called the C++ predicates as part of the match. Each argument is of type 'operand' and needs to be provided as type 'type', so tablegen emits code to test for convertibility (MO.isReg() in this case), and to do the conversion (MRI.getType(MO.getReg()) in this case). The end result is that this rule now supports all the type combinations where we extend to a larger type, then truncate to a type that lies between the intermediate and the source in size.
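
To make that concrete, here's a sketch of the kind of check tablegen might emit for (isScalarType type:$D); the function and variable names are illustrative, not the real generated API:
    bool testIsScalarType(const MachineOperand &DOp,
                          const MachineRegisterInfo &MRI) {
      if (!DOp.isReg())        // convertibility test: operand -> type
        return false;
      LLT DTy = MRI.getType(DOp.getReg()); // the conversion itself
      return DTy.isScalar();   // inlined body of isScalarType
    }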

As noted earlier, we can simplify this to:
    def : GICombineRule<(defs root:$D, operand:$S),
                        (match [{MIR %0 = G_ZEXT %S
                                     %D = G_TRUNC %0 }],
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply [{MIR %D = G_ZEXT %S }])>;
since, by definition of G_ZEXT and G_TRUNC, $T > $S and $T > $D.

Subtarget specific Rules

Facilities like rule predication based on subtarget features will work just as they do in SelectionDAG:
    def : GICombineRule<...>, Requires<[Foo]>;
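
For example (a sketch, assuming a hypothetical HasFastExtend subtarget feature declared in the usual SelectionDAG Predicate style):
    def HasFastExtend : Predicate<"Subtarget->hasFastExtend()">;
    def : GICombineRule<(defs root:$D, operand:$S),
                        (match [{MIR %0 = G_ZEXT %S
                                     %D = G_TRUNC %0 }],
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply [{MIR %D = G_ZEXT %S }])>,
          Requires<[HasFastExtend]>;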

Preserving DILocation and other debug info

This section is still fairly vague and has had less investigation than the others. Here's the current thinking on it though.

Because we're using MIR, it becomes fairly easy to represent debug line information in the rules themselves. Here we expand on the above example to merge the line info for the G_ZEXT and the G_TRUNC into the resulting G_ZEXT using DILocation::getMergedLocation(), by simply specifying both debug-locations on the new instruction.

    def : GICombineRule<(defs root:$D, operand:$S),
                        (match [{MIR %0 = G_ZEXT %S, debug-location !1
                                     %D = G_TRUNC %0, debug-location !2 }],
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply [{MIR %D = G_ZEXT %S, debug-location !1, debug-location !2 }])>;

If we add DBG_VALUE to this, then we get:
    def : GICombineRule<(defs root:$D, operand:$S, diexpression:$DExpr),
                        (match [{MIR %0 = G_ZEXT %S, debug-location !1
                                     %D = G_TRUNC %0, debug-location !2
                                     DBG_VALUE %0, !3, !metadata($DExpr), debug-location !4
                                     DBG_VALUE %D, !5, !6, debug-location !7
                                }],
                               (isScalarType type:$D),
                               (isLargerType type:$D, type:$S)),
                        (apply (createDIExprLLVMFragment diexpression:$DExpr, type:$D):$DNewExpr,
                               [{MIR %D = G_ZEXT %S, debug-location !1, debug-location !2
                                     DBG_VALUE %D, !3, !metadata($DNewExpr), debug-location !4
                                     DBG_VALUE %D, !5, !6, debug-location !7
                                }])>;

The semantics of DBG_VALUE in the match and apply sections are a little different from most instructions. They still match based on use-def edges rather than the exact order written, like normal instructions. However, it's an optional match so that we don't fail to match due to missing DBG_VALUEs. It can also match multiple DBG_VALUEs if there happen to be multiple variables described by a single vreg. The apply step's DBG_VALUE creates DBG_VALUEs corresponding to each of those that were matched. createDIExprLLVMFragment invokes C++ code which creates a new DIExpression by concatenating a 'DW_OP_LLVM_fragment, 0, sizeof(type)' for the given type to the existing DIExpression.
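
For reference, the C++ behind createDIExprLLVMFragment could be built on the existing DIExpression helper, as in this sketch (the glue tablegen would emit around it is illustrative):
    // Appends 'DW_OP_LLVM_fragment, 0, <size-in-bits of Ty>' to the
    // matched expression, as described above.
    Optional<DIExpression *> makeFragmentExpr(const DIExpression *DExpr,
                                              LLT Ty) {
      return DIExpression::createFragmentExpression(
          DExpr, /*OffsetInBits=*/0, /*SizeInBits=*/Ty.getSizeInBits());
    }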

One interesting benefit of this is that it allows us to verify that all DILocations and DBG_VALUEs are accounted for and handled in some way.

I should mention that I'm not terribly keen on the current syntax for modifying the diexpression above (createDIExprLLVMFragment). I feel we ought to be able to improve on that and make it more inline in the MIR.

Matching immediates and G_CONSTANT/G_FCONSTANT

In GlobalISel, an immediate is in one of three forms. There's int64_t for MachineOperand::isImm() and ConstantInt * for MachineOperand::isCImm() but those forms are uncommon and are only really seen on target instructions. The most common form is a vreg defined by G_CONSTANT or G_FCONSTANT. All three forms are handled by the 'imm' declaration.
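
For example, the common vreg form looks like this in MIR (a sketch), where an imm declaration would match the constant 2 feeding the G_MUL:
    %0(s32) = G_CONSTANT i32 2
    %2(s32) = G_MUL %1, %0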

Here we declare a rule that matches X * 2 using a custom predicate to check that the immediate is 2:
    def : GICombineRule<(defs root:$D, reg:$A, imm:$VAL),
                        (match [{MIR %D = G_MUL %A, %VAL }],
                               (isTwo imm:$VAL)),
                        (apply [{MIR %D = MYTGT_DOUBLE %A }])>;
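
For completeness, isTwo could be declared following the same GIMatchPredicate pattern as earlier (a sketch, assuming imm arguments are presented as an APInt, as apint_value does below):
    def isTwo : GIMatchPredicate<(ins imm:$A), bool, [{
      return ${A} == 2;
    }]>;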

Listing the C++ predicates like that will lead to a lot of repetitive and bug-prone code, so it will also be possible to embed the predicates in custom variants of imm:
    def imm_2 : GIPredicatedDefKind<(isTwo imm)>;
    def : GICombineRule<(defs root:$D, reg:$A, imm_2:$VAL),
                        (match [{MIR %D = G_MUL %A, %VAL }]),
                        (apply [{MIR %D = MYTGT_DOUBLE %A }])>;
These are effectively appended to the match section.
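
Concretely, the imm_2 form is equivalent to the explicit-predicate rule above:
    def : GICombineRule<(defs root:$D, reg:$A, imm:$VAL),
                        (match [{MIR %D = G_MUL %A, %VAL }],
                               (isTwo imm:$VAL)),
                        (apply [{MIR %D = MYTGT_DOUBLE %A }])>;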

Just matching an immediate isn't enough though. We'd also want to be able to use them in the result and modify them. This works the same way as registers. Here's an example of using the immediate in the apply step in a rule that replaces 2 * (A + B) with 2A + 2B:
    def : GICombineRule<
      (defs root:$root, reg:$A, imm:$B, imm_2:$C),
      (match [{MIR
               %1 = G_ADD %A, %B
               %root = G_MUL %1, %C
             }]),
      (apply [{MIR %1 = G_ADD %A, %A
                   %2 = G_ADD %B, %B
                   %root = G_ADD %1, %2 }])>;
and of course, we'd also want to constant fold the 2B, which is achieved in this rule by creating a new operand before we generate the new instructions:
    def : GICombineRule<
      (defs root:$root, reg:$A, imm:$B, imm_2:$C),
      (match [{MIR
               %1 = G_ADD %A, %B
               %root = G_MUL %1, %C
             }]),
      (apply (create_imm [{ 2 * ${B}.getZExtValue() }], apint_value:$B):$NB,
             [{MIR %1 = G_ADD %A, %A
                   %root = G_ADD %1, %NB }])>;
In this definition the (create_imm ...):$NB creates a new operand with MachineOperand::CreateImm(...) and names it $NB for use in the pattern. The code block inside the create_imm is used as the argument to CreateImm(), and the apint_value:$B converts the operand to an APInt before it is given to the code block, much like the custom predicates did earlier.
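
Roughly, the apply code generated for that (create_imm ...) amounts to the following sketch (how the matched constant is retrieved and where the helper lives are illustrative):
    // apint_value:$B presents the matched constant as an APInt; the rule's
    // code block becomes the argument to MachineOperand::CreateImm().
    MachineOperand createNB(const APInt &B) {
      return MachineOperand::CreateImm(2 * B.getZExtValue());
    }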

Passing arbitrary data from match to apply

As mentioned earlier, the defs block defines the interface between the match and apply steps. This can be used to arrange for arbitrary data to be passed from match to apply. 

In the current AArch64PreLegalizeCombiner we have a rule that matches a G_LOAD and all its uses simultaneously and decides on the best way to rewrite it to minimize the sign/zero/any-extend operations. This rule passes a struct (PreferredTuple) from the current C++ equivalent of the match to the current C++ equivalent of the apply. Converting that into this tablegen syntax, we'd write:
    def extending_load_matchdata : GIDefMatchData<"PreferredTuple">;
    def extending_load_predicate : GIMatchPredicate<
         (ins reg:$A, extending_load_matchdata:$B), bool, [{
      return Helper.matchCombineExtendingLoads(${A}, ${matchinfo});
    }]>;
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
      (match [{MIR %root = G_LOAD %A }],
             (extending_load_predicate root:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo))>;

The GIDefMatchData declares a new type of data that can be passed from the match to the apply. Tablegen is responsible for arranging for appropriate storage during the Combine algorithm. The GIMatchPredicate declares a C++ predicate that fills out the PreferredTuple (passed by reference) whenever it returns true for a successful match. We could have made the predicate return std::pair<bool, PreferredTuple> instead, but that's less efficient (it would be an sret return on many targets) and would require us to define the truthiness (no examples are in this email, as I expect it to be a rare thing to need) in order to act as a predicate. Normally, you'd feed this into a (create_imm ...) or a (create_operand ...) in the apply section. However, in this particular case the data being passed determines the entirety of the replacement, so we escape into arbitrary C++ instead and arrange for the variables to be injected appropriately using the 'exec' operator.
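
For reference, the rule above assumes helper methods along these lines, modeled on the current AArch64 combiner code (the names are from the rule; exact signatures are illustrative):
    class CombinerHelper {
    public:
      // Fills out MatchInfo (by reference) and returns true on a match.
      bool matchCombineExtendingLoads(MachineInstr &MI,
                                      PreferredTuple &MatchInfo);
      // Performs the rewrite decided on during the match.
      void applyCombineExtendingLoads(MachineInstr &MI,
                                      PreferredTuple &MatchInfo);
    };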

Macros

I simplified the previous example a bit. Rather than only matching a G_LOAD, the current rule in AArch64 can match any of G_LOAD, G_SEXTLOAD, and G_ZEXTLOAD. We need some means to match one of several alternatives as well as to collect and re-use common subpatterns. I've yet to look into how this would be practically implemented, and this section is a bit vague as a result, but here's the current thinking on how it should look and behave:
    def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
                          (match (oneof [{MIR %R = G_LOAD %S}],
                                        [{MIR %R = G_SEXTLOAD %S}],
                                        [{MIR %R = G_ZEXTLOAD %S}]):$IDX)>;
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo, ANYLOAD:$ANYLOAD),
      (match [{MIR %root = ANYLOAD %A }],
             (extending_load_predicate root:$A,
                                       extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo))>;
Effectively, we're declaring a fake instruction and importing it into this rule, possibly renaming it using the argument name ($ANYLOAD in this case) to provide a more convenient name in the MIR block or to disambiguate multiple instances. Once we've parsed the MIR, we would recursively replace any instance of ANYLOAD with code to match one of the alternatives. The variables in the 'defs' section of the macro would be available as $ANYLOAD_R, $ANYLOAD_S, and $ANYLOAD_IDX (we have to use '_' instead of a '.' to fit within tablegen's syntax) with $ANYLOAD_IDX indicating which alternative the (oneof ...) matched. When nested by including a macro in the macro's defs section and using it, the names to access the sub-macro's variables would grow longer by underscore-separated concatenation, for example '$ANYLOAD_ANYEXT_A'.

'Upside-down' matches (i.e. roots at the top) and similar

This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root, reg:$A),
  (match [{MIR %1 = G_LOAD %root
               %A = G_SEXT %1 }]),
  (apply [{MIR %A = G_SEXTLOAD %root }])>;

The only unusual thing about this rule is that the root isn't at the bottom. Instead of starting at a use and matching towards defs, we're starting at the def and matching towards uses. This has some potentially useful properties. The combine algorithm has to choose an insertion point for the replacement code, and the traditional choice has been the root of the match. Assuming we keep doing the same thing, 'upside-down' matching like this is able to avoid the checks that the load is safe to move, is non-volatile, has one use, etc. that would be necessary if we moved the G_LOAD down to the G_SEXT. Also, assuming we keep the same broadly bottom-up processing order as the existing Combine algorithm, this kind of rule also has relatively lower priority than conventional rules because the root is seen later. This can be useful as (algorithm-dependent) the MIR may be more settled by the time it tries to match.

Along the same lines, the syntax can potentially support the root being in the middle of a match. This could be used in a similar way to the upside-down match above to control the insertion point and priority. For example:
def : GICombineRule<
  (defs root:$root, reg:$A, reg:$B, reg:$C),
  (match [{MIR %1(s32) = G_TRUNC %A(s64)
               %2(s32) = G_TRUNC %B(s64)
               %root = G_ADD %1, %2
               %C(s64) = G_SEXT %root }]),
  (emit [{MIR %root = G_ADD %A, %B
              %C = G_SEXT_INREG %root }])>;
Unfortunately, I don't have any concrete examples where this would be useful in comparison to a conventional or upside-down match at the moment. I'm mostly keeping the door open as I can see potential for benefits (mostly w.r.t. sinking and hoisting safety around an instruction that would otherwise be difficult to test for) given an appropriate rule, and also because I'm inclined to say that the tablegen syntax shouldn't be the reason it's not possible. It should be up to the Combine algorithm and the tablegen-erated pass involved in specializing the algorithm for a target.

Multiple roots

This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root1, root:$root2, reg:$A, reg:$B),
  (match [{MIR %root1 = G_ADD %A, %B }],
         [{MIR %root2 = G_SUB %A, %B }]),
  (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
This would match a G_ADD and G_SUB with operands in common and combine them into a BUTTERFLY operation. You can think of this as two normal rules, one with %root1 as the root and the other with %root2 as the root.

Grouping and Ordering Rules

Combine rules are ordered and grouped using definitions of the form:
    def trivial_combines : GICombineGroup<[copy_prop]>;
    def combines_for_extload: GICombineGroup<[extending_loads]>; 
    def all_combines : GICombineGroup<[trivial_combines, combines_for_extload]>; 
Essentially, we create a long ordered list of all the combines. Tablegen is free to re-order rules so long as the resulting ruleset always behaves as if it were in this specific order. So for example, the list [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts] because the rules involved are mutually exclusive due to the root opcode.

So why not make tablegen figure out the order like ISel does? ISel attempts to figure out an order using a scoring system called Complexity, along with a hack (AddedComplexity) to allow the user to provide magic numbers to fix the cases it gets wrong. The simplest way to confuse it is with patterns like (add s32, complexpattern1) and (add s32, complexpattern2). These have the same Complexity score but in truth, each one has a (possibly overlapping) range of complexity depending on what C++ code is inside the complexpatterns and which path through that C++ is dynamically taken. If it matches nothing then the score should be 0, but if it matches a dozen nodes it should be 12 (or possibly higher). We don't know which until we try to match that specific pattern. Correctly figuring out an order in the presence of complexpatterns is impossible. Similarly, it's also possible to confuse it with patterns that differ but overlap and add up to the same complexity due to quirks of the scoring system.

Declaring Combine Passes

Combiner passes are defined by subclassing GICombiner like so:
    def CommonPreLegalizerCombiner: GICombiner<
      "CommonGenPreLegalizerCombiner", [all_combines]> {
      let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
      let Modifiers = [(disable_rule copy_prop)];
    }

This causes tablegen to create a class named 'CommonPreLegalizerCombiner' which you can use to perform combines. This combiner contains all the combines mentioned in the previous section because it includes the 'all_combines' group. However, it disables the copy_prop rule to prevent it from attempting to match. I'll discuss that a bit more below. It also generates a command-line option, for asserts builds only, which can be used to further disable rules at run-time. This will be useful for tracking down bugs or for testing a rule in isolation. I'm hoping that one day tools like bugpoint will be able to search through the individual rules within a combine pass when searching for a minimal reproducer.

The Modifiers field is intended to allow targets to modify an existing ruleset (particularly a target independent one) with additional target specific quirks. For example, one particular rule might be doing more harm than good and should be disabled. Or maybe only a subset of the things it would normally match are safe in which case an extra predicate should be tested. Being able to make minor edits to the ruleset without taking on the whole maintenance burden of the common code or causing code bloat by duplicating tables would be useful for targets that are generally similar to the targets within LLVM but have minor quirks.

Larger scale changes should take an alternate approach to modifiers. It's expected that even targets that are very different from the rest of the pack still have features in common. Such targets can declare their own combiner to replace the common one but still generally make use of the improvements made by the wider community. This is where GICombineGroup will start to shine, as such a target can declare a combiner like so:
    def MyTargetPreLegalizerCombiner: GICombiner<
      "MyTargetGenPreLegalizerCombiner",
      [common_extend_optimizations,
       common_extending_loads,
       // common_rsqrt_and_nr_iterations, // This target has a real sqrt operation
       mytarget_special_fma,
       common_fma,
       common_bswap_idioms,
       mytarget_bcd_arithmetic
      ]> {
    }
As LLVM improves on the common_* groups, the target benefits from those improvements automatically. However, it doesn't benefit from new groups being added to common_all_combines because it's no longer using that group, so entirely new categories of combines added to it would not appear in the combiner. It would still be nice to find out that a new category has appeared though, so that a decision can be made on it. To that end, I'm considering adding:
    let Verify = [(has_all_of_except all_combines, common_rsqrt_and_nr_iterations)];
which would cause a warning if something new appeared in the original group.

Conclusion

There is a lot of work that needs to be done to get all this working, and some of it may have to change once it runs into the reality of implementation :-). However, we think that this will prove to be a very convenient and powerful syntax with potential for a variety of tools, from profilers, to bug reducers, to correctness-checking tools.

There is a patch at https://reviews.llvm.org/D54286 that makes a start on it to try out the general feel of the syntax, but it currently lacks the core feature of generating match and apply code from the MIR. I'm going to be looking into that over the next few weeks.

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi Daniel,

Lots of good stuff in there! I especially like the design for specifying
out-of-line predicates. I have a couple of small comments and one major
one below.


On 09.11.18 02:42, Daniel Sanders via llvm-dev wrote:

> _Passing arbitrary data from match to apply_
> _
> _
> As mentioned earlier, the defs block defines the interface between the
> match and apply steps. This can be used to arrange for arbitrary data to
> be passed from match to apply.
>
> In the current AArch64PreLegalizeCombiner we have a rule that matches a
> G_LOAD and all its uses simultaneously and decides on the best way to
> rewrite it to minimize the sign/zero/any-extend operations. This rule
> passes a struct (PreferredTuple) between the current C++ equivalent for
> the match to the current C++ equivalent to the apply. Converting that
> into this tablegen syntax, we'd write:
>      def extending_load_matchdata : GIDefMatchData<"PreferredTuple">;
>      def extending_load_predicate : GIMatchPredicate<
>           (ins reg:$A, extending_load_matchdata:$B), bool, [{
>        return Helper.matchCombineExtendingLoads(${A}, ${matchinfo});

I assume this was intended to be ${B} instead of ${matchinfo}?

I also think you should have 'ins' and 'outs' separately; after all, a
predicate may have to do a combined check on two matched registers /
operands, and produce one or more values for later re-use.

Once you have separate 'ins' and 'outs', the "bool" in there seems a bit
redundant.


>      }]>;
>      def extending_loads : GICombineRule<
>        (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
>        (match [{MIR %root = G_LOAD %A }],
>               (extending_load_predicate root:$A,
>                                        
>   extending_load_matchdata:$matchinfo)),
>        (apply (exec [{ Helper.applyCombineExtendingLoads(${root},
> ${matchinfo}); }],
>                     reg:$root, extending_load_matchdata:$matchinfo)>;
>
> The GIDefMatchData declares a new type of data that can be passed from
> the match to the apply. Tablegen is responsible for arranging for
> appropriate storage during the Combine algorithm. The GIMatchPredicate
> declares a C++ predicate that fills out the PreferredTuple (passed by
> reference) whenever it returns true for a successful match. We could
> have made the predicate return std::pair<bool, PreferredTuple> instead
> but that's less efficient (it would be an sret return on many targets)
> and would require us to define the truthiness (no examples of are in
> this email as I expect it to be a rare thing to need) in order to act as
> a predicate. Normally, you'd feed this into a (create_imm ...) or a
> (create_operand ...) in the apply section. However, in this particular
> case the data being passed determines the entirety of the replacement so
> we escape into arbitrary C++ instead and arrange for the variables to be
> injected appropriately using the 'exec' operator.

Have you considered the possibility of defining custom `create` actions
for the apply, analogous to custom predicates?

In your description, it appears there is a fixed list of builtin actions
such as `create_imm`, `create_operand` (not really described anywhere?),
`exec` and the tentative debug info approach.

It seems valuable to be able to define one's own operator for custom
operations.
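
For instance, something along these lines (the GIApplyAction class name and its conversion behaviour are purely hypothetical):

   def create_neg_imm : GIApplyAction<(ins apint_value:$V), operand, [{
     return MachineOperand::CreateImm(-${V}.getSExtValue());
   }]>;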


> _Macros_
> _
> _
> I simplified the previous example a bit. Rather than only matching a
> G_LOAD, the current rule in AArch64 can match any of G_LOAD, G_SEXTLOAD,
> and G_ZEXTLOAD. We need some means to match one of several alternatives
> as well as collect and re-use common subpatterns. I've yet to look into
> how this would be practically implemented and this section is a bit
> vague as a result but here's the current thinking on how it should look
> and behave:
>      def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
>                            (match (oneof [{MIR %R = G_LOAD %S}],
>                                          [{MIR %R = G_SEXTLOAD %S}],
>                                          [{MIR %R = G_ZEXTLOAD %S}]):$IDX>;
>      def extending_loads : GICombineRule<
>        (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo,
> ANYLOAD:$ANYLOAD),
>        (match [{MIR %root = ANYLOAD %A }],
>               (extending_load_predicate root:$A,
>                                        
>   extending_load_matchdata:$matchinfo)),
>        (apply (exec [{ Helper.applyCombineExtendingLoads(${root},
> ${matchinfo}); }],
>                     reg:$root, extending_load_matchdata:$matchinfo)>;
> Effectively, we're declaring a fake instruction and import it into this
> rule, possibly renaming it using the argument name ($ANYLOAD in this
> case) to provide a more convenient name in the MIR block or to
> disambiguate multiple instances. Once we've parsed the MIR, we would
> recursively replace any instance of ANYLOAD with code match one of the
> alternatives. The variables in the 'defs' section of the macro would be
> available as $ANYLOAD_R, $ANYLOAD_S, and $ANYLOAD_IDX (we have to use
> '_' instead of a '.' to fit within tablegen's syntax) with $ANYLOAD_IDX
> indicating which alternative the (oneof ...) matched. When nested by
> including a macro in the macro's defs section and using it, the names to
> access the sub-macros variables would grow longer by underscore
> separated concatenation, for example '$ANYLOAD_ANYEXT_A'.

This magic generation of variable names seems really wrong to me. Let's
just use a natural operands interface, like so (and I'm also doing
something else here, more on that later):

   def ANYLOAD : GIMacro<
     (ops def:$R, use:$S, imm:$IDX),
     (defs), // optional extra defs that can be used in predicates
     (oneof (match (G_LOAD $R, $S), 0:$IDX),
            (match (G_SEXTLOAD $R, $S), 1:$IDX),
            (match (G_ZEXTLOAD $R, $S), 2:$IDX))>;
   def extending_loads : GICombineRule<
     (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
           imm:$idx),
     (match (ANYLOAD $root, $A, $idx),
            (extending_load_predicate $A, $matchinfo)),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

There'd be a natural and simple way to shift predicates and variables in
and out of macros: the following would be equivalent:

   def ANYLOAD : GIMacro<
     (ops def:$R, extending_load_matchdata:$matchinfo, imm:$IDX),
     (defs use:$S),
     (match (oneof (match (G_LOAD $R, $S), 0:$IDX),
                   (match (G_SEXTLOAD $R, $S), 1:$IDX),
                   (match (G_ZEXTLOAD $R, $S), 2:$IDX)),
            (extending_load_predicate $S, $matchinfo))>;
   def extending_loads : GICombineRule<
     (defs def:$root, extending_load_matchdata:$matchinfo, imm:$idx),
     (ANYLOAD $root, $A, $matchinfo, $idx),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

... and of course it could all be inlined:

   def extending_loads : GICombineRule<
     (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
           imm:$idx),
     (match (oneof (match (G_LOAD $root, $A), 0:$idx),
                   (match (G_SEXTLOAD $root, $A), 1:$idx),
                   (match (G_ZEXTLOAD $root, $A), 2:$idx)),
            (extending_load_predicate $A, $matchinfo)),
     (exec [{ Helper.applyCombineExtendingLoads(
                  ${root}, ${matchinfo}, ${idx}); }])>;

In other words, macros would really just be macros or sub-routines.

Let's talk about using DAGs. I think I understand where the temptation
comes from to describe the MIR using code blocks, but I'm also pretty
sure that it's a mistake to do so.

One reason is that it adds yet another parser, which is more maintenance
burden without buying much.

The more important reason is that DAGs compose better with the rest of
TableGen. Consider combine rules defined in multiclasses, for example.
That is very common all through LLVM. In order to mirror an existing
selection rule in the AMDGPU backend, we'd likely want to have something
like this:

   multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                    Instruction inst> {
     // This is probably wishful thinking -- would be okay if we had to
     // split this into two different rules due to the different types
     // of $dst
     def : GICombineRule<
       (defs root:$dst, operand:$src0, operand:$src1),
       (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
              (match (ginst i16:$tmp, i16:$src0, i16:$src1),
                     (G_ZEXT i32:$dst, $tmp))),
       (inst $dst, $src0, $src1)>;

     // A third rule for zext to 64 bits would go here...
   }

Obviously you can achieve this in your proposal as well using string
concatenation, but that does seem rather backwards.

Something else that falls out rather nicely from using DAGs is that it
gives you a natural way to give a name to an _instruction_, to be able
to pass it to a predicate for extended checks (and as a potentially more
natural way of preserving debug locations!).

I have a more explicit example of this later.


> _'Upside-down' matches (i.e. roots at the top) and similar_
>
> This one requires algorithm changes which I'd prefer not to discuss in
> this RFC. Assuming the underlying algorithm gains support for this, this
> is how the syntax would look:
> def : GICombineRule<
>    (defs root:$root, reg:$A),
>    (match [{MIR %1 = G_LOAD %root
>                 %A = G_SEXT %1 }]),
>    (apply [{MIR %A = G_SEXTLOAD %root }])>;
>
> The only unusual thing about this rule is that the root isn't at the
> bottom. Instead of starting at a use and matching towards defs, we're
> starting at the def and matching towards uses. This has some potentially
> useful properties. The combine algorithm has to chose an insertion point
> for the replacement code and the traditional choice has been the root of
> the match. Assuming we keep doing the same thing, 'upside-down' matching
> like this is able to avoid the checks that the load is safe to move, is
> non-volatile, has one use, etc. that would be necessary if we moved the
> G_LOAD down to the G_SEXT. Also, assuming we keep the same broadly
> bottom-up processing order as the existing Combine algorithm this kind
> of rule also has relatively lower priority than conventional rules
> because the root is seen later. This can be useful as
> (algorithm-dependent) the MIR may be more settled by the time it tries
> to match.
>
> Along the same lines, the syntax can potentially support the root being
> in the middle of a match. This could be used in a similar way to the
> upside-down match above to control the insertion point and priority. For
> example:
> def : GICombineRule<
>    (defs root:$root, reg:$A, reg:$B, reg:$C),
>    (match [{MIR %1(s32) = G_TRUNC %A(s64)
>                 %2(s32) = G_TRUNC %B(s64)
>                 %root = G_ADD %1, %2
>                 %C(s64) = G_SEXT %root }]),
>    (emit [{MIR %root = G_ADD %A, %B
>                %C = G_SEXT_INREG %root }])>;
> Unfortunately, I don't have any concrete examples where this would be
> useful in comparison to a conventional or upside-down match at the
> moment. I'm mostly keeping the door open as I can see potential for
> benefits (mostly w.r.t sinking and hoisting safety around an instr that
> would be difficult to test for that) given an appropriate rule and also
> because I'm inclined to say that the tablegen syntax shouldn't be the
> reason it's not possible. It should be up to the Combine algorithm and
> and tablegen-erate pass involved in specializing the algorithm for a target.

I'm concerned that the concrete syntax proposed here mixes a bunch of
concerns that really should be addressed separately:

1. Defining a mapping between semantically equivalent chunks of code.
2. Controlling insertion points as a simple way to keep side effects
under control in many cases.
3. Controlling additional algorithmic aspects of the combine algorithm
(whether you're matching up or down, say).

Could we keep those separate somehow? E.g., don't distinguish between
'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have
TableGen figure out the roots automatically based on walking the graph
implied by the matching rule, and default to using the sink(s) as roots.

Then add some optional let-able variable in the GICombineRule to define
non-default rules for the combine algorithm.

Controlling the insertion point to address loads like in your example
above could be done quite naturally if the MIR is expressed in a DAG:

   def : GICombineRule<
     (match (G_LOAD $tmp, $ptr):$load,
            (G_SEXT $dst, $tmp)),
     (apply (insertat $load),
            (G_SEXTLOAD $dst, $ptr))>;

As you can see, this discussion also makes me wonder whether the (defs)
are needed at all; maybe we can rely on type inference almost everywhere?


> _Multiple roots_
>
> This one requires algorithm changes which I'd prefer not to discuss in
> this RFC. Assuming the underlying algorithm gains support for this, this
> is how the syntax would look:
> def : GICombineRule<
>    (defs root:$root1, root:$root2, reg:$A, reg:$B),
>    (match [{MIR %root1 = G_ADD %A, %B }],
>           [{MIR %root2 = G_SUB %A, %B }]),
>    (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
> This would match a G_ADD and G_SUB with operands in common and combine
> them into a BUTTERFLY operation. You can think of this as two normal
> rules, one with %root1 as the root and the other with %root2 as the root.

Again, it seems to me that it would be preferable if we could just
express this rule as:

   def : GICombineRule<
     (match (G_ADD $dst1, $a, $b),
            (G_SUB $dst2, $a, $b),
            reg:$a, reg:$b),
     (BUTTERFLY $dst1, $dst2, $a, $b)>;

or perhaps:

   def : GICombineRule<
     (match (G_ADD $dst1, reg:$a, $b),
            (G_SUB $dst2, $a, reg:$b)),
     (BUTTERFLY $dst1, $dst2, $a, $b)>;

and similar variations.

TableGen should be able to figure out the multiple roots by walking the
dependency graph that is implied by the knowledge about which operands
of G_ADD and G_SUB are defs and uses, respectively.

To sum up, I really like the proposal overall, except I think it would
be a **major** improvement to use DAGs for expressing the MIR, and sure,
there's the odd detail here and there like about GIMacro.

Cheers,
Nicolai


> _Grouping and Ordering Rules_
>
> Combine rules are ordered and grouped using definitions of the form:
>      def trivial_combines : GICombineGroup<[copy_prop]>;
>      def combines_for_extload: GICombineGroup<[extending_loads]>;
>      def all_combines : GICombineGroup<[trivial_combines,
> combines_for_extload]>;
> Essentially, we create a long ordered list of all the combines. Tablegen
> is free to re-order rules so long as the resulting ruleset always
> behaves as if it were in this specific order. So for example, the list
> [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to
> re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts]
> because the rules involved are mutually exclusive due to the root opcode.
>
> So why not make tablegen figure out the order like ISel does? ISel
> attempts to figure out an order using an scoring system called
> Complexity along with a hack (AddedComplexity) to allow the user to
> provide magic numbers to fix the cases it got it wrong. The simplest way
> to confuse it is with patterns like (add s32, complexpattern1) and (add
> s32, complexpattern2). These have the same Complexity score but in
> truth, each one has a (possibly overlapping) range of complexity
> depending on what C++ code is inside the complexpattern's and which path
> through that C++ is dynamically taken. If it matches nothing then the
> score should be 0 but if it matches a dozen nodes it should be 12 (or
> possibly higher). We don't know which until we try to match that
> specific pattern. Correctly figuring out an order in the presence of
> complexpatterns is impossible. Similarly, it's also possible to confuse
> it with patterns that differ but overlap and add up to the same
> complexity due to quirks of the scoring system.
>
> _Declaring Combine Passes_
>
> Combiner passes are defined by subclassing GICombiner like so:
>      def CommonPreLegalizerCombiner: GICombiner<
>        "CommonGenPreLegalizerCombiner", [all_combines]> {
>        let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
>        let Modifiers = [(disable_rule copy_prop)]
>      }
>
> This causes tablegen to create a class named
> 'CommonPreLegalizerCombiner' which you can use to perform combines. This
> combiner contains all the combines mentioned in the previous section
> because it includes the 'all_combines' group. However, it disables the
> copy_prop rule to prevent it from attempting to match. I'll discuss that
> a bit more below. It also generates a command-line option for asserts
> builds only which can be used to further disable rules at run-time which
> will be useful for tracking down bugs or for testing a rule in
> isolation. I'm hoping that one day tools like bugpoint will be able to
> search through the individual rules within a combine pass when searching
> for a minimal reproducer.
>
> The Modifiers field is intended to allow targets to modify an existing
> ruleset (particularly a target independent one) with additional target
> specific quirks. For example, one particular rule might be doing more
> harm than good and should be disabled. Or maybe only a subset of the
> things it would normally match are safe in which case an extra predicate
> should be tested. Being able to make minor edits to the ruleset without
> taking on the whole maintenance burden of the common code or causing
> code bloat by duplicating tables would be useful for targets that are
> generally similar to the targets within LLVM but have minor quirks.
>
> Larger scale changes should take an alternate approach to modifiers.
> It's expected that even targets that are very different from the rest of
> the pack still have features in common. Such targets can declare their
> own combiner to replace the common one but still generally make use of
> the improvements made by the wider community. This is where
> GICombinerGroup will start to shine as such a target can declare a
> combiner like so:
>      def MyTargetPreLegalizerCombiner: GICombiner<
>        "MyTargetGenPreLegalizerCombiner",
>        [common_extend_optimizations,
>         common_extending_loads,
>         // common_rsqrt_and_nr_iterations, // This target has a real
> sqrt operation
>         mytarget_special_fma,
>         common_fma,
>         common_bswap_idioms,
>         mytarget_bcd_arithmetic
>        ]> {
>      }
> As LLVM improves on the common_* groups, the target benefits from those
> improvements automatically. However, it doesn't benefit from new groups
> being added to common_all_combines because it's no longer using that
> group so entirely new categories of combines added to it would not
> appear in the combiner. It would still be nice to find out that a new
> category has appeared though so that a decision can be made on it. To
> that end, I'm considering adding:
>      let Verify = [(has_all_of_except all_combines,
> common_rsqrt_and_nr_iterations)];
> which would cause a warning if something new appeared in the original group.
>
> _Conclusion_
>
> There is a lot of work that needs to be done to get all this working and
> some of it may have to change once it runs into the reality of
> implementation :-). However, we think that this will prove to be a very
> convenient and powerful syntax with some potential a variety of tools
> from profilers, to bug reducers, to correctness checking tools.
>
> There is a patch at https://reviews.llvm.org/D54286 makes a start on it
> to try out the general feel of the syntax but currently lacks the core
> feature of generating match and apply code from the MIR. I'm going to be
> looking into that over the next few weeks.
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
In reply to this post by David Jones via llvm-dev
Daniel Sanders via llvm-dev <[hidden email]> writes:

> I've been working on the GlobalISel combiner recently and I'd like to
> share the plan for how Combine Rules will be defined in GlobalISel and
> solicit feedback on it.

This is really great stuff!  I agree with pretty much everything Nicolai
said, particularly the use of DAGs.  That seems much more naturally
TableGen-y to me.  Specific comments are below.

But before that, I've been long pained that we have so much duplicated
code in instcombine and dagcombine.  I know this is way beyond the scope
of this work but do you think the basic concepts could be applied to
produce TableGen-generated instcombine passes?  It would be nice to
re-use many of the rules, for example, except they'd match LLVM IR
rather than MIR.  As you go about implementation, maybe keep this idea
in mind?

> Here's a simple example that eliminates a redundant G_TRUNC.
>
> def : GICombineRule<(defs root:$D, operand:$S),
> (match [{MIR %1(s32) = G_ZEXT %S(s8)
> %D(s16) = G_TRUNC %1(s32) }]),
> (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;

The use of '%' vs. '$' here really threw me for a loop.  I would have
expected '$D' and '$S' everywhere.  What's the significance of '%'
vs. '$'?  I know MIR uses '%' for names but must that be the case in
these match blocks?

Of course if we go with Nicolai's use of DAGs this issue seems to go
away.

> * defs declares the interface for the rule. The definitions in the
>   defs section are the glue between the Combine algorithm and the
>   rule, as well as between the match and apply sections. In this case
>   we only define the root of the match and that we have a register
>   variable called $S.

My understanding is that "root" means the sink here, but later in the
"upside-down" example, "root" means the source.  It seems to me the use
of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
specifying the insert point.  Why do users need to know about "root" at
all?  Is there some other special meaning attached to it?

                               -David

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi Daniel,

Disclaimer: Haven't read the proposal yet.

>  TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.

I would rather avoid adding a dependency on yet another tablegen
backend to the project unless we are confident it is here to stay.
At a high level, I am not a fan of the proposal because it will box us
into a stricter framework than pure C++, which may impede our ability to
prototype different approaches quickly. Longer term, I would like to
have a common framework for IR and MIR to describe combines, and I am
afraid that MIR is not the right abstraction for that.

The bottom line is I believe there are too many unknowns to go down
that road just yet, but I am not the one doing the work, so your call
:D.

Just my 2c :P.

Cheers,
-Quentin

On Fri, Nov 9, 2018 at 08:36, David Greene via llvm-dev
<[hidden email]> wrote:

>
> Daniel Sanders via llvm-dev <[hidden email]> writes:
>
> > I've been working on the GlobalISel combiner recently and I'd like to
> > share the plan for how Combine Rules will be defined in GlobalISel and
> > solicit feedback on it.
>
> This is really great stuff!  I agree with pretty much everything Nicolai
> said, particularly the use of DAGs.  That's seems much more natually
> TableGen-y to me.  Specific comments are below.
>
> But before that, I've been long pained that we have so much duplicated
> code in instcombine and dagcombine.  I know this is way beyond the scope
> of this work but do you think the basic concepts could be applied to
> produce TableGen-generated instcombine passes?  It would be nice to
> re-use many of the rules, for example, except they'd match LLVM IR
> rather than MIR.  As you go about implementation, maybe keep this idea
> in mind?
>
> > Here's a simple example that eliminates a redundant G_TRUNC.
> >
> > def : GICombineRule<(defs root:$D, operand:$S),
> > (match [{MIR %1(s32) = G_ZEXT %S(s8)
> > %D(s16) = G_TRUNC %1(s32) }]),
> > (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
>
> The use of '%' vs. '$' here really threw me for a loop.  I would have
> expected '$D' and '$S' everywhere.  What's the significance of '%'
> vs. '$'?  I know MIR uses '%' for names but must that be the case in
> these match blocks?
>
> Of course if we go with Nicolai's use of DAGs this issue seems to go
> away.
>
> > * defs declares the interface for the rule. The definitions in the
> >   defs section are the glue between the Combine algorithm and the
> >   rule, as well as between the match and apply sections. In this case
> >   we only define the root of the match and that we have a register
> >   variable called $S.
>
> My understanding is that "root" means the sink here, but later in the
> "upside-down" example, "root" means the source.  It seems to me the use
> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
> specifying the insert point.  Why do users need to know about "root" at
> all?  Is there some other special meaning attached to it?
>
>                                -David
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi Quentin,

On 09.11.18 18:16, Quentin Colombet via llvm-dev wrote:

> Disclaimer: Haven't read the proposal yet.
>
>>   TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.
>
> I would rather avoid adding a dependency on yet another tablegen
> backend to the project unless we are confident it is here to stay.
> At a high level, I am not fan of the proposal because it will box us
> in a stricter framework that pure C++ that may impede our ability to
> prototype different approaches quickly. Longer term, I would like to
> have a common framework to IR and MIR to describe combines and I am
> afraid that MIR is not the right abstraction for that.

What do you mean by "prototype different approaches quickly"?

It's funny, because I always thought the exact opposite: I felt like
there are constraints in the freedom to explore different approaches
precisely because the combines are not described declaratively, and too
much of how everything fits together is de facto hardcoded in C++.

But maybe we mean different things?

I do agree with you that MIR code blocks seem like the wrong model for
describing combines.

Cheers,
Nicolai


>
> The bottom line is I believe there are too many unknown to go down
> that road just yet, but I am not the one doing the work, so your call
> :D.
>
> Just my 2c :P.
>
> Cheers,
> -Quentin
>
> On Fri, Nov 9, 2018 at 08:36, David Greene via llvm-dev
> <[hidden email]> wrote:
>>
>> Daniel Sanders via llvm-dev <[hidden email]> writes:
>>
>>> I've been working on the GlobalISel combiner recently and I'd like to
>>> share the plan for how Combine Rules will be defined in GlobalISel and
>>> solicit feedback on it.
>>
>> This is really great stuff!  I agree with pretty much everything Nicolai
>> said, particularly the use of DAGs.  That seems much more naturally
>> TableGen-y to me.  Specific comments are below.
>>
>> But before that, I've been long pained that we have so much duplicated
>> code in instcombine and dagcombine.  I know this is way beyond the scope
>> of this work but do you think the basic concepts could be applied to
>> produce TableGen-generated instcombine passes?  It would be nice to
>> re-use many of the rules, for example, except they'd match LLVM IR
>> rather than MIR.  As you go about implementation, maybe keep this idea
>> in mind?
>>
>>> Here's a simple example that eliminates a redundant G_TRUNC.
>>>
>>> def : GICombineRule<(defs root:$D, operand:$S),
>>> (match [{MIR %1(s32) = G_ZEXT %S(s8)
>>> %D(s16) = G_TRUNC %1(s32) }]),
>>> (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
>>
>> The use of '%' vs. '$' here really threw me for a loop.  I would have
>> expected '$D' and '$S' everywhere.  What's the significance of '%'
>> vs. '$'?  I know MIR uses '%' for names but must that be the case in
>> these match blocks?
>>
>> Of course if we go with Nicolai's use of DAGs this issue seems to go
>> away.
>>
>>> * defs declares the interface for the rule. The definitions in the
>>>    defs section are the glue between the Combine algorithm and the
>>>    rule, as well as between the match and apply sections. In this case
>>>    we only define the root of the match and that we have a register
>>>    variable called $S.
>>
>> My understanding is that "root" means the sink here, but later in the
>> "upside-down" example, "root" means the source.  It seems to me the use
>> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
>> specifying the insert point.  Why do users need to know about "root" at
>> all?  Is there some other special meaning attached to it?
>>
>>                                 -David

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi Nicolai,

On Fri, Nov 9, 2018 at 09:24, Nicolai Hähnle <[hidden email]> wrote:

>
> Hi Quentin,
>
> On 09.11.18 18:16, Quentin Colombet via llvm-dev wrote:
> > Disclaimer: Haven't read the proposal yet.
> >
> >>   TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.
> >
> > I would rather avoid adding a dependency on yet another tablegen
> > backend to the project unless we are confident it is here to stay.
> > At a high level, I am not a fan of the proposal because it will box us
> > into a stricter framework than pure C++ that may impede our ability to
> > prototype different approaches quickly. Longer term, I would like to
> > have a common framework for IR and MIR to describe combines and I am
> > afraid that MIR is not the right abstraction for that.
>
> What do you mean by "prototype different approaches quickly"?

I mean exploring different combine strategies :).
With a model we are stuck with what it was designed to express.
For instance, currently with SDISel there is no way to express
something like paired loads, i.e., patterns like this:
load @a, load @a + 4 => loadpair @a

Because the DAGs need to be single-rooted.
In C++ we can do whatever we want (I am not saying it is easy ;).)

>
> It's funny, because I always thought the exact opposite: I felt like
> there are constraints in the freedom to explore different approaches
> precisely because the combines are not described declaratively, and too
> much of how everything fits together is de facto hardcoded in C++.
>
> But maybe we mean different things?

I think we meant the same thing, and I admit I don't see how C++ could
be more constrained than a declarative description that relies on C++
code to be executed :).
I agree that declarative syntax is easier to express and rewrite though.

>
> I do agree with you that MIR code blocks seem like the wrong model for
> describing combines.
>
> Cheers,
> Nicolai
>

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
On 09.11.18 18:47, Quentin Colombet wrote:

> On Fri, Nov 9, 2018 at 09:24, Nicolai Hähnle <[hidden email]> wrote:
>>
>> Hi Quentin,
>>
>> On 09.11.18 18:16, Quentin Colombet via llvm-dev wrote:
>>> Disclaimer: Haven't read the proposal yet.
>>>
>>>>    TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.
>>>
>>> I would rather avoid adding a dependency on yet another tablegen
>>> backend to the project unless we are confident it is here to stay.
>>> At a high level, I am not a fan of the proposal because it will box us
>>> into a stricter framework than pure C++ that may impede our ability to
>>> prototype different approaches quickly. Longer term, I would like to
>>> have a common framework for IR and MIR to describe combines and I am
>>> afraid that MIR is not the right abstraction for that.
>>
>> What do you mean by "prototype different approaches quickly"?
>
> I mean exploring different combine strategies :).
> With a model we are stuck with what it was designed to express.
> For instance, currently with SDISel there is no way to express
> something like paired loads, i.e., patterns like this:
> load @a, load @a + 4 => loadpair @a
>
> Because the DAGs need to be single-rooted.
> In C++ we can do whatever we want (I am not saying it is easy ;).)

Of course that particular example may not be the best since Daniel is
already talking about multi-root rules ;)


>> It's funny, because I always thought the exact opposite: I felt like
>> there are constraints in the freedom to explore different approaches
>> precisely because the combines are not described declaratively, and too
>> much of how everything fits together is de facto hardcoded in C++.
>>
>> But maybe we mean different things?
>
> I think we meant the same thing, and I admit I don't see how C++ could
> be more constrained than a declarative description that relies on C++
> code to be executed :).
> I agree that declarative syntax is easier to express and rewrite though.

C++ is less restrictive, but that's precisely what makes certain
experiments harder :)

I was thinking of experiments along the lines of semi-automatically
combining rewrite rules for something similar to super-optimization.
With a declarative description of rewrite rules, you might be able to
combine them algorithmically to go beyond simple local improvements.

I admit that's a bit speculative, but the point is that if all the rules
are hidden in C++, there's not even a place to start from.

I'll grant you that there are other kinds of experiments that are easier
in C++. The proposal does suggest escape hatches for that...

Cheers,
Nicolai

>
>>
>> I do agree with you that MIR code blocks seem like the wrong model for
>> describing combines.
>>
>> Cheers,
>> Nicolai
>>


Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Thanks Nicolai!

On Nov 9, 2018, at 02:55, Nicolai Hähnle <[hidden email]> wrote:

Hi Daniel,

Lots of good stuff in there! I especially like the design for specifying out-of-line predicates. I have a couple of small comments and one major one below.


On 09.11.18 02:42, Daniel Sanders via llvm-dev wrote:
_Passing arbitrary data from match to apply_
As mentioned earlier, the defs block defines the interface between the match and apply steps. This can be used to arrange for arbitrary data to be passed from match to apply.
In the current AArch64PreLegalizeCombiner we have a rule that matches a G_LOAD and all its uses simultaneously and decides on the best way to rewrite it to minimize the sign/zero/any-extend operations. This rule passes a struct (PreferredTuple) from the current C++ equivalent of the match to the current C++ equivalent of the apply. Converting that into this tablegen syntax, we'd write:
    def extending_load_matchdata : GIDefMatchData<"PreferredTuple">;
    def extending_load_predicate : GIMatchPredicate<
         (ins reg:$A, extending_load_matchdata:$B), bool, [{
      return Helper.matchCombineExtendingLoads(${A}, ${matchinfo});

I assume this was intended to be ${B} instead of ${matchinfo}?

Yes, that's right. There's always at least one of these typos when I type emails about this :-). It looks like I forgot to rename one of them when I moved the code into a predicate definition.

I also think you should have 'ins' and 'outs' separately; after all, a predicate may have to do a combined check on two matched registers / operands, and produce one or more values for later re-use.

Once you have separate 'ins' and 'outs', the "bool" in there seems a bit redundant.

I think there are three kinds of values involved in the predicates. The first is inputs: these values come from other parts of the match, so it makes sense that they belong in 'ins'. The second is values that are directly related to the truthiness of the predicate such as bool, unsigned (register number), MachineInstr*, etc. These are the result of the underlying function and there can only be one of these per predicate. The type can potentially be std::pair or std::tuple to give multiple results, so long as a suitable conversion to bool is provided. The third is metadata that's just carrying additional information to the apply. These are currently in the 'ins' section and I agree that they don't really make sense there. I can pull them out into a separate 'outs' section. This has a flexibility cost since the user can no longer specify the complete argument order but that's probably a good thing for readability.

Here's an example of a non-bool result:
    def reg_with_shift : GIMatchPredicate<
         (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
      // MachineOperand *matchARMShiftedOp2(const MachineOperand &, uint64_t &);
      return matchARMShiftedOp2(${A}, ${shift});
    }]>;
    def : GICombineRule<
      (defs root:$D, reg:$S1, reg_with_shift:$S2),
      (match [{MIR %D = G_ADD %S1, %S2 }]),
      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
(I've just realized there's a gap in the ability to name metadata when used in this way. I've gone with 'S2.shift' for now)

This is equivalent to:
    def reg_with_shift : GIMatchPredicate<
         (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
      // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand*&, uint64_t&);
      return matchARMShiftedOp2(${A}, ${reg}, ${shift});
    }]>;
    def : GICombineRule<
      (defs root:$D, reg:$S1, reg_with_shift:$S2),
      (match [{MIR %D = G_ADD %S1, %S2.reg }]),
      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
but the first definition is slightly more efficient for most targets: there are usually only a register or two for return values, and in the second version one of them is spent solely on a redundant bool; the first version also requires fewer argument registers, since the second has to store the MachineOperand* on the stack to pass it by reference. The difference is fairly small when considered on an individual rule but should accumulate on large rulesets or large functions.

    }]>;
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
      (match [{MIR %root = G_LOAD %A }],
             (extending_load_predicate root:$A,
                                        extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo))>;
The GIDefMatchData declares a new type of data that can be passed from the match to the apply. Tablegen is responsible for arranging for appropriate storage during the Combine algorithm. The GIMatchPredicate declares a C++ predicate that fills out the PreferredTuple (passed by reference) whenever it returns true for a successful match. We could have made the predicate return std::pair<bool, PreferredTuple> instead but that's less efficient (it would be an sret return on many targets) and would require us to define the truthiness (no examples are in this email as I expect it to be a rare thing to need) in order to act as a predicate. Normally, you'd feed this into a (create_imm ...) or a (create_operand ...) in the apply section. However, in this particular case the data being passed determines the entirety of the replacement, so we escape into arbitrary C++ instead and arrange for the variables to be injected appropriately using the 'exec' operator.

Have you considered the possibility of defining custom `create` actions for the apply, analogous to custom predicates?

In your description, it appears there is a fixed list of builtin actions such as `create_imm`, `create_operand` (not really described anywhere?), `exec` and the tentative debug info approach.

It seems valuable to be able to define one's own operator for custom operations.

I didn't go into the mechanical definitions but create_imm is really just a specialization of create_operand and is equivalent to (create_operand [{ MachineOperand::CreateImm( <code> ) }]). The string concatenation needed for that specialization currently doesn't work on the code type but that seems easily fixable. I haven't defined one yet, but I don't see any reason we couldn't define non-operands in a similar way.
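To make that concrete, here's a minimal sketch of the intended equivalence (MYTGT_ADDI and the constant are made up for illustration, and the exact spelling is still open):
    (apply [{MIR %D = MYTGT_ADDI %A, %imm }], (create_imm [{ 42 }]):$imm)
would just be shorthand for:
    (apply [{MIR %D = MYTGT_ADDI %A, %imm }],
           (create_operand [{ MachineOperand::CreateImm(42) }]):$imm)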

_Macros_
I simplified the previous example a bit. Rather than only matching a G_LOAD, the current rule in AArch64 can match any of G_LOAD, G_SEXTLOAD, and G_ZEXTLOAD. We need some means to match one of several alternatives as well as collect and re-use common subpatterns. I've yet to look into how this would be practically implemented and this section is a bit vague as a result but here's the current thinking on how it should look and behave:
    def ANYLOAD : GIMacro<(defs def:$R, use:$S, uint64_t:$IDX),
                          (match (oneof [{MIR %R = G_LOAD %S}],
                                        [{MIR %R = G_SEXTLOAD %S}],
                                        [{MIR %R = G_ZEXTLOAD %S}]):$IDX)>;
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo, ANYLOAD:$ANYLOAD),
      (match [{MIR %root = ANYLOAD %A }],
             (extending_load_predicate root:$A,
                                        extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo))>;
Effectively, we're declaring a fake instruction and importing it into this rule, possibly renaming it using the argument name ($ANYLOAD in this case) to provide a more convenient name in the MIR block or to disambiguate multiple instances. Once we've parsed the MIR, we would recursively replace any instance of ANYLOAD with code matching one of the alternatives. The variables in the 'defs' section of the macro would be available as $ANYLOAD_R, $ANYLOAD_S, and $ANYLOAD_IDX (we have to use '_' instead of a '.' to fit within tablegen's syntax) with $ANYLOAD_IDX indicating which alternative the (oneof ...) matched. When nested by including a macro in the macro's defs section and using it, the names to access the sub-macro's variables would grow longer by underscore-separated concatenation, for example '$ANYLOAD_ANYEXT_A'.
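For illustration, here's a sketch of how those flattened names might then be consumed in a rule (applyForVariant is a made-up helper; everything else follows the syntax above):
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, ANYLOAD:$ANYLOAD),
      (match [{MIR %root = ANYLOAD %A }]),
      (apply (exec [{ Helper.applyForVariant(${root}, ${ANYLOAD_IDX}); }],
                   reg:$root, uint64_t:$ANYLOAD_IDX))>;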

This magic generation of variable names seems really wrong to me.

I agree. I'd have preferred to use a structure-like style ($ANYLOAD.ANYEXT.A) but while that's fine in a code block, it doesn't fit into tablegen's syntax. Thinking on it a bit more, a better means of getting access to the defs would be:
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo,
            (import_macro ANYLOAD:$ANYLOAD, def:$R, use:$S, uint64_t:$IDX)),
      (match [{MIR %root = ANYLOAD %A }],
             (extending_load_predicate root:$A,
                                        extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo))>;
That would enable the rule to specify the names, and if we require that of nested macros too, we can nest them as needed. We would also want the defs/ops distinction you mention below to allow macros to have local defs.

Let's just use a natural operands interface, like so (and I'm also doing something else here, more on that later):

 def ANYLOAD : GIMacro<
   (ops def:$R, use:$S, imm:$IDX),
   (defs), // optional extra defs that can be used in predicates
   (oneof (match (G_LOAD $R, $S), 0:$IDX),
          (match (G_SEXTLOAD $R, $S), 1:$IDX),
          (match (G_ZEXTLOAD $R, $S), 2:$IDX))>;
 def extending_loads : GICombineRule<
   (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
         imm:$idx),
   (match (ANYLOAD $root, $A, $idx),
          (extending_load_predicate $A, $matchinfo)),
   (exec [{ Helper.applyCombineExtendingLoads(
                ${root}, ${matchinfo}, ${idx}); }])>;

There'd be a natural and simple way to shift predicates and variables in and out of macros: the following would be equivalent:

 def ANYLOAD : GIMacro<
   (ops def:$R, extending_load_matchdata:$matchinfo, imm:$IDX),
   (defs use:$S),
   (match (oneof (match (G_LOAD $R, $S), 0:$IDX),
                 (match (G_SEXTLOAD $R, $S), 1:$IDX),
                 (match (G_ZEXTLOAD $R, $S), 2:$IDX)),
          (extending_load_predicate $S, $matchinfo))>;
 def extending_loads : GICombineRule<
   (defs def:$root, extending_load_matchdata:$matchinfo, imm:$idx),
   (ANYLOAD $root, $matchinfo, $idx),
   (exec [{ Helper.applyCombineExtendingLoads(
                ${root}, ${matchinfo}, ${idx}); }])>;

... and of course it could all be inlined:

 def extending_loads : GICombineRule<
   (defs def:$root, reg:$A, extending_load_matchdata:$matchinfo,
         imm:$idx),
   (match (oneof (match (G_LOAD $root, $A), 0:$idx),
                 (match (G_SEXTLOAD $root, $A), 1:$idx),
                 (match (G_ZEXTLOAD $root, $A), 2:$idx)),
          (extending_load_predicate $A, $matchinfo)),
   (exec [{ Helper.applyCombineExtendingLoads(
                ${root}, ${matchinfo}, ${idx}); }])>;

In other words, macros would really just be macros or sub-routines.

Let's talk about using DAGs. I think I understand where the temptation comes from to describe the MIR using code blocks, but I'm also pretty sure that it's a mistake to do so.

The main motivation to move away from DAGs came from limitations in DAGs and the general ugliness of the original DAG-based definitions. Once the idea of using MIR came up, we liked the fact that it matched the existing serialization format which makes it easy to turn a specific example from the compiler output into a combine rule and avoids the need for another representation for instructions. We also liked the way it dealt with the difficult cases (MachineMemOperands, subregs, etc.) and also that it wasn't a new syntax or parser (although it may require modifications to the existing one). It also looked like it would be convenient for a tool like Alive (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf), although we didn't really explore that particular thought further.

Ultimately, both representations are DAGs and it comes down to the convenience of the syntax in specifying the rules we want. MIR is very convenient because of its familiarity and flexibility.

One of the bigger problems with using tablegen's DAG type is that it doesn't deal with multi-result instructions very well. Every time this has been raised on the list w.r.t. SelectionDAG the solution has boiled down to 'use C++ instead' and it would be good to fix that so that things like UADDO are representable. You can write a rule that matches something like divmod at the top-level using the 'set' operator:
  (set $D1, $D2, (divmod $A, $B))
but as soon as it's not the top-level, it gets really ugly fast even using pseudo-nodes:
  (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
                        (sext (result $T, 1)))
In this example, outs is a pseudo-node needed to distinguish the results from the sinks of the DAG, and result is a pseudo-node that selects one of the multiple results for use by the parent node. Meanwhile the MIR for this is:
  %0, %1 = G_DIVMOD %A, %B
  %D1 = G_SEXT %0
  %D2 = G_SEXT %1
which seems much clearer. It becomes a bigger problem when you start wanting to match target-specific instructions as multi-result becomes more common later in the backend, especially post-isel. It's also a problem for upside-down, inside-out, and multi-root matches since tablegen's dag type requires a single common root node and can only draw edges in one direction. If you want to have the root in the middle then you have to distort the DAG to bring it to the root and mark up the reversed edges somehow.

Moving the result inside the instruction operator helps with these problems but requires a list of dags and all the temporaries need to be named:
  [(G_DIVMOD $T1, $T2, $A, $B),
   (G_SEXT $D1, $T1),
   (G_SEXT $D2, $T2)]
and at that point, we're very nearly at MIR.

Another issue we didn't like with DAGs is that you also end up needing a lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural parts of the MIR syntax. For example:
(set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
compared to:
%d:sub_lo = MYTGT_ADD %A, %B
and:
(set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD $C, $D), sub_hi))
compared to:
%d:sub_lo = MYTGT_ADD %A, %B
%d:sub_hi = MYTGT_ADD %C, %D

Along the same lines, I also think that the integrated debug-info is only really practical in MIR. It's possible to shoe-horn DILocation in using a pseudo-node like so:
   (set $d, (add (mul $a, $b):$mul, $c):$add)   ->   (set $d, (merged_dilocation (muladd $a, $b, $c), $mul, $add))
but there isn't really a good place to modify the DIExpression used by DEBUG_VALUE.

One reason is that it adds yet another parser, which is more maintenance burden without buying much.

The expectation is that we can make use of the existing MIR parser and replace its MachineInstr/MachineOperand/etc. generation code with pattern-matching generation code. I'm going to be looking into the implementation of that over the next few weeks.

The more important reason is that DAGs compose better with the rest of TableGen. Consider combine rules defined in multiclasses, for example. That is very common all through LLVM. In order to mirror an existing selection rule in the AMDGPU backend, we'd likely want to have something like this:

 multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                  Instruction inst> {
   // This is probably wishful thinking -- would be okay if we had to
   // split this into two different rules due to the different types
   // of $dst
   def : GICombineRule<
     (defs root:$dst, operand:$src0, operand:$src1),
     (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
            (match (ginst i16:$tmp, i16:$src0, i16:$src1),
                   (G_ZEXT i32:$dst, $tmp))),
     (inst $dst, $src0, $src1)>;

   // A third rule for zext to 64 bits would go here...
 }

This is the reason that macros are explicitly imported in the defs section and the imported instances are renameable. The same thing in the MIR syntax (and using the changes to variable naming from above) would be:

 multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                  Instruction inst> {
   def : GICombineRule<
     # There's a corner case for the double definition of $dst here. I'll come back to that later in the mixing concerns comment
     (defs root:$dst, (import_macro ginst:$ginst, def:$dst, use:$src0, use:$src1)),
     (match [{MIR %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), # I've corrected the i16 to s16 here. i16 isn't a type in GlobalISel
     (apply /* I'll come back to this bit */)>;

   // Covers s32 and s64
   def : GICombineRule<
     (defs root:$dst, operand:$src0, operand:$src1, (import_macro ginst:$ginst)),
     (match [{MIR %0:(s16) = ginst %src0:(s16), %src1:(s16)
               %dst = G_ZEXT %0:(s16) }],
            (isS32OrS64 type:$dst)),
     (apply /* I'll come back to this bit */)>;
 }
We could potentially fold the has-G_ZEXT and no-G_ZEXT cases together with a macro but it seems like unnecessary complexity in this case. It might be more worth it if there's a similar G_SEXT variant too.

The reason for the "I'll come back to this bit" in the apply section is because you've raised an issue I'd forgotten to deal with. The opcodes in the apply section should also be parameterizable. My first thought on that is that there should be a macro-equivalent for apply such as:
 def : GIExpansion<
   (defs def:$R, use:$S, imm:$IDX),
   (apply (select imm:$IDX,
               (0 [{MIR %R = FOO %S }]),
               (1 [{MIR %R = BAR %S }]),
               (2 [{MIR %0 = BAZ %S
                        %R = BAR %0 }])))>;
or alternatively:
  def : GICombineRule<
    ...,
    ...,
    (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
            [{MIR %D = name %S }])>;
The former is more powerful, but the latter is more convenient for trivial cases.

Obviously you can achieve this in your proposal as well using string concatenation, but that does seem rather backwards.

Something else that falls out rather nicely from using DAGs is that it gives you a natural way to give a name to an _instruction_, to be able to pass it to a predicate for extended checks (and as a potentially more natural way of preserving debug locations!).

I don't think we need a means to name instructions since instructions can be trivially found from their results using MachineOperand::getParent().
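For example, a predicate that needs to inspect or compare instructions can recover them itself. Here's a minimal sketch re-using the GIMatchPredicate form from earlier (same_instr and the check are invented for illustration):
    def same_instr : GIMatchPredicate<
         (ins reg:$A, reg:$B), bool, [{
      // ${A} and ${B} expand to MachineOperands; getParent() returns the
      // MachineInstr containing the operand.
      return ${A}.getParent() == ${B}.getParent();
    }]>;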

As noted above, merging debug locations doesn't fit very well into DAGs as you have to use a pseudo-node to be able to provide both DILocations on the same result instruction, and there's no real means to include DEBUG_LOC for temporaries that are eliminated and need to be rebased from another value.

I have a more explicit example of this later.


_'Upside-down' matches (i.e. roots at the top) and similar_
This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root, reg:$A),
  (match [{MIR %1 = G_LOAD %root
               %A = G_SEXT %1 }]),
  (apply [{MIR %A = G_SEXTLOAD %root }])>;
The only unusual thing about this rule is that the root isn't at the bottom. Instead of starting at a use and matching towards defs, we're starting at the def and matching towards uses. This has some potentially useful properties. The combine algorithm has to choose an insertion point for the replacement code and the traditional choice has been the root of the match. Assuming we keep doing the same thing, 'upside-down' matching like this is able to avoid the checks that the load is safe to move, is non-volatile, has one use, etc. that would be necessary if we moved the G_LOAD down to the G_SEXT. Also, assuming we keep the same broadly bottom-up processing order as the existing Combine algorithm, this kind of rule also has relatively lower priority than conventional rules because the root is seen later. This can be useful as (algorithm-dependent) the MIR may be more settled by the time it tries to match.
Along the same lines, the syntax can potentially support the root being in the middle of a match. This could be used in a similar way to the upside-down match above to control the insertion point and priority. For example:
def : GICombineRule<
  (defs root:$root, reg:$A, reg:$B, reg:$C),
  (match [{MIR %1(s32) = G_TRUNC %A(s64)
               %2(s32) = G_TRUNC %B(s64)
               %root = G_ADD %1, %2
               %C(s64) = G_SEXT %root }]),
  (emit [{MIR %root = G_ADD %A, %B
              %C = G_SEXT_INREG %root }])>;
Unfortunately, I don't have any concrete examples where this would be useful in comparison to a conventional or upside-down match at the moment. I'm mostly keeping the door open as I can see potential for benefits (mostly w.r.t. sinking and hoisting safety around an instr that would be difficult to test for) given an appropriate rule, and also because I'm inclined to say that the tablegen syntax shouldn't be the reason it's not possible. It should be up to the Combine algorithm and the tablegen-erated pass involved in specializing the algorithm for a target.

I'm concerned that the concrete syntax proposed here mixes a bunch of concerns that really should be addressed separately:

1. Defining a mapping between semantically equivalent chunks of code.
2. Controlling insertion points as a simple way to keep side effects under control in many cases.
3. Controlling additional algorithmic aspects of the combine algorithm (whether you're matching up or down, say).

I think there may be a misunderstanding with #3. There's no real concept of matching up or down in the rules; there's only the starting point for the match, and from there, whether it follows the uses or the defs isn't something the rule needs to be concerned about. The algorithm needs to be aware of it (and historically hasn't been; DAGCombine broadly runs bottom-up, expects to match towards defs, and can fail to re-schedule nodes for re-visit in many common cases) but the algorithm is expected to figure that out for itself.

'root' definitions are the only place that #2 and #3 come into the proposed syntax. At the moment, the only requirement the syntax imposes on the algorithm is that it has a concept of a current MachineOperand Def* (and by extension instruction) that will be used as the starting point for the match and that it can choose a valid insertion point for each instruction produced. The insertion point is the same position in the current algorithm, but that's algorithm-dependent and doesn't have to be the case in the long run. There may even be more than one for rules that emit multiple instructions in the apply section. There's currently no means to specify an insertion point; it's left up to the algorithm to find a place for each instruction. That said, I'd be surprised if there was an algorithm where the current MachineOperand isn't also at least one of the insertion points.

* The current upstream code is a bit misleading about that and makes it seem that it's the MachineInstr but I made that mistake before on ISel so I'm intending to change that.

Could we keep those separate somehow? E.g., don't distinguish between 'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have TableGen figure out the roots automatically based on walking the graph implied by the matching rule, and default to using the sink(s) as roots.

operand isn't really connected to this; it's just a wildcard for a MachineOperand that doesn't care whether it's a reg, imm, MCExpr, or something else.

I'm inclined to agree that it would be nice to get 'root' out of the definition but I don't really see a good way to do it. The sinks aren't always the right choice and are a particularly bad choice on combines like the extending loads since that's trying to match all uses simultaneously and those uses could have any opcode.

Then add some optional let-able variable in the GICombineRule to define non-default rules for the combine algorithm.

Controlling the insertion point to address loads like in your example above could be done quite naturally if the MIR is expressed in a DAG:

 def : GICombineRule<
   (match (G_LOAD $tmp, $ptr):$load,
          (G_SEXT $dst, $tmp)),
   (apply (insertat $load),
          (G_SEXTLOAD $dst, $ptr))>;

This isn't quite the same thing as the extending loads combine in the example. In that combine, it's matching the load and all uses of its result and choosing a replacement based on the global usage of that def across the whole function, rewriting conflicting sext/zext/aext/trunc's to be as cheap as possible (a lot of the detail of that is inside the function but the code is upstream if you want to see the details). This rule is matching a single use and relying on CSE to de-dupe the N G_SEXTLOAD's that come out of it. To illustrate the difference, consider:
%0(s8) = G_LOAD %p
%d1(s16) = G_SEXT %0
%d2(s32) = G_SEXT %0
If I apply the above rule as much as possible to this (twice) with the roots at the sink, I'll get:
%d1(s16) = G_SEXTLOAD %p
%d2(s32) = G_SEXTLOAD %p
whereas if I apply the extending loads combine to it (once), I'll get:
%d2(s32) = G_SEXTLOAD %p
%d1(s16) = G_TRUNC %d2
which will normally be cheaper to execute. In combination with other rules and CSE, the first version could eventually turn into the second version but it requires more memory churn and processing time to get there.

One of the things I'm planning to look into in the algorithm is making it smarter about combines involving multiple uses in the intermediate operands. DAGCombine is harmfully greedy in this area and will often duplicate expensive operations if one use is combinable but another isn't. This has led to the addition of hasOneUse() checks, which solve that but take things to the opposite extreme. I think it should be possible to find an algorithm that makes an intelligent decision for defs with multiple uses.

As you can see, this discussion also makes me wonder whether the (defs) are needed at all; maybe we can rely on type inference almost everywhere?


_Multiple roots_
This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root1, root:$root2, reg:$A, reg:$B),
  (match [{MIR %root1 = G_ADD %A, %B }],
         [{MIR %root2 = G_SUB %A, %B }]),
  (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
This would match a G_ADD and G_SUB with operands in common and combine them into a BUTTERFLY operation. You can think of this as two normal rules, one with %root1 as the root and the other with %root2 as the root.

Again, it seems to me that it would be preferable if we could just express this rule as:

 def : GICombineRule<
   (match (G_ADD $dst1, $a, $b),
          (G_SUB $dst2, $a, $b),
          reg:$a, reg:$b),
   (BUTTERFLY $dst1, $dst2, $a, $b)>;

or perhaps:

 def : GICombineRule<
   (match (G_ADD $dst1, reg:$a, $b),
          (G_SUB $dst2, $a, reg:$b)),
   (BUTTERFLY $dst1, $dst2, $a, $b)>;

and similar variations.

Both of these would be ok when inferring the roots to be the sinks.

TableGen should be able to figure out the multiple roots by walking the dependency graph that is implied by the knowledge about which operands of G_ADD and G_SUB are defs and uses, respectively.

To sum up, I really like the proposal overall, except I think it would be a **major** improvement to use DAGs for expressing the MIR, and sure, there's the odd detail here and there, like GIMacro.

Cheers,
Nicolai

Thanks for all your comments!

_Grouping and Ordering Rules_
Combine rules are ordered and grouped using definitions of the form:
    def trivial_combines : GICombineGroup<[copy_prop]>;
    def combines_for_extload: GICombineGroup<[extending_loads]>;
    def all_combines : GICombineGroup<[trivial_combines, combines_for_extload]>;
Essentially, we create a long ordered list of all the combines. Tablegen is free to re-order rules so long as the resulting ruleset always behaves as if it were in this specific order. So for example, the list [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts] because the rules involved are mutually exclusive due to the root opcode.
So why not make tablegen figure out the order like ISel does? ISel attempts to figure out an order using a scoring system called Complexity along with a hack (AddedComplexity) to allow the user to provide magic numbers to fix the cases it gets wrong. The simplest way to confuse it is with patterns like (add s32, complexpattern1) and (add s32, complexpattern2). These have the same Complexity score but in truth, each one has a (possibly overlapping) range of complexity depending on what C++ code is inside the complexpatterns and which path through that C++ is dynamically taken. If it matches nothing then the score should be 0 but if it matches a dozen nodes it should be 12 (or possibly higher). We don't know which until we try to match that specific pattern. Correctly figuring out an order in the presence of complexpatterns is impossible. Similarly, it's also possible to confuse it with patterns that differ but overlap and add up to the same complexity due to quirks of the scoring system.
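To make the ambiguity concrete, here's roughly what such a pair of patterns looks like in today's SDISel tablegen (every name below is invented):
    def addr_imm : ComplexPattern<i32, 2, "SelectAddrImm", []>;
    def addr_reg : ComplexPattern<i32, 2, "SelectAddrReg", []>;
    // Both patterns get the same static Complexity score, but how much of
    // the DAG each one actually covers depends on the C++ inside the
    // Select* functions and is only known once a match is attempted.
    def : Pat<(add GPR32:$a, addr_imm:$am), (ADD_IMM_FORM GPR32:$a, addr_imm:$am)>;
    def : Pat<(add GPR32:$a, addr_reg:$am), (ADD_REG_FORM GPR32:$a, addr_reg:$am)>;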
_Declaring Combine Passes_
Combiner passes are defined by subclassing GICombiner like so:
    def CommonPreLegalizerCombiner: GICombiner<
      "CommonGenPreLegalizerCombiner", [all_combines]> {
      let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
      let Modifiers = [(disable_rule copy_prop)];
    }
This causes tablegen to create a class named 'CommonGenPreLegalizerCombiner' which you can use to perform combines. This combiner contains all the combines mentioned in the previous section because it includes the 'all_combines' group. However, it disables the copy_prop rule to prevent it from attempting to match. I'll discuss that a bit more below. It also generates a command-line option (for asserts builds only) which can be used to further disable rules at run-time, which will be useful for tracking down bugs or for testing a rule in isolation. I'm hoping that one day tools like bugpoint will be able to search through the individual rules within a combine pass when searching for a minimal reproducer.
The Modifiers field is intended to allow targets to modify an existing ruleset (particularly a target independent one) with additional target specific quirks. For example, one particular rule might be doing more harm than good and should be disabled. Or maybe only a subset of the things it would normally match are safe in which case an extra predicate should be tested. Being able to make minor edits to the ruleset without taking on the whole maintenance burden of the common code or causing code bloat by duplicating tables would be useful for targets that are generally similar to the targets within LLVM but have minor quirks.
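As a sketch of what such a modifier might look like (disable_rule is from the example above; add_predicate is a hypothetical modifier for the 'extra predicate' case, as are the rule and predicate names):
    def MyQuirkyPreLegalizerCombiner: GICombiner<
      "MyQuirkyGenPreLegalizerCombiner", [all_combines]> {
      let Modifiers = [(disable_rule harmful_on_this_target),
                       (add_predicate extending_loads, my_safety_check)];
    }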
Larger scale changes should take an alternate approach to modifiers. It's expected that even targets that are very different from the rest of the pack still have features in common. Such targets can declare their own combiner to replace the common one but still generally make use of the improvements made by the wider community. This is where GICombineGroup will start to shine as such a target can declare a combiner like so:
    def MyTargetPreLegalizerCombiner: GICombiner<
      "MyTargetGenPreLegalizerCombiner",
      [common_extend_optimizations,
       common_extending_loads,
       // common_rsqrt_and_nr_iterations, // This target has a real sqrt operation
       mytarget_special_fma,
       common_fma,
       common_bswap_idioms,
       mytarget_bcd_arithmetic
      ]> {
    }
As LLVM improves on the common_* groups, the target benefits from those improvements automatically. However, it doesn't benefit from new groups being added to common_all_combines because it's no longer using that group, so entirely new categories of combines added to it would not appear in the combiner. It would still be nice to find out that a new category has appeared though, so that a decision can be made on it. To that end, I'm considering adding:
    let Verify = [(has_all_of_except all_combines, common_rsqrt_and_nr_iterations)];
which would cause a warning if something new appeared in the original group.
_Conclusion_
There is a lot of work that needs to be done to get all this working and some of it may have to change once it runs into the reality of implementation :-). However, we think that this will prove to be a very convenient and powerful syntax with some potential for a variety of tools, from profilers, to bug reducers, to correctness-checking tools.
There is a patch at https://reviews.llvm.org/D54286 that makes a start on it to try out the general feel of the syntax but currently lacks the core feature of generating match and apply code from the MIR. I'm going to be looking into that over the next few weeks.

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Thanks David!

> On Nov 9, 2018, at 08:36, David Greene <[hidden email]> wrote:
>
> Daniel Sanders via llvm-dev <[hidden email]> writes:
>
>> I've been working on the GlobalISel combiner recently and I'd like to
>> share the plan for how Combine Rules will be defined in GlobalISel and
>> solicit feedback on it.
>
> This is really great stuff!  I agree with pretty much everything Nicolai
> said, particularly the use of DAGs.  That seems much more naturally
> TableGen-y to me.  Specific comments are below.
>
> But before that, I've been long pained that we have so much duplicated
> code in instcombine and dagcombine.  I know this is way beyond the scope
> of this work but do you think the basic concepts could be applied to
> produce TableGen-generated instcombine passes?  It would be nice to
> re-use many of the rules, for example, except they'd match LLVM IR
> rather than MIR.  As you go about implementation, maybe keep this idea
> in mind?

That's an interesting idea. Certainly tablegen-erating InstCombine ought to be possible and sharing code sounds like it ought to be doable. MIR and IR are pretty similar, especially from the IRTranslator (which is a direct translation) through to the Legalizer (which is the first point target instructions can appear, unless targets add custom passes). From the Legalizer to ISel, there's still likely to be a fair amount of overlap between the two as a lot of the G_* opcodes directly correspond to LLVM-IR instructions. The tricky bit is that the escape hatches into C++ would need to either have Instruction/MachineInstr versions or would need to accept both.

>> Here's a simple example that eliminates a redundant G_TRUNC.
>>
>> def : GICombineRule<(defs root:$D, operand:$S),
>> (match [{MIR %1(s32) = G_ZEXT %S(s8)
>> %D(s16) = G_TRUNC %1(s32) }]),
>> (apply [{MIR %D(s16) = G_ZEXT %S(s8) }])>;
>
> The use of '%' vs. '$' here really threw me for a loop.  I would have
> expected '$D' and '$S' everywhere.  What's the significance of '%'
> vs. '$'?  I know MIR uses '%' for names but must that be the case in
> these match blocks?

In MIR, '%' and '$' have a semantic difference when used on operands.
'%foo' is a virtual register named foo but '$foo' is the physical register foo.

The main reason I didn't pick something distinct from either (e.g. '${foo}') is that I'd like to minimize the need to modify the MIR parser to support pattern-specific syntax.
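As a small illustration (opcodes and types chosen arbitrarily):
    %sum:_(s64) = G_ADD %a, %b
    $x0 = COPY %sum(s64)
The first line defines a virtual register named 'sum'; the second copies it into the physical register x0.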

> Of course if we go with Nicolai's use of DAGs this issue seems to go
> away.
>
>> * defs declares the interface for the rule. The definitions in the
>>  defs section are the glue between the Combine algorithm and the
>>  rule, as well as between the match and apply sections. In this case
>>  we only define the root of the match and that we have a register
>>  variable called $S.
>
> My understanding is that "root" means the sink here, but later in the
> "upside-down" example, "root" means the source.  It seems to me the use
> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
> specifying the insert point.  Why do users need to know about "root" at
> all?  Is there some other special meaning attached to it?
>
>                               -David

It doesn't correspond with any property of the DAG being matched. It's the entry point that the combine algorithm uses to begin an attempt to match the rule. In DAGCombine terms, it's the first node checked for the rule inside the call to DAGCombine::visit(SDNode *).



Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev


On Nov 9, 2018, at 16:18, Daniel Sanders <[hidden email]> wrote:

Thanks Nicolai!

On Nov 9, 2018, at 02:55, Nicolai Hähnle <[hidden email]> wrote:

Hi Daniel,

Lots of good stuff in there! I especially like the design for specifying out-of-line predicates. I have a couple of small comments and one major one below.

Apparently I went over the size limit with my reply to your email. Sorry about that :-). It should arrive once the moderators approve it; if I don't see it on the list by Monday, I'll chop it into a couple of pieces and re-send.


Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Thanks Quentin!

On Nov 9, 2018, at 09:47, Quentin Colombet via llvm-dev <[hidden email]> wrote:

Hi Nicolai,

On Fri, Nov 9, 2018 at 09:24, Nicolai Hähnle <[hidden email]> wrote:

Hi Quentin,

On 09.11.18 18:16, Quentin Colombet via llvm-dev wrote:
Disclaimer: Haven't read the proposal yet.

 TL;DR: We're planning to define GlobalISel Combine Rules using MIR syntax with a few bits glued on to interface with the algorithm and escape into C++ when we need to. Eventually, ISel rules may follow suit.

I would rather avoid adding a dependency on yet another tablegen
backend to the project unless we are confident it is here to stay.
At a high level, I am not a fan of the proposal because it will box us
into a stricter framework than pure C++ that may impede our ability to
prototype different approaches quickly. Longer term, I would like to
have a common framework for IR and MIR to describe combines and I am
afraid that MIR is not the right abstraction for that.

What do you mean by "prototype different approaches quickly"?

I mean exploring different combine strategies :).

That's actually my motivation for making it (mostly) declarative now rather than later :-). It's also easier to do it while there are only two rules rather than letting it build up and having to migrate it to something else.

I have some plans for the Combine algorithm to make it more aware of the impact of its changes and able to choose not to make a change if it's going to be harmful to the overall performance. Assuming I can make the plan work, there are a few nice benefits that come along with it. In particular I want to be able to pull the apply step away from the match step and look into delaying the apply until it's seen enough of the MIR to make an informed decision about the cost/benefit of the apply. While I'm looking into this, others will start building up the combine library using the current algorithm.

With a model we are stuck with what it was designed to express.
For instance, currently with SDISel there is no way to express
something like paired loads, i.e., patterns like this:
load @a, load @a + 4 => loadpair @a

Because the DAGs need to be single-rooted.
In C++ we can do whatever we want (I am not saying it is easy ;).)

This particular case was one of the ones I was thinking of when thinking about multi-root matches. The same idea that drives the better decision making should also enable matching of things like ARM's ldm instruction. I think it should also be able to match non-overlapping operations too.
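For the record, here's a very rough sketch of how a paired load might look under the proposed multi-root syntax (the LOADPAIR opcode and the is_base_plus_four predicate are invented, and this assumes the algorithm work mentioned above):
    def : GICombineRule<
      (defs root:$D1, root:$D2, reg:$addr1, reg:$addr2),
      (match [{MIR %D1 = G_LOAD %addr1
               %D2 = G_LOAD %addr2 }],
             (is_base_plus_four reg:$addr1, reg:$addr2)),
      (apply [{MIR %D1, %D2 = LOADPAIR %addr1 }])>;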

It's funny, because I always thought the exact opposite: I felt like
there are constraints in the freedom to explore different approaches
precisely because the combines are not described declaratively, and too
much of how everything fits together is de facto hardcoded in C++.

But maybe we mean different things?

I think we meant the same thing, and I admit I don't see how C++ could
be more constrained than a declarative description that relies on C++
code to be executed :).
I agree that declarative syntax is easier to express and rewrite though.


I do agree with you that MIR code blocks seem like the wrong model for
describing combines.

Cheers,
Nicolai


Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev


On Fri, Nov 9, 2018 at 18:47, Daniel Sanders <[hidden email]> wrote:
Thanks Quentin!

On Nov 9, 2018, at 09:47, Quentin Colombet via llvm-dev <[hidden email]> wrote:


What do you mean by "prototype different approaches quickly"?

I mean exploring different combine strategies :).

That's actually my motivation for making it (mostly) declarative now rather than later :-). It's also easier to do it while there's only two rules rather than letting it build up and having to migrate it to something else.

As soon as we believe we won’t have to rewrite the rules (or anything, to be fair), we are already narrowing the options we can try.

I am not saying this is the wrong approach, but we certainly have not pushed it far enough for me to be confident that putting a whole tablegen backend behind the MIR rules is worth it. If you were scripting around that, and thus consciously okay with just dropping it if it doesn’t work, that would be a different discussion :).

The more we invest into an approach, the more difficult it is to move away from it, even if that would be the right thing to do (e.g., SDISel).

Anyway, at the end of the day, you’re the ones doing the work.


I have some plans for the Combine algorithm to make it more aware of the impact of its changes and able to choose not to make a change if it's going to be harmful to the overall performance. Assuming I can make the plan work, there are a few nice benefits that come along with it. In particular I want to be able to pull the apply step away from the match step and look into delaying the apply until it's seen enough of the MIR to make an informed decision about the cost/benefit of the apply. While I'm looking into this, others will start building up the combine library using the current algorithm.

With a model we are stuck with what it was designed to express.
For instance, currently with SDISel there is no way to express
something like paired loads, i.e., patterns like this:
load @a, load @a + 4 => loadpair @a

Because the DAGs need to be single-rooted.
In C++ we can do whatever we want (I am not saying it is easy ;).)

This particular case was one of the ones I was thinking of when thinking about multi-root matches. The same idea that drives the better decision making should also enable matching of things like ARM's ldm instruction. I think it should also be able to match non-overlapping operations too.

It's funny, because I always thought the exact opposite: I felt like
there are constraints in the freedom to explore different approaches
precisely because the combines are not described declaratively, and too
much of how everything fits together is de facto hardcoded in C++.

But maybe we mean different things?

I think we meant the same thing, and I admit I don't see how C++ could
be more constrained than a declarative description that relies on C++
code to be executed :).
I agree that declarative syntax is easier to express and rewrite though.


I do agree with you that MIR code blocks seem like the wrong model for
describing combines.

Cheers,
Nicolai


Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Thank you for the detailed reply! There's a lot to digest :)  Let me try
to address most of it.


[snip]

>> I also think you should have 'ins' and 'outs' separately; after all, a
>> predicate may have to do a combined check on two matched registers /
>> operands, and produce one or more values for later re-use.
>>
>> Once you have separate 'ins' and 'outs', the "bool" in there seems a
>> bit redundant.
>
> I think there are three kinds of values involved in the predicates. The
> first is inputs: these values come from other parts of the match, so it
> makes sense that they belong in 'ins'. The second is values that are
> directly related to the truthiness of the predicate such as bool,
> unsigned (register number), MachineInstr*, etc. These are the result of
> the underlying function and there can only be one of these per
> predicate. The type can potentially be std::pair or std::tuple to give
> multiple results, so long as a suitable conversion to bool is provided.
> The third is metadata that's just carrying additional information to the
> apply. These are currently in the 'ins' section and I agree that they
> don't really make sense there. I can pull them out into a separate
> 'outs' section. This has a flexibility cost since the user can no longer
> specify the complete argument order but that's probably a good thing for
> readability.
>
> Here's an example of a non-bool result:
>      def reg_with_shift : GIMatchPredicate<
>           (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>        // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
> uint64_t &);
>        return matchARMShiftedOp2(${A}, ${shift});
>      }]>;
>      def : GICombineRule<
>        (defs root:$D, reg:$S1, reg_with_shift:$S2),
>        (match [{MIR %D = G_ADD %S1, %S2 }]),
>        (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
> (I've just realized there's a gap in the ability to name metadata when
> used in this way. I've gone with 'S2.shift' for now)
>
> This is equivalent to:
>      def reg_with_shift : GIMatchPredicate<
>           (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
>        // bool matchARMShiftedOp2(const MachineOperand &,
> MachineOperand*&, uint64_t&);
>        return matchARMShiftedOp2(${A}, ${reg}, ${shift});
>      }]>;
>      def : GICombineRule<
>        (defs root:$D, reg:$S1, reg_with_shift:$S2),
>        (match [{MIR %D = G_ADD %S1, %S2.reg }]),
>        (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
> but the first definition is slightly more efficient for most targets.
> There are usually only a register or two for return values, and in the
> second version one of them is spent solely on a redundant bool. The
> first version also requires fewer argument registers: the second has to
> store the MachineOperand* on the stack to pass it by reference. The
> difference is fairly small when considered on an individual rule but
> should accumulate on large rulesets or large functions.
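
For concreteness, the two helper shapes being compared above might look
like this in C++ (a sketch only; matchARMShiftedOp2 is the illustrative
name used in this thread, not an in-tree function):

  // Form 1: truthiness is carried by the returned pointer; only the
  // uint64_t metadata needs to be passed by reference.
  MachineOperand *matchARMShiftedOp2(const MachineOperand &Op,
                                     uint64_t &Shift);

  // Form 2: a redundant bool result; both outputs are passed by
  // reference, so the MachineOperand* typically has to live on the
  // stack at the call site.
  bool matchARMShiftedOp2(const MachineOperand &Op, MachineOperand *&Reg,
                          uint64_t &Shift);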

Okay, I see now where you're coming from, and the goal makes a lot of
sense to me. I need some help with the syntax and semantics though :)

First, just to clarify, the intention is for matchARMShiftedOp2 to match
something like "G_SHL %X, ${imm}", right? Why not express that directly
in MIR?

But for the sake of the argument I'll assume there's a good reason for
the escape to C++. The way I understood your original description of
GIMatchPredicate, its most generic and versatile form of use is as an
explicit entry in the (match) list. Like so:

   def reg_with_shift : GIMatchPredicate<
       (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
     // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
     //                                    uint64_t &);
     return matchARMShiftedOp2(${A}, ${shift});
   }]>;
   def : GICombineRule<
       (defs root:$D, reg:$S1, reg:$S2),
       (match [{MIR %D = G_ADD %S1, %tmp }],
              (reg_with_shift $tmp, $shift):$S2),
       (apply [{MIR %D = ADD %S1, %S2, %shift }])>;

I believe that the kind of syntax you showed, in whatever form it
eventually ends up, should just be syntactic sugar. It would be
de-sugared to essentially what I've shown here, right?

So then the question becomes: what is the sugared syntax, if any?

I must say that I really don't like your first example, i.e. this one:

   def : GICombineRule<
       (defs root:$D, reg:$S1, reg_with_shift:$S2),
       (match [{MIR %D = G_ADD %S1, %S2 }]),
       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;

The syntax very strongly suggests that $S2 will be assigned to the RHS
operand of the G_ADD, but there seems to be non-local magic which causes
it to be assigned something else instead -- unless I've misunderstood
the purpose of the matchARMShiftedOp2?


>>>     }]>;
>>>     def extending_loads : GICombineRule<
>>>       (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
>>>       (match [{MIR %root = G_LOAD %A }],
>>>              (extending_load_predicate root:$A,
>>>                                         extending_load_matchdata:$matchinfo)),
>>>       (apply (exec [{ Helper.applyCombineExtendingLoads(${root},
>>> ${matchinfo}); }],
>>>                    reg:$root, extending_load_matchdata:$matchinfo)>;
>>> The GIDefMatchData declares a new type of data that can be passed
>>> from the match to the apply. Tablegen is responsible for arranging
>>> for appropriate storage during the Combine algorithm.
>>> The GIMatchPredicate declares a C++ predicate that fills out the
>>> PreferredTuple (passed by reference) whenever it returns true for a
>>> successful match. We could have made the predicate return
>>> std::pair<bool, PreferredTuple> instead but that's less efficient (it
>>> would be an sret return on many targets) and would require us to
>>> define the truthiness (no examples are in this email as I expect
>>> it to be a rare thing to need) in order to act as a predicate.
>>> Normally, you'd feed this into a (create_imm ...) or a
>>> (create_operand ...) in the apply section. However, in this
>>> particular case the data being passed determines the entirety of the
>>> replacement so we escape into arbitrary C++ instead and arrange for
>>> the variables to be injected appropriately using the 'exec' operator.
>>
>> Have you considered the possibility of defining custom `create`
>> actions for the apply, analogous to custom predicates?
>>
>> In your description, it appears there is a fixed list of builtin
>> actions such as `create_imm`, `create_operand` (not really described
>> anywhere?), `exec` and the tentative debug info approach.
>>
>> It seems valuable to be able to define one's own operator for custom
>> operations.
>
> I didn't go into the mechanical definitions but create_imm is really
> just a specialization of create_operand and is equivalent to
> (create_operand [{ MachineOperand::CreateImm( <code> ) }]). The string
> concatenation needed for that specialization currently doesn't work on
> the code type but that seems easily fixable. I haven't defined one yet,
> but I don't see any reason we couldn't define non-operands in a similar way.

Okay, that sounds good to me.

I'm fairly sure that concatenation on code types works, but it may be
the case that it currently requires some back-and-forth casts with the
string type. Anyway, like you say, should be easy to fix.
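
To make the desugaring mentioned above concrete, something like this
sketch (in the proposed syntax; the immediate is illustrative):

  (create_imm [{ 1 }]):$one

would expand to:

  (create_operand [{ MachineOperand::CreateImm( 1 ) }]):$one

where MachineOperand::CreateImm is the existing helper from
llvm/CodeGen/MachineOperand.h and the :$one naming follows the
convention used elsewhere in this proposal.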


[snip]

>> Let's talk about using DAGs. I think I understand where the temptation
>> comes from to describe the MIR using code blocks, but I'm also pretty
>> sure that it's a mistake to do so.
>
> The main motivation to move away from DAGs came from limitations in DAGs
> and the general ugliness of the original DAG-based definitions. Once the
> idea of using MIR came up, we liked the fact that it matched the
> existing serialization format which makes it easy to turn a specific
> example from the compiler output into a combine rule and avoid the need
> for another representation for instructions. We also liked the way it
> dealt with the difficult cases, MachineMemoryOperands, subregs, etc. and
> also that it wasn't a new syntax or parser (although it may require
> modifications to the existing one). It also looked like it would be
> convenient for a tool like Alive
> (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although we didn't
> really explore that particular thought further.
>
> Ultimately, both representations are DAGs and it comes down to the
> convenience of the syntax in specifying the rules we want. MIR is very
> convenient because of its familiarity and flexibility.
>
> One of the bigger problems with using tablegen's DAG type is that it
> doesn't deal with multi-result instructions very well. Every time this
> has been raised on the list w.r.t SelectionDAG the solution has boiled
> down to 'use C++ instead' and it would be good to fix that so that
> things like UADDO are representable. You can write a rule that matches
> something like divmod at the top-level using the 'set' operator:
>    (set $D1, $D2, (divmod $A, $B))
> but as soon as it's not the top-level, it gets really ugly fast even
> using pseudo-nodes:
>    (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
>                          (sext (result $T, 1)))
> In this example, outs is a pseudo-node needed to distinguish the results
> from the sinks of the DAG, and result is a pseudo-node that selects one
> of the multiple results for use by the parent node.  Meanwhile the MIR
> for this is:
>    %0, %1 = G_DIVMOD %A, %B
>    %D1 = G_SEXT %0
>    %D2 = G_SEXT %1
> which seems much clearer. It becomes a bigger problem when you start
> wanting to match target specific instructions as multi-result becomes
> more common later in the backend, especially post-isel. It's also a
> problem for upside-down matches, inside-out, and multi-root matches
> since tablegen's dag type requires a single common root node and can
> only draw edges in one direction. If you want to have the root in the
> middle then you have to distort the DAG to bring it to the root and mark
> up the reversed edges somehow.
>
> Moving the result inside the instruction operator helps with these
> problems but requires a list of dags and all the temporaries need to be
> named:
>    [(G_DIVMOD $T1, $T2, $A, $B),
>     (G_SEXT $D1, $T1),
>     (G_SEXT $D2, $T2)]
> and at that point, we're very nearly at MIR.

Yes, that was kind of the point. Expressing MIR, but in a way that is
native to TableGen :)

I should probably clarify that when I wrote "DAG", I meant that only in
the sense of the TableGen 'dag' type, which should probably just be
called "S-expression" instead of "DAG".

I absolutely believe that a list-of-instructions approach is better than
what the SDISel currently uses, for all the reasons you mentioned
(multiple defs, multiple roots).

I simply think using the 'dag' type in TableGen is a much better
approach because it composes with other TableGen frontend features
(e.g., all the substitution rules "just work", you can programmatically
compose a dag/sexpr from multiple pieces using !con, etc.).


> Another issue we didn't like with DAG is that you also end up needing a
> lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural parts
> of the MIR syntax. For example:
> (set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
> compared to:
> %d:sub_lo = MYTGT_ADD %A, %B
> and:
> (set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD $C,
> $D), sub_hi))
> compared to:
> %d:sub_lo = MYTGT_ADD %A, %B
> %d:sub_hi = MYTGT_ADD %C, %D

How about:

   (MYTGT_ADD (sub_lo $d), $A, $B),
   (MYTGT_ADD (sub_hi $d), $C, $D),

That seems acceptable to me.


> Along the same lines, I also think that the integrated debug-info is
> only really practical in MIR. It's possible to shoe-horn DILocation in
> using a pseudo-node like so:
>     (set $d, (add (mul $a, $b):$mul, $c):$add)   ->
>     (set $d, (merged_dilocation (muladd $a, $b, $c), $mul, $add))
> but there isn't really a good place to modify the DIExpression used by
> DEBUG_VALUE.
>
>> One reason is that it adds yet another parser, which is more
>> maintenance burden without buying much.
>
> The expectation is that we can make use of the existing MIR parser and
> replace its MachineInstr/MachineOperand/etc. generation code with
> pattern-matching generation code. I'm going to be looking into the
> implementation of that over the next few weeks.
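
As a rough sketch of what that generated pattern-matching code might
look like for "%D = G_ADD %S1, %S2" (purely illustrative; no such
generated API existed at the time of this thread):

  // Reject anything that isn't the expected opcode, then bind the
  // named operands for later predicates and the apply step.
  if (MI.getOpcode() != TargetOpcode::G_ADD)
    return false;
  MachineOperand &D = MI.getOperand(0);
  MachineOperand &S1 = MI.getOperand(1);
  MachineOperand &S2 = MI.getOperand(2);
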
>
>> The more important reason is that DAGs compose better with the rest of
>> TableGen. Consider combine rules defined in multiclasses, for example.
>> That is very common all through LLVM. In order to mirror an existing
>> selection rule in the AMDGPU backend, we'd likely want to have
>> something like this:
>>
>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>                                   Instruction inst> {
>>    // This is probably wishful thinking -- would be okay if we had to
>>    // split this into two different rules due to the different types
>>    // of $dst
>>    def : GICombineRule<
>>      (defs root:$dst, operand:$src0, operand:$src1),
>>      (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
>>             (match (ginst i16:$tmp, i16:$src0, i16:$src1),
>>                    (G_ZEXT i32:$dst, $tmp))),
>>      (inst $dst, $src0, $src1)>;
>>
>>    // A third rule for zext to 64 bits would go here...
>>  }
>
> This is the reason that macros are explicitly imported in the defs
> section and the imported instances are renameable. The same thing in the
> MIR syntax (and using the changes to variable naming from above) would be:
>
>   multiclass Arithmetic_i16_GIPats<Instruction ginst,
>                                    Instruction inst> {
>     def : GICombineRule<
>       # There's a corner case for the double definition of $dst here.
> I'll come back to that later in the mixing concerns comment
>       (defs root:$dst, (import-macro ginst:$ginst, def:$dst, use:$src0,
> use:$src1)),
>       (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), # I've
> corrected the i16 to s16 here. i16 isn't a type in GlobalISel
>       (apply /* I'll come back to this bit */)>;
>
>     // Covers s32 and s64
>     def : GICombineRule<
>       (defs root:$dst, operand:$src0, operand:$src1, (import-macro
> ginst:$ginst)),
>       (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
>                 %dst = G_ZEXT %0:(s16) }],
>              (isS32OrS64 type:$dst)),
>       (apply /* I'll come back to this bit */)>;
>   }
> We could potentially fold the has-G_ZEXT and no-G_ZEXT cases together
> with a macro but it seems like unnecessary complexity in this case.
> It might be more worthwhile if there's a similar G_SEXT variant too.
>
> The reason for the "I'll come back to this bit" in the apply section is
> because you've raised an issue I'd forgotten to deal with. The opcodes
> in the apply section should also be parameterizable. My first thought on
> that is that there should be a macro-equivalent for apply such as:
>   def : GIExpansion<
>     (defs def:$R, use:$S, imm:$IDX),
>     (apply (select imm:$IDX,
>                 (0 [{MIR %R = FOO %S }]),
>                 (1 [{MIR %R = BAR %S }]),
>                 (2 [{MIR %0 = BAZ %S
>                          %R = BAR %0 }])))>;
> or alternatively:
>    def : GICombineRule<
>      ...,
>      ...,
>      (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
>             [{MIR %D = name %s }])>;
> The former is more powerful, but the latter is more convenient for
> trivial cases.

I admit that you've lost me a little here :)

Okay, I see how importing an instruction as a macro is possible without
string concatenation, although I have to say that having to express this
explicit import still feels quite a bit more cumbersome than the
strawman syntax I've shown using the 'dag' type.

Again, the 'dag' type just composes better with the rest of TableGen.

On the apply-side though, I don't understand where the difficulty comes
from. Each instantiation of the multiclass has a single `inst' which
needs to go into the apply rule. That shouldn't be so difficult to
express -- at least with the 'dag' type it's trivial.


>> Obviously you can achieve this in your proposal as well using string
>> concatenation, but that does seem rather backwards.
>>
>> Something else that falls out rather nicely from using DAGs is that it
>> gives you a natural way to give a name to an _instruction_, to be able
>> to pass it to a predicate for extended checks (and as a potentially
>> more natural way of preserving debug locations!).
>
> I don't think we need a means to name instructions since instructions
> can be trivially found from their results using
> MachineOperand::getParent().
>
> As noted above, merging debug locations doesn't fit very well into DAGs
> as you have to use a pseudo node to be able to provide both DILocations
> on the same result instruction, and there's no real means to include
> DEBUG_LOC for temporaries that are eliminated and need to be rebased
> from another value.

Hmm... I feel like we need more concrete examples for debug info to
discuss this. One of the reasons I've brought up naming instructions is
precisely because I thought it could simplify transferring DebugLocs.

Again, I'm not suggesting the use of a DAG like in SDSel, as I think
that's a mistake. I simply mean the 'dag' type of TableGen, which should
really be called 'sexpr' or something.
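
For reference, the MachineOperand::getParent() route mentioned above is
a one-liner in C++ (a minimal sketch; Def stands for whichever matched
operand the rule bound):

  // Walk from a matched operand to its containing instruction, e.g. to
  // read the DILocation off it during the apply step.
  MachineInstr &MI = *Def.getParent();
  const DebugLoc &Loc = MI.getDebugLoc();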


>> I have a more explicit example of this later.
>>
>>
>>> _'Upside-down' matches (i.e. roots at the top) and similar_
>>> This one requires algorithm changes which I'd prefer not to discuss
>>> in this RFC. Assuming the underlying algorithm gains support for
>>> this, this is how the syntax would look:
>>> def : GICombineRule<
>>>   (defs root:$root, reg:$A),
>>>   (match [{MIR %1 = G_LOAD %root
>>>                %A = G_SEXT %1 }]),
>>>   (apply [{MIR %A = G_SEXTLOAD %root }])>;
>>> The only unusual thing about this rule is that the root isn't at the
>>> bottom. Instead of starting at a use and matching towards defs, we're
>>> starting at the def and matching towards uses. This has some
>>> potentially useful properties. The combine algorithm has to choose an
>>> insertion point for the replacement code and the traditional choice
>>> has been the root of the match. Assuming we keep doing the same
>>> thing, 'upside-down' matching like this is able to avoid the checks
>>> that the load is safe to move, is non-volatile, has one use, etc.
>>> that would be necessary if we moved the G_LOAD down to the G_SEXT.
>>> Also, assuming we keep the same broadly bottom-up processing order as
>>> the existing Combine algorithm this kind of rule also has relatively
>>> lower priority than conventional rules because the root is seen
>>> later. This can be useful as (algorithm-dependent) the MIR may be
>>> more settled by the time it tries to match.
>>> Along the same lines, the syntax can potentially support the root
>>> being in the middle of a match. This could be used in a similar way
>>> to the upside-down match above to control the insertion point and
>>> priority. For example:
>>> def : GICombineRule<
>>>   (defs root:$root, reg:$A, reg:$B, reg:$C),
>>>   (match [{MIR %1(s32) = G_TRUNC %A(s64)
>>>                %2(s32) = G_TRUNC %B(s64)
>>>                %root = G_ADD %1, %2
>>>                %C(s64) = G_SEXT %root }]),
>>>   (emit [{MIR %root = G_ADD %A, %B
>>>               %C = G_SEXT_INREG %root }])>;
>>> Unfortunately, I don't have any concrete examples where this would be
>>> useful in comparison to a conventional or upside-down match at the
>>> moment. I'm mostly keeping the door open as I can see potential for
>>> benefits (mostly w.r.t sinking and hoisting safety around an instr
>>> that would be difficult to test for that) given an appropriate rule
>>> and also because I'm inclined to say that the tablegen syntax
>>> shouldn't be the reason it's not possible. It should be up to the
>>> Combine algorithm and the tablegen-erated pass involved in
>>> specializing the algorithm for a target.
>>
>> I'm concerned that the concrete syntax proposed here mixes a bunch of
>> concerns that really should be addressed separately:
>>
>> 1. Defining a mapping between semantically equivalent chunks of code.
>> 2. Controlling insertion points as a simple way to keep side effects
>> under control in many cases.
>> 3. Controlling additional algorithmic aspects of the combine algorithm
>> (whether you're matching up or down, say).
>
> I think there may be a misunderstanding with #3. There's no real concept
> of matching up or down in the rules; there's only the starting point for
> the match and from there whether it's following the uses or defs isn't
> something the rule needs to be concerned about. The algorithm needs to
> be aware of it (and historically hasn't been; DAGCombine broadly runs
> bottom up and expects to match towards defs and can fail to re-schedule
> nodes for re-visit in many common cases) but the algorithm is expected
> to figure that out for itself.
>
> 'root' definitions are the only place that #2 and #3 come into the
> proposed syntax. At the moment, the only requirement the syntax imposes
> on the algorithm is that it has a concept of a current MachineOperand
> Def* (and by extension instruction) that will be used as the starting
> point for the match and that it can choose a valid insertion point for
> each instruction produced. The insertion point is the same position in
> the current algorithm but that's algorithm dependent and doesn't have to
> be the case in the long run. There may even be more than one for rules
> that emit multiple instructions in the apply section. There's currently
> no means to specify an insertion point, it's left up to the algorithm to
> find a place for each instruction.  That said, I'd be surprised if there
> was an algorithm where the current MachineOperand isn't also at least
> one of the insertion points.
>
> * The current upstream code is a bit misleading about that and makes it
> seem that it's the MachineInstr but I made that mistake before on ISel
> so I'm intending to change that.
>
>> Could we keep those separate somehow? E.g., don't distinguish between
>> 'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have
>> TableGen figure out the roots automatically based on walking the graph
>> implied by the matching rule, and default to using the sink(s) as roots.
>
> operand isn't really connected to this; it's just a wildcard for a
> MachineOperand that doesn't care whether it's a reg, imm, MCExpr, or
> something else.
>
> I'm inclined to agree that it would be nice to get 'root' out of the
> definition but I don't really see a good way to do it. The sinks aren't
> always the right choice and are a particularly bad choice on combines
> like the extending loads since that's trying to match all uses
> simultaneously and those uses could have any opcode.

How about having it as a lettable variable? As in (using my favorite
syntax, but it should translate either way):

   let MatchRoot = "$tmp" in
   def : GICombineRule<
       (match (G_LOAD $tmp, $ptr),
              (G_SEXT $dst, $tmp)),
       (apply (G_SEXTLOAD $dst, $ptr))>;

I'm using $tmp instead of $ptr as the root here since you wrote above
that it needs to be a Def.

Which makes the discussion about the insertion point potentially
confusing, since $tmp no longer appears in the (apply). But maybe that
doesn't matter.

If MatchRoot is not set, it'd just default to using a sink -- that
should be by far the most common case.

Also, it'd be nice to get rid of the (defs) part of the syntax entirely
for brevity. I believe that should be possible.


>> Then add some optional let-able variable in the GICombineRule to
>> define non-default rules for the combine algorithm.
>>
>> Controlling the insertion point to address loads like in your example
>> above could be done quite naturally if the MIR is expressed in a DAG:
>>
>>  def : GICombineRule<
>>    (match (G_LOAD $tmp, $ptr):$load,
>>           (G_SEXT $dst, $tmp)),
>>    (apply (insertat $load),
>>           (G_SEXTLOAD $dst, $ptr))>;
>
> This isn't quite the same thing as the extending loads combine in the
> example. In that combine, it's matching the load and all uses of its
> result and choosing a replacement based on the global usage of that def
> across the whole function, rewriting conflicting sext/zext/aext/trunc's
> to be as cheap as possible (a lot of the detail of that is inside the
> function but the code is upstream if you want to see the details). This
> rule is matching a single use and relying on CSE to de-dupe the N
> G_SEXTLOAD's that come out of it. To illustrate the difference, consider:
> %0(s8) = G_LOAD %p
> %d1(s16) = G_SEXT %0
> %d2(s32) = G_SEXT %0
> If I apply the above rule as much as possible to this (twice) with the
> roots at the sink, I'll get:
> %d1(s16) = G_SEXTLOAD %p
> %d2(s32) = G_SEXTLOAD %p
> whereas if I apply the extending loads combine to it (once), I'll get:
> %d2(s32) = G_SEXTLOAD %p
> %d1(s16) = G_TRUNC %d2
> which will normally be cheaper to execute. In combination with other
> rules and CSE, the first version could eventually turn into the second
> version but it requires more memory churn and processing time to get there.

You've lost me again, sorry. Where does the G_TRUNC suddenly come from?
Is that the rule that uses `Helper.matchCombineExtendingLoads`?

In that case, don't worry, I only intended my example to be comparable
to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you
showed in your original mail as well.

If, on the other hand, you have a plan to get to the more advanced
result of matching multiple G_SEXTs and using G_TRUNC _without_ a C++
escape, I'd be _very_ interested to learn more about that!


> One of the things I'm planning to look into in the algorithm is making
> it smarter about combines involving multiple uses in the intermediate
> operands. DAGCombine is harmfully greedy in this area and will often
> duplicate expensive operations if one use is combinable but another
> isn't. This has led to the addition of hasOneUse() checks, which solve
> that but take things to the opposite extreme. I think it should be
> possible to find an algorithm that makes an intelligent decision for
> defs with multiple uses.
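
The hasOneUse() guard being referred to is the familiar DAGCombine
shape (a sketch of the pattern, not a specific in-tree combine):

  // Bail out if the intermediate value has other users; combining here
  // would duplicate the expensive operation for the remaining use.
  if (!N0.hasOneUse())
    return SDValue();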

Absolutely agreed that we need better solutions here.

In the AMDGPU backend, we have some rather interesting challenges around
those. This is perhaps a bit of a far future tangent, but one of them is
that floating point operations can optionally negate their operands;
however, this can be at the cost of a larger instruction encoding, and
the rules for that are rather complex. So there's a problem of
optimizing *where* we insert the negations to improve code size.

In a computation that is a tree with a single sink, an optimal solution
can be found in linear time using dynamic programming (maybe there's an
easier way, but I'm not aware of one). With general DAGs, the problem is
of course much harder.

The negation is just one example, and it would be pretty awesome if we
could use a declarative database of combines to make progress at the
very least in local search optimizations.

Thanks again for your detailed reply!
Cheers,
Nicolai



>> As you can see, this discussion also makes me wonder whether the
>> (defs) are needed at all; maybe we can rely on type inference almost
>> everywhere?
>>
>>
>>> _Multiple roots_
>>> This one requires algorithm changes which I'd prefer not to discuss
>>> in this RFC. Assuming the underlying algorithm gains support for
>>> this, this is how the syntax would look:
>>> def : GICombineRule<
>>>   (defs root:$root1, root:$root2, reg:$A, reg:$B),
>>>   (match [{MIR %root1 = G_ADD %A, %B }],
>>>          [{MIR %root2 = G_SUB %A, %B }]),
>>>   (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
>>> This would match a G_ADD and G_SUB with operands in common and
>>> combine them into a BUTTERFLY operation. You can think of this as two
>>> normal rules, one with %root1 as the root and the other with %root2
>>> as the root.
>>
>> Again, it seems to me that it would be preferable if we could just
>> express this rule as:
>>
>>  def : GICombineRule<
>>    (match (G_ADD $dst1, $a, $b),
>>           (G_SUB $dst2, $a, $b),
>>           reg:$a, reg:$b),
>>    (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>
>> or perhaps:
>>
>>  def : GICombineRule<
>>    (match (G_ADD $dst1, reg:$a, $b),
>>           (G_SUB $dst2, $a, reg:$b)),
>>    (BUTTERFLY $dst1, $dst2, $a, $b)>;
>>
>> and similar variations.
>
> Both of these would be ok when inferring the roots to be the sinks.
>
>> TableGen should be able to figure out the multiple roots by walking
>> the dependency graph that is implied by the knowledge about which
>> operands of G_ADD and G_SUB are defs and uses, respectively.
>>
>> To sum up, I really like the proposal overall, except I think it would
>> be a **major** improvement to use DAGs for expressing the MIR, and
>> sure, there's the odd detail here and there like about GIMacro.
>>
>> Cheers,
>> Nicolai
>
> Thanks for all your comments!
>
>>> _Grouping and Ordering Rules_
>>> Combine rules are ordered and grouped using definitions of the form:
>>>     def trivial_combines : GICombineGroup<[copy_prop]>;
>>>     def combines_for_extload: GICombineGroup<[extending_loads]>;
>>>     def all_combines : GICombineGroup<[trivial_combines,
>>> combines_for_extload]>;
>>> Essentially, we create a long ordered list of all the combines.
>>> Tablegen is free to re-order rules so long as the resulting ruleset
>>> always behaves as if it were in this specific order. So for example,
>>> the list [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is
>>> free to re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext,
>>> zext_zexts] because the rules involved are mutually exclusive due to
>>> the root opcode.
>>> So why not make tablegen figure out the order like ISel does? ISel
>>> attempts to figure out an order using a scoring system called
>>> Complexity along with a hack (AddedComplexity) to allow the user to
>>> provide magic numbers to fix the cases it gets wrong. The simplest
>>> way to confuse it is with patterns like (add s32, complexpattern1)
>>> and (add s32, complexpattern2). These have the same Complexity score
>>> but in truth, each one has a (possibly overlapping) range of
>>> complexity depending on what C++ code is inside the complexpatterns
>>> and which path through that C++ is dynamically taken. If it matches
>>> nothing then the score should be 0 but if it matches a dozen nodes it
>>> should be 12 (or possibly higher). We don't know which until we try
>>> to match that specific pattern. Correctly figuring out an order in
>>> the presence of complexpatterns is impossible. Similarly, it's also
>>> possible to confuse it with patterns that differ but overlap and add
>>> up to the same complexity due to quirks of the scoring system.
>>> _Declaring Combine Passes_
>>> Combiner passes are defined by subclassing GICombiner like so:
>>>     def CommonPreLegalizerCombiner: GICombiner<
>>>       "CommonGenPreLegalizerCombiner", [all_combines]> {
>>>       let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
>>> let Modifiers = [(disable_rule copy_prop)];
>>>     }
>>> This causes tablegen to create a class named
>>> 'CommonPreLegalizerCombiner' which you can use to perform combines.
>>> This combiner contains all the combines mentioned in the previous
>>> section because it includes the 'all_combines' group. However, it
>>> disables the copy_prop rule to prevent it from attempting to match.
>>> I'll discuss that a bit more below. It also generates a command-line
>>> option for asserts builds only which can be used to further disable
>>> rules at run-time which will be useful for tracking down bugs or for
>>> testing a rule in isolation. I'm hoping that one day tools like
>>> bugpoint will be able to search through the individual rules within a
>>> combine pass when searching for a minimal reproducer.
>>> The Modifiers field is intended to allow targets to modify an
>>> existing ruleset (particularly a target independent one) with
>>> additional target specific quirks. For example, one particular rule
>>> might be doing more harm than good and should be disabled. Or maybe
>>> only a subset of the things it would normally match are safe in which
>>> case an extra predicate should be tested. Being able to make minor
>>> edits to the ruleset without taking on the whole maintenance burden
>>> of the common code or causing code bloat by duplicating tables would
>>> be useful for targets that are generally similar to the targets
>>> within LLVM but have minor quirks.
>>> Larger scale changes should take an alternate approach to modifiers.
>>> It's expected that even targets that are very different from the rest
>>> of the pack still have features in common. Such targets can declare
>>> their own combiner to replace the common one but still generally make
>>> use of the improvements made by the wider community. This is where
>>> GICombinerGroup will start to shine as such a target can declare a
>>> combiner like so:
>>> def MyTargetPreLegalizerCombiner: GICombiner<
>>>       "MyTargetGenPreLegalizerCombiner",
>>> [common_extend_optimizations,
>>>        common_extending_loads,
>>>        // common_rsqrt_and_nr_iterations, // This target has a real
>>> sqrt operation
>>>        mytarget_special_fma,
>>>        common_fma,
>>>        common_bswap_idioms,
>>>        mytarget_bcd_arithmetic
>>> ]> {
>>>     }
>>> As LLVM improves on the common_* groups, the target benefits from
>>> those improvements automatically. However, it doesn't benefit from
>>> new groups being added to common_all_combines because it's no longer
>>> using that group so entirely new categories of combines added to it
>>> would not appear in the combiner. It would still be nice to find out
>>> that a new category has appeared though so that a decision can be
>>> made on it. To that end, I'm considering adding:
>>> let Verify = [(has_all_of_except all_combines,
>>> common_rsqrt_and_nr_iterations)];
>>> which would cause a warning if something new appeared in the original
>>> group.
>>> _Conclusion_
>>> There is a lot of work that needs to be done to get all this working
>>> and some of it may have to change once it runs into the reality of
>>> implementation :-). However, we think that this will prove to be a
>>> very convenient and powerful syntax with potential for a variety of
>>> tools from profilers, to bug reducers, to correctness checking tools.
>>> There is a patch at https://reviews.llvm.org/D54286 that makes a start on
>>> it to try out the general feel of the syntax but currently lacks the
>>> core feature of generating match and apply code from the MIR. I'm
>>> going to be looking into that over the next few weeks.

--
Learn how the world really is,
but never forget how it ought to be.

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev


On Nov 10, 2018, at 03:28, Nicolai Hähnle <[hidden email]> wrote:

Thank you for the detailed reply! There's a lot to digest :)  Let me try to address most of it.


[snip]
I also think you should have 'ins' and 'outs' separately; after all, a predicate may have to do a combined check on two matched registers / operands, and produce one or more values for later re-use.

Once you have separate 'ins' and 'outs', the "bool" in there seems a bit redundant.
I think there are three kinds of values involved in the predicates. The first is the inputs: these values come from other parts of the match and it makes sense that they belong in 'ins'. The second is values that are directly related to the truthiness of the predicate such as bool, unsigned (register number), MachineInstr*, etc. These are the result of the underlying function and there can only be one of these per predicate. The type can potentially be std::pair or std::tuple to give multiple results, as long as a suitable conversion to bool is provided. The third is metadata that's just carrying additional information to the apply. These are currently in the 'ins' section and I agree that they don't really make sense there. I can pull them out into a separate 'outs' section. This has a flexibility cost since the user can no longer specify the complete argument order, but that's probably a good thing for readability.
Here's an example of a non-bool result:
    def reg_with_shift : GIMatchPredicate<
         (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
      // MachineOperand *matchARMShiftedOp2(const MachineOperand &, uint64_t &);
      return matchARMShiftedOp2(${A}, ${shift});
    }]>;
    def : GICombineRule<
      (defs root:$D, reg:$S1, reg_with_shift:$S2),
      (match [{MIR %D = G_ADD %S1, %S2 }]),
      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
(I've just realized there's a gap in the ability to name metadata when used in this way. I've gone with 'S2.shift' for now)
This is equivalent to:
    def reg_with_shift : GIMatchPredicate<
         (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
      // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand*&, uint64_t&);
      return matchARMShiftedOp2(${A}, ${reg}, ${shift});
    }]>;
    def : GICombineRule<
      (defs root:$D, reg:$S1, reg_with_shift:$S2),
      (match [{MIR %D = G_ADD %S1, %S2.reg }]),
      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
but the first definition is slightly more efficient for most targets. There are usually only a register or two for return values, and in the second version one of them is spent solely on a redundant bool. The first version also requires fewer argument registers: the second has to store the MachineOperand* on the stack to pass it by reference. The difference is fairly small when considered on an individual rule but should accumulate on large rulesets or large functions.

Okay, I see now where you're coming from, and the goal makes a lot of sense to me. I need some help with the syntax and semantics though :)

First, just to clarify, the intention is for matchARMShiftedOp2 to match something like "G_SHL %X, ${imm}", right? Why not express that directly in MIR?

Going by http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBEAGE.html (which admittedly is old docs but it's unlikely to have changed), it needs to be able to match the following:

%0 = G_CONSTANT iN ${imm} # Where imm is any immediate expressible using an unsigned 8-bit immediate rotated by an even number of bits

%0 # I.e. just a plain vreg

%0 = G_CONSTANT iN ${imm} # Where imm is 1 to 32
%2 = G_[AL]SHR %1, %0

%0 = G_CONSTANT iN ${imm} # Where imm is 0 to 31
%2 = G_SHL %1, %0

%0 = G_CONSTANT iN ${imm} # Where imm is 1 to 31
%2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment so this would have to match the equivalent and/shift/or sequence

%0 = G_CONSTANT iN 1
%2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment so this would have to match the equivalent and/shift/or sequence
%3 = G_SEXT %2

%2 = G_[AL]SHR %0, %1

%2 = G_SHL %0, %1

%2 = G_ROR %0, %1 # We don't actually have a rotate right at the moment so this would have to match the equivalent and/shift/or sequence

I don't think it's strictly necessary to escape into C++ for this case (except for checking the immediates have the right value) as there's nothing here that can't be done with patterns. Historically, I believe the reason targets have taken this route in SelectionDAG is mainly down to the fact that SelectionDAG patterns didn't provide a good means to factor out common sub-patterns. Until earlier this year, PatFrags really only defined abbreviations for common sub-patterns rather than defining multiple alternatives. So a target like ARM would have had to duplicate the patterns for every operation that supported it. In the case of ARM that's 153 (9 sub-patterns * 17 instructions supporting them) patterns. By escaping into C++ via ComplexPattern, 17 would do the same job with no code duplication.
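
For illustration, the multiple-alternatives form of PatFrags that makes
this factoring possible looks roughly like this (a sketch with made-up
names, not the in-tree ARM definitions):

  // One fragment matching any of the three shift forms of the operand;
  // each instruction pattern can then refer to shifted_op2 once instead
  // of being written out per shift kind.
  def shifted_op2 : PatFrags<(ops node:$Rm, node:$amt),
                             [(shl node:$Rm, node:$amt),
                              (sra node:$Rm, node:$amt),
                              (srl node:$Rm, node:$amt)]>;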

FWIW, in the long run I would like most cases of this to move to macros as the tablegen pass ought to be smart enough to factor out the commonality. There will be some that genuinely need the escape hatch but I'd like that to be kept to a minimum as the more matching we can delegate to tablegen, the more we can do interesting things like measure rule coverage, guided fuzzing of rules, profiling the rules, dropping low frequency/profit rules from -O1/-O2, proving correctness, etc. Conversely the more C++ there is, the more work is required to implement those features.

But for the sake of the argument I'll assume there's a good reason for the escape to C++. The way I understood your original description of GIMatchPredicate, its most generic and versatile form of use is as an explicit entry in the (match) list. Like so:

 def reg_with_shift : GIMatchPredicate<
     (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
   // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
   //                                    uint64_t &);
    return matchARMShiftedOp2(${A}, ${shift});
 }]>;
 def : GICombineRule<
     (defs root:$D, reg:$S1, reg:$S2),
     (match [{MIR %D = G_ADD %S1, %tmp }],
            (reg_with_shift $tmp, $shift):$S2),
     (apply [{MIR %D = ADD %S1, %S2, %shift }])>;

I believe that the kind of syntax you showed, in whatever form it eventually ends up, should just be syntactic sugar. It would be de-sugared to essentially what I've shown here, right?

Yes, that's right.

So then the question becomes: what is the sugared syntax, if any?

I must say that I really don't like your first example, i.e. this one:

 def : GICombineRule<
     (defs root:$D, reg:$S1, reg_with_shift:$S2),
     (match [{MIR %D = G_ADD %S1, %S2 }]),
     (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;

The syntax very strongly suggests that $S2 will be assigned to the RHS operand of the G_ADD, but there seems to be non-local magic which causes it to be assigned something else instead -- unless I've misunderstood the purpose of the matchARMShiftedOp2?

I see your point here. The matched operand and the result of the predicate essentially have the same name at the moment. Something like:
 (apply [{MIR %D = ADD %S1, %S2.result, %S2.shift }])
would resolve the ambiguity but then 'S2.result' is a magically generated name. We'll need to give the result of the predicate a proper name.

I suppose we could treat the first 'outs' as the return type, but I think we ought to be more explicit than that and keep the return value separate from the metadata. Something like:
 def reg_with_shift : GIMatchPredicate<
     (ins reg:$A), (outs bool), (metadata_out MachineOperandPtr:$result, uint64_t:$shift), [{
   // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand *&, uint64_t &);
   return matchARMShiftedOp2(${A}, ${result}, ${shift});
 }]>;
 def reg_with_shift : GIMatchPredicate<
     (ins reg:$A), (outs MachineOperandPtr:$result), (metadata_out uint64_t:$shift), [{
   // MachineOperand *matchARMShiftedOp2(const MachineOperand &, uint64_t &);
   return matchARMShiftedOp2(${A}, ${shift});
 }]>;
We'd also need something to make using the original %S2 an error as I'm sure passing the operand being matched instead of the operand returned by the predicate will be a common mistake otherwise:
let ApplyCannotUseMatchedOperand = 1;
I'd also be inclined to make it a warning for this to be unset if the result has a name.
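
Pulling those pieces together, the predicate definition might end up
looking something like this (a sketch combining the fragments above,
using the second variant):

  def reg_with_shift : GIMatchPredicate<
      (ins reg:$A), (outs MachineOperandPtr:$result),
      (metadata_out uint64_t:$shift), [{
    // MachineOperand *matchARMShiftedOp2(const MachineOperand &, uint64_t &);
    return matchARMShiftedOp2(${A}, ${shift});
  }]> {
    // Force apply-side references to use ${result}, not the matched %S2.
    let ApplyCannotUseMatchedOperand = 1;
  }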

    }]>;
    def extending_loads : GICombineRule<
      (defs root:$root, reg:$A, extending_load_matchdata:$matchinfo),
      (match [{MIR %root = G_LOAD %A }],
             (extending_load_predicate root:$A,
                                        extending_load_matchdata:$matchinfo)),
      (apply (exec [{ Helper.applyCombineExtendingLoads(${root}, ${matchinfo}); }],
                   reg:$root, extending_load_matchdata:$matchinfo)>;
The GIDefMatchData declares a new type of data that can be passed from the match to the apply. Tablegen is responsible for arranging for appropriate storage during the Combine algorithm. The GIMatchPredicate declares a C++ predicate that fills out the PreferredTuple (passed by reference) whenever it returns true for a successful match. We could have made the predicate return std::pair<bool, PreferredTuple> instead but that's less efficient (it would be an sret return on many targets) and would require us to define the truthiness (no examples are in this email as I expect it to be a rare thing to need) in order to act as a predicate. Normally, you'd feed this into a (create_imm ...) or a (create_operand ...) in the apply section. However, in this particular case the data being passed determines the entirety of the replacement so we escape into arbitrary C++ instead and arrange for the variables to be injected appropriately using the 'exec' operator.

Have you considered the possibility of defining custom `create` actions for the apply, analogous to custom predicates?

In your description, it appears there is a fixed list of builtin actions such as `create_imm`, `create_operand` (not really described anywhere?), `exec` and the tentative debug info approach.

It seems valuable to be able to define one's own operator for custom operations.
I didn't go into the mechanical definitions but create_imm is really just a specialization of create_operand and is equivalent to (create_operand [{ MachineOperand::CreateImm( <code> ) }]). The string concatenation needed for that specialization currently doesn't work on the code type but that seems easily fixable. I haven't defined one yet, but I don't see any reason we couldn't define non-operands in a similar way.

Okay, that sounds good to me.

I'm fairly sure that concatenation on code types works, but it may be the case that it currently requires some back-and-forth casts with the string type. Anyway, like you say, should be easy to fix.


[snip]
Let's talk about using DAGs. I think I understand where the temptation comes from to describe the MIR using code blocks, but I'm also pretty sure that it's a mistake to do so.
The main motivation to move away from DAGs came from limitations in DAGs and the general ugliness of the original DAG-based definitions. Once the idea of using MIR came up, we liked the fact that it matched the existing serialization format which makes it easy to turn a specific example from the compiler output into a combine rule and avoid the need for another representation for instructions. We also liked the way it dealt with the difficult cases, MachineMemoryOperands, subregs, etc. and also that it wasn't a new syntax or parser (although it may require modifications to the existing one). It also looked like it would be convenient for a tool like Alive (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although we didn't really explore that particular thought further.
Ultimately, both representations are DAGs and it comes down to the convenience of the syntax in specifying the rules we want. MIR is very convenient because of its familiarity and flexibility.
One of the bigger problems with using tablegen's DAG type is that it doesn't deal with multi-result instructions very well. Every time this has been raised on the list w.r.t SelectionDAG the solution has boiled down to 'use C++ instead' and it would be good to fix that so that things like UADDO are representable. You can write a rule that matches something like divmod at the top-level using the 'set' operator:
  (set $D1, $D2, (divmod $A, $B))
but as soon as it's not the top-level, it gets really ugly fast even using pseudo-nodes:
  (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
                        (sext (result $T, 1)))
In this example, outs is a pseudo-node needed to distinguish the results from the sinks of the DAG, and result is a pseudo-node that selects one of the multiple results for use by the parent node. Meanwhile the MIR for this is:
  %0, %1 = G_DIVMOD %A, %B
  %D1 = G_SEXT %0
  %D2 = G_SEXT %1
which seems much clearer. It becomes a bigger problem when you start wanting to match target specific instructions as multi-result becomes more common later in the backend, especially post-isel. It's also a problem for upside-down matches, inside-out, and multi-root matches since tablegen's dag type requires a single common root node and can only draw edges in one direction. If you want to have the root in the middle then you have to distort the DAG to bring it to the root and mark up the reversed edges somehow.
Moving the result inside the instruction operator helps with these problems but requires a list of dags and all the temporaries need to be named:
  [(G_DIVMOD $T1, $T2, $A, $B),
   (G_SEXT $D1, $T1),
   (G_SEXT $D2, $T2)]
and at that point, we're very nearly at MIR.

Yes, that was kind of the point. Expressing MIR, but in a way that is native to TableGen :)

I should probably clarify that when I wrote "DAG", I meant that only in the sense of the TableGen 'dag' type, which should probably just be called "S-expression" instead of "DAG".

I absolutely believe that a list-of-instructions approach is better than what the SDISel currently uses, for all the reasons you mentioned (multiple defs, multiple roots).

Ah ok. I thought that you were aiming for SDISel-style DAGs. :-)

At first glance, I think everything can fit into this style. For example, matching could be something like:
  (G_DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
  (DEBUG_VALUE $T1, debug-expr:$DE1)
  (DEBUG_VALUE $T2, debug-expr:$DE2)
and the corresponding apply:
  (DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
  (DEBUG_VALUE $T1, (modify-expr debug-expr:$DE1, ...))
  (DEBUG_VALUE $T2, (modify-expr debug-expr:$DE2, ...))
macros would be the same as instructions, etc. I'll work through the examples in my notes and see if I can find something reasonable for everything in this style.

One potential nit is that it's a little odd for the defs to be on the right-hand side of the opcode. It's easy enough to change:
  (set $T1, $T2, (G_DIVMOD $A, $B))
but I think that's worse to read. I'd be inclined to go for everything on the right and just label the defs:
  (G_DIVMOD def:$T1, def:$T2, use:$A, use:$B, debug-loc:$DL1)

I simply think using the 'dag' type in TableGen is a much better approach because it composes with other TableGen frontend features (e.g., all the substitution rules "just work", you can programmatically compose a dag/sexpr from multiple pieces using !con, etc.).

That makes sense to me.

Another issue we didn't like with DAG is that you also end up needing a lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural parts of the MIR syntax. For example:
(set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
compared to:
%d:sub_lo = MYTGT_ADD %A, %B
and:
(set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD $C, $D), sub_hi))
compared to:
%d:sub_lo = MYTGT_ADD %A, %B
%d:sub_hi = MYTGT_ADD %C, %D

How about:

 (MYTGT_ADD (sub_lo $d), $A, $B),
 (MYTGT_ADD (sub_hi $d), $C, $D),

That seems acceptable to me.

That looks good to me too.

Along the same lines, I also think that the integrated debug-info is only really practical in MIR. It's possible to shoe-horn DILocation in using a pseudo-node like so:
   (set $d, (add (mul $a, $b):$mul, $c):$add)   ->   (set $d, (merged_dilocation (muladd $a, $b, $c), $mul, $add))
but there isn't really a good place to modify the DIExpression used by DEBUG_VALUE.
One reason is that it adds yet another parser, which is more maintenance burden without buying much.
The expectation is that we can make use of the existing MIR parser and replace its MachineInstr/MachineOperand/etc. generation code with pattern-matching generation code. I'm going to be looking into the implementation of that over the next few weeks.
The more important reason is that DAGs compose better with the rest of TableGen. Consider combine rules defined in multiclasses, for example. That is very common all through LLVM. In order to mirror an existing selection rule in the AMDGPU backend, we'd likely want to have something like this:

 multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                  Instruction inst> {
   // This is probably wishful thinking -- would be okay if we had to
   // split this into two different rules due to the different types
   // of $dst
   def : GICombineRule<
     (defs root:$dst, operand:$src0, operand:$src1),
     (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
            (match (ginst i16:$tmp, i16:$src0, i16:$src1),
                   (G_ZEXT i32:$dst, $tmp))),
     (inst $dst, $src0, $src1)>;

   // A third rule for zext to 64 bits would go here...
 }
This is the reason that macros are explicitly imported in the defs section and the imported instances are renameable. The same thing in the MIR syntax (and using the changes to variable naming from above) would be:
 multiclass Arithmetic_i16_GIPats<Instruction ginst,
                                  Instruction inst> {
   def : GICombineRule<
     # There's a corner case for the double definition of $dst here. I'll come back to that later in the mixing concerns comment
     (defs root:$dst, (import-macro ginst:$ginst, def:$dst, use:$src0, use:$src1)),
     (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), # I've corrected the i16 to s16 here. i16 isn't a type in GlobalISel
     (apply /* I'll come back to this bit */)>;
   // Covers s32 and s64
   def : GICombineRule<
     (defs root:$dst, operand:$src0, operand:$src1, (import-macro ginst:$ginst)),
     (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
                %dst = G_ZEXT %0:(s16) }],
            (isS32OrS64 type:$dst)),
     (apply /* I'll come back to this bit */)>;
 }
We could potentially fold the has-G_ZEXT and no-G_ZEXT cases together with a macro but it seems like unnecessary complexity in this case. It might be more worthwhile if there's a similar G_SEXT variant too.
The reason for the "I'll come back to this bit" in the apply section is because you've raised an issue I'd forgotten to deal with. The opcodes in the apply section should also be parameterizable. My first thought on that is that there should be a macro-equivalent for apply such as:
 def : GIExpansion<
   (defs def:$R, use:$S, imm:$IDX),
   (apply (select imm:$IDX,
               (0 [{MIR %R = FOO %S }]),
               (1 [{MIR %R = BAR %S }]),
               (2 [{MIR %0 = BAZ %S
                        %R = BAR %0 }])))>;
or alternatively:
  def : GICombineRule<
    ...,
    ...,
    (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
           [{MIR %D = name %s }])>;
The former is more powerful, but the latter is more convenient for trivial cases.

I admit that you've lost me a little here :)

Okay, I see how importing an instruction as a macro is possible without string concatenation, although I have to say that having to express this explicit import still feels quite a bit more cumbersome than the strawman syntax I've shown using the 'dag' type.

Again, the 'dag' type just composes better with the rest of TableGen.

With your reply above, I'm coming around to the dag type. I'll see how the examples in my notes end up in that style.

On the apply-side though, I don't understand where the difficulty comes from. Each instantiation of the multiclass has a single `inst' which needs to go into the apply rule. That shouldn't be so difficult to express -- at least with the 'dag' type it's trivial.

Yeah, I went off on a tangent a bit :-).

For your example: Yes, there's only one inst for each multiclass instantiation so it's simple.

For the tangent I was on, consider:
%0 = G_CONSTANT iN %x
%1 = G_CONSTANT iN %y
%2 = G_LSHR %a, %0
%3 = G_LSHR %2, %1
=>
%0 = G_CONSTANT iN %z # z = x+y
%1 = G_LSHR %a, %0
and:
%0 = G_CONSTANT iN %x
%1 = G_CONSTANT iN %y
%2 = G_ASHR %a, %0
%3 = G_ASHR %2, %1
=>
%0 = G_CONSTANT iN %z # z = x+y
%1 = G_ASHR %a, %0
and:
%0 = G_CONSTANT iN %x
%1 = G_CONSTANT iN %y
%2 = G_SHL %a, %0
%3 = G_SHL %2, %1
=>
%0 = G_CONSTANT iN %z # z = x+y
%1 = G_SHL %a, %0

These rules are very similar so it would be nice to merge them like so:
%0 = G_CONSTANT iN %x
%1 = G_CONSTANT iN %y
%2 = {G_SHL,G_ASHR,G_LSHR} %a, %0
%3 = <same-shift-as-%2> %2, %1
=>
%0 = G_CONSTANT iN %z # z = x+y
%1 = <same-shift-as-match> %a, %0

Obviously you can achieve this in your proposal as well using string concatenation, but that does seem rather backwards.

Something else that falls out rather nicely from using DAGs is that it gives you a natural way to give a name to an _instruction_, to be able to pass it to a predicate for extended checks (and as a potentially more natural way of preserving debug locations!).
I don't think we need a means to name instructions since instructions can be trivially found from their results using MachineOperand::getParent().
As noted above, merging debug locations doesn't fit very well into DAGs as you have to use a pseudo node to be able to provide both DILocations on the same result instruction, and there's no real means to include DEBUG_LOC for temporaries that are eliminated and need to be rebased from another value.

Hmm... I feel like we need more concrete examples for debug info to discuss this. One of the reasons I've brought up naming instructions is precisely because I thought it could simplify transferring DebugLocs.

Again, I'm not suggesting the use of a DAG like in SDSel, as I think that's a mistake. I simply mean the 'dag' type of TableGen, which should really be called 'sexpr' or something.

I think we can use the same idea you suggested for subregisters:
  (DIVMOD $T1, $T2, $A, $B, (merge_debug_loc $DL1, $DL2))

I have a more explicit example of this later.


_'Upside-down' matches (i.e. roots at the top) and similar_
This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root, reg:$A),
  (match [{MIR %1 = G_LOAD %root
               %A = G_SEXT %1 }]),
  (apply [{MIR %A = G_SEXTLOAD %root }])>;
The only unusual thing about this rule is that the root isn't at the bottom. Instead of starting at a use and matching towards defs, we're starting at the def and matching towards uses. This has some potentially useful properties. The combine algorithm has to choose an insertion point for the replacement code and the traditional choice has been the root of the match. Assuming we keep doing the same thing, 'upside-down' matching like this is able to avoid the checks that the load is safe to move, is non-volatile, has one use, etc. that would be necessary if we moved the G_LOAD down to the G_SEXT. Also, assuming we keep the same broadly bottom-up processing order as the existing Combine algorithm this kind of rule also has relatively lower priority than conventional rules because the root is seen later. This can be useful as (algorithm-dependent) the MIR may be more settled by the time it tries to match.
Along the same lines, the syntax can potentially support the root being in the middle of a match. This could be used in a similar way to the upside-down match above to control the insertion point and priority. For example:
def : GICombineRule<
  (defs root:$root, reg:$A, reg:$B, reg:$C),
  (match [{MIR %1(s32) = G_TRUNC %A(s64)
               %2(s32) = G_TRUNC %B(s64)
               %root = G_ADD %1, %2
               %C(s64) = G_SEXT %root }]),
  (emit [{MIR %root = G_ADD %A, %B
              %C = G_SEXT_INREG %root }])>;
Unfortunately, I don't have any concrete examples where this would be useful in comparison to a conventional or upside-down match at the moment. I'm mostly keeping the door open as I can see potential for benefits (mostly w.r.t. sinking and hoisting safety around an instr that would be difficult to test for) given an appropriate rule, and also because I'm inclined to say that the tablegen syntax shouldn't be the reason it's not possible. It should be up to the Combine algorithm and the tablegen-erated pass involved in specializing the algorithm for a target.

I'm concerned that the concrete syntax proposed here mixes a bunch of concerns that really should be addressed separately:

1. Defining a mapping between semantically equivalent chunks of code.
2. Controlling insertion points as a simple way to keep side effects under control in many cases.
3. Controlling additional algorithmic aspects of the combine algorithm (whether you're matching up or down, say).
I think there may be a misunderstanding with #3. There's no real concept of matching up or down in the rules; there's only the starting point for the match, and from there, whether it's following the uses or defs isn't something the rule needs to be concerned about. The algorithm needs to be aware of it (and historically hasn't been; DAGCombine broadly runs bottom up, expects to match towards defs, and can fail to re-schedule nodes for re-visit in many common cases) but the algorithm is expected to figure that out for itself.
'root' definitions are the only place that #2 and #3 come into the proposed syntax. At the moment, the only requirement the syntax imposes on the algorithm is that it has a concept of a current MachineOperand Def* (and by extension instruction) that will be used as the starting point for the match, and that it can choose a valid insertion point for each instruction produced. In the current algorithm the insertion point is the same position as the starting point, but that's algorithm-dependent and doesn't have to be the case in the long run. There may even be more than one insertion point for rules that emit multiple instructions in the apply section. There's currently no means to specify an insertion point; it's left up to the algorithm to find a place for each instruction. That said, I'd be surprised if there was an algorithm where the current MachineOperand isn't also at least one of the insertion points.
* The current upstream code is a bit misleading about that and makes it seem that it's the MachineInstr but I made that mistake before on ISel so I'm intending to change that.
Could we keep those separate somehow? E.g., don't distinguish between 'root' and 'reg' or 'operand' in the (defs ...) part of the rule; have TableGen figure out the roots automatically based on walking the graph implied by the matching rule, and default to using the sink(s) as roots.
operand isn't really connected to this; it's just a wildcard for a MachineOperand that doesn't care whether it's a reg, imm, MCExpr, or something else.
I'm inclined to agree that it would be nice to get 'root' out of the definition but I don't really see a good way to do it. The sinks aren't always the right choice and are a particularly bad choice on combines like the extending loads since that's trying to match all uses simultaneously and those uses could have any opcode.

How about having it as a lettable variable? As in (using my favorite syntax, but it should translate either way):

 let MatchRoot = "$tmp" in
 def : GICombineRule<
     (match (G_LOAD $tmp, $ptr),
            (G_SEXT $dst, $tmp)),
     (apply (G_SEXTLOAD $dst, $ptr))>;

I'm using $tmp instead of $ptr as the root here since you wrote above that it needs to be a Def.

Which makes the discussion about the insertion point potentially confusing, since $tmp no longer appears in the (apply). But maybe that doesn't matter.

The insertion point is up to the algorithm anyway so it makes sense for it to not be visible.

If MatchRoot is not set, it'd just default to using a sink -- that should be by far the most common case.

If we change that to all the sinks then we cover the common multi-root case too.

Also, it'd be nice to get rid of the (defs) part of the syntax entirely for brevity. I believe that should be possible.

I'm not very keen to remove the defs section but I'll see whether it's still needed in this style. Even if it's removable though, having it is likely to reduce the time spent on the tablegen pass as there's a cost to having to deduce information. I also think it's important for error checking since it provides a means to declare which values you expect to see in both the match and apply, whereas it's expected that some are only matched.

Then add some optional let-able variable in the GICombineRule to define non-default rules for the combine algorithm.

Controlling the insertion point to address loads like in your example above could be done quite naturally if the MIR is expressed in a DAG:

 def : GICombineRule<
   (match (G_LOAD $tmp, $ptr):$load,
          (G_SEXT $dst, $tmp)),
   (apply (insertat $load),
          (G_SEXTLOAD $dst, $ptr))>;
This isn't quite the same thing as the extending loads combine in the example. In that combine, it's matching the load and all uses of its result and choosing a replacement based on the global usage of that def across the whole function, rewriting conflicting sext/zext/aext/truncs to be as cheap as possible (a lot of the detail of that is inside the function but the code is upstream if you want to see the details). This rule is matching a single use and relying on CSE to de-dupe the N G_SEXTLOADs that come out of it. To illustrate the difference, consider:
%0(s8) = G_LOAD %p
%d1(s16) = G_SEXT %0
%d2(s32) = G_SEXT %0
If I apply the above rule as much as possible to this (twice) with the roots at the sink, I'll get:
%d1(s16) = G_SEXTLOAD %p
%d2(s32) = G_SEXTLOAD %p
whereas if I apply the extending loads combine to it (once), I'll get:
%d2(s32) = G_SEXTLOAD %p
%d1(s16) = G_TRUNC %d2
which will normally be cheaper to execute. In combination with other rules and CSE, the first version could eventually turn into the second version but it requires more memory churn and processing time to get there.

You've lost me again, sorry. Where does the G_TRUNC suddenly come from? Is that the rule that uses `Helper.matchCombineExtendingLoads`?

In that case, don't worry, I only intended my example to be comparable to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you showed in your original mail as well.

Yes, it comes from the Helper.applyCombineExtendingLoads(...), which looking back at the thread is from the macros section and not the upside-down matching section you were commenting on. Sorry about that.

If, on the other hand, you have a plan to get to the more advanced result of matching multiple G_SEXTs and using G_TRUNC _without_ a C++ escape, I'd be _very_ interested to learn more about that!

It's easy enough to match multiple uses in the current rule definitions. You just describe each use in the match section and have a predicate check that the operands aren't the same. However, C++ is the only way to look at all the users of a given value and modify them differently at the moment. I've had some vague thoughts about a (for-each-use ...) in the match and apply section but I haven't given that much thought yet.
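Purely to illustrate the shape such an operator might take, modelled on the extending-loads behaviour above (nothing below exists yet; (for-each-use ...) and its semantics are entirely hypothetical):
    def : GICombineRule<
      (defs root:$d, reg:$ptr),
      (match [{MIR %d(s8) = G_LOAD %ptr }]),
      // Hypothetical: emit the widened load once, then replace each use
      // of the old value with a trunc of the wide result.
      (apply [{MIR %w(s32) = G_SEXTLOAD %ptr }],
             (for-each-use reg:$d, [{MIR %d = G_TRUNC %w }]))>;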

One of the things I'm planning to look into in the algorithm is making it smarter about combines involving multiple uses in the intermediate operands. DAGCombine is harmfully greedy in this area and will often duplicate expensive operations if one use is combinable but another isn't. This has led to the addition of hasOneUse() checks which solve that but take things to the opposite extreme. I think it should be possible to find an algorithm that makes an intelligent decision for defs with multiple uses.
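As a concrete illustration of the problem (the fma combine here is just an example):
%m = G_FMUL %a, %b
%x = G_FADD %m, %c # combinable use: could become %x = G_FMA %a, %b, %c
G_STORE %m, %p     # non-combinable use keeps %m alive
Greedily folding the G_FADD into a G_FMA duplicates the expensive G_FMUL (once inside the G_FMA, once for the store), while a hasOneUse() check on %m refuses the combine entirely even when the duplication would have been profitable.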

Absolutely agreed that we need better solutions here.

In the AMDGPU backend, we have some rather interesting challenges around those. This is perhaps a bit of a far future tangent, but one of them is that floating point operations can optionally negate their operands; however, this can be at the cost of a larger instruction encoding, and the rules for that are rather complex. So there's a problem of optimizing *where* we insert the negations to improve code size.

In a computation that is a tree with a single sink, an optimal solution can be found in linear time using dynamic programming (maybe there's an easier way, but I'm not aware of one). With general DAGs, the problem is of course much harder.

The negation is just one example, and it would be pretty awesome if we could use a declarative database of combines to make progress at the very least in local search optimizations.

It certainly seems that something like this ought to be possible. A set of combines could find the local minimums and it's also possible for combine rules to avoid making changes and record the possible places for the negation and the associated costs which ought to make it possible to reach the optimum for the tree.

Along the same lines, it occurs to me that known-bits analysis is very similar and could be done declaratively with rules like:
 def : GIKnownBitsRule<
     (match (G_ZEXT $dst, $src),
            reg:$dst, reg:$src),
     (apply (exec [{ KnownZero.setBitsFrom(MRI.getType(${src}).getSizeInBits()) }], reg:$src))>;

Thanks again for your detailed reply!
Cheers,
Nicolai



As you can see, this discussion also makes me wonder whether the (defs) are needed at all; maybe we can rely on type inference almost everywhere?


_Multiple roots_
This one requires algorithm changes which I'd prefer not to discuss in this RFC. Assuming the underlying algorithm gains support for this, this is how the syntax would look:
def : GICombineRule<
  (defs root:$root1, root:$root2, reg:$A, reg:$B),
  (match [{MIR %root1 = G_ADD %A, %B }],
         [{MIR %root2 = G_SUB %A, %B }]),
  (emit [{MIR %root1, %root2 = BUTTERFLY %A, %B }])>;
This would match a G_ADD and G_SUB with operands in common and combine them into a BUTTERFLY operation. You can think of this as two normal rules, one with %root1 as the root and the other with %root2 as the root.

Again, it seems to me that it would be preferable if we could just express this rule as:

 def : GICombineRule<
   (match (G_ADD $dst1, $a, $b),
          (G_SUB $dst2, $a, $b),
          reg:$a, reg:$b),
   (BUTTERFLY $dst1, $dst2, $a, $b)>;

or perhaps:

 def : GICombineRule<
   (match (G_ADD $dst1, reg:$a, $b),
          (G_SUB $dst2, $a, reg:$b)),
   (BUTTERFLY $dst1, $dst2, $a, $b)>;

and similar variations.
Both of these would be ok when inferring the roots to be the sinks.
TableGen should be able to figure out the multiple roots by walking the dependency graph that is implied by the knowledge about which operands of G_ADD and G_SUB are defs and uses, respectively.

To sum up, I really like the proposal overall, except I think it would be a **major** improvement to use DAGs for expressing the MIR, and sure, there's the odd detail here and there like about GIMacro.

Cheers,
Nicolai
Thanks for all your comments!
_Grouping and Ordering Rules_
Combine rules are ordered and grouped using definitions of the form:
    def trivial_combines : GICombineGroup<[copy_prop]>;
    def combines_for_extload: GICombineGroup<[extending_loads]>;
    def all_combines : GICombineGroup<[trivial_combines, combines_for_extload]>;
Essentially, we create a long ordered list of all the combines. Tablegen is free to re-order rules so long as the resulting ruleset always behaves as if it were in this specific order. So for example, the list [sext_trunc_sext, zext_trunc_zext, sext_sext, zext_zexts] is free to re-order to [sext_trunc_sext, sext_sext, zext_trunc_zext, zext_zexts] because the rules involved are mutually exclusive due to the root opcode.
So why not make tablegen figure out the order like ISel does? ISel attempts to figure out an order using a scoring system called Complexity along with a hack (AddedComplexity) to allow the user to provide magic numbers to fix the cases it gets wrong. The simplest way to confuse it is with patterns like (add s32, complexpattern1) and (add s32, complexpattern2). These have the same Complexity score but in truth, each one has a (possibly overlapping) range of complexity depending on what C++ code is inside the complexpatterns and which path through that C++ is dynamically taken. If it matches nothing then the score should be 0, but if it matches a dozen nodes it should be 12 (or possibly higher). We don't know which until we try to match that specific pattern. Correctly figuring out an order in the presence of complexpatterns is impossible. Similarly, it's also possible to confuse it with patterns that differ but overlap and add up to the same complexity due to quirks of the scoring system.
_Declaring Combine Passes_
Combiner passes are defined by subclassing GICombiner like so:
    def CommonPreLegalizerCombiner: GICombiner<
      "CommonGenPreLegalizerCombiner", [all_combines]> {
      let DisableRuleOption = "common-prelegalizercombiner-disable-rule";
      let Modifiers = [(disable_rule copy_prop)];
    }
This causes tablegen to create a class named 'CommonGenPreLegalizerCombiner' which you can use to perform combines. This combiner contains all the combines mentioned in the previous section because it includes the 'all_combines' group. However, it disables the copy_prop rule to prevent it from attempting to match. I'll discuss that a bit more below. It also generates a command-line option (for asserts builds only) which can be used to further disable rules at run-time, which will be useful for tracking down bugs or for testing a rule in isolation. I'm hoping that one day tools like bugpoint will be able to search through the individual rules within a combine pass when searching for a minimal reproducer.
The Modifiers field is intended to allow targets to modify an existing ruleset (particularly a target independent one) with additional target specific quirks. For example, one particular rule might be doing more harm than good and should be disabled. Or maybe only a subset of the things it would normally match are safe in which case an extra predicate should be tested. Being able to make minor edits to the ruleset without taking on the whole maintenance burden of the common code or causing code bloat by duplicating tables would be useful for targets that are generally similar to the targets within LLVM but have minor quirks.
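As a sketch of what the extra-predicate case might look like (add_predicate and is_profitable_on_mytarget are hypothetical names invented for illustration; only disable_rule appears in this proposal):
    def MyTargetTunedPreLegalizerCombiner: GICombiner<
      "MyTargetGenTunedPreLegalizerCombiner", [all_combines]> {
      let Modifiers = [(disable_rule copy_prop),
                       // Hypothetical: require an extra target predicate
                       // before extending_loads is allowed to apply.
                       (add_predicate extending_loads, is_profitable_on_mytarget)];
    }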
Larger scale changes should take an alternate approach to modifiers. It's expected that even targets that are very different from the rest of the pack still have features in common. Such targets can declare their own combiner to replace the common one but still generally make use of the improvements made by the wider community. This is where GICombineGroup will start to shine, as such a target can declare a combiner like so:
    def MyTargetPreLegalizerCombiner: GICombiner<
      "MyTargetGenPreLegalizerCombiner",
      [common_extend_optimizations,
       common_extending_loads,
       // common_rsqrt_and_nr_iterations, // This target has a real sqrt operation
       mytarget_special_fma,
       common_fma,
       common_bswap_idioms,
       mytarget_bcd_arithmetic]> {
    }
As LLVM improves on the common_* groups, the target benefits from those improvements automatically. However, it doesn't benefit from new groups being added to all_combines because it's no longer using that group, so entirely new categories of combines added to it would not appear in the combiner. It would still be nice to find out that a new category has appeared though, so that a decision can be made on it. To that end, I'm considering adding:
let Verify = [(has_all_of_except all_combines, common_rsqrt_and_nr_iterations)];
which would cause a warning if something new appeared in the original group.
_Conclusion_
There is a lot of work that needs to be done to get all this working and some of it may have to change once it runs into the reality of implementation :-). However, we think that this will prove to be a very convenient and powerful syntax with the potential to enable a variety of tools, from profilers, to bug reducers, to correctness checking tools.
There is a patch at https://reviews.llvm.org/D54286 that makes a start on it to try out the general feel of the syntax, but it currently lacks the core feature of generating match and apply code from the MIR. I'm going to be looking into that over the next few weeks.

--
Learn how the world really is,
But never forget how it ought to be.




Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Daniel Sanders via llvm-dev <[hidden email]> writes:

> That's an interesting idea. Certainly tablegenerating InstCombine
> ought to be possible and sharing code sounds like it ought to be
> doable. MIR and IR are pretty similar especially after IRTranslator
> (which is a direct translation) through to the Legalizer (which is the
> first point target instructions can appear, unless targets make custom
> passes). From the Legalizer to ISel, there's still likely to be a fair
> amount of overlap between the two as a lot of the G_* opcodes directly
> correspond to LLVM-IR instructions. The tricky bit is that the escape
> hatches into C++ would need to either have Instruction/MachineInstr
> versions or would need to accept both.

Yes.  I wonder if templates can help with this.  I'm thinking of
LoopInfo, which is parameterized with BasicBlock/Loop or
MachineBasicBlock/MachineLoop, as a guide.  I don't know if Instruction
and MachineInstr have enough APIs in common, though.  Perhaps a
longer-term effort could be made to develop a common API with the same
spellings for both, at least enough for dagcombine purposes.

Your initial response to Nicolai's suggestion to use TableGen's dag type
triggered something else in my mind.  The TableGen SelectionDAG stuff is
really restrictive in a lot of ways because the TableGen dag type is
used to express actual DAGs and you rightly pointed out that that's too
unwieldy for dagcombine.  It's unwieldy for isel too, which has many of
the same issues (multiple outputs, use of things like EXTRACT_SUBREG and
so on).  It's never handled selecting to multiple instructions very well
either.  I wonder if it would be possible to express isel patterns using
TableGen's dag type in the way that Nicolai suggests for dagcombine.  In
a lot of ways, isel is "just another kind of dagcombine."

>> The use of '%' vs. '$' here really threw me for a loop.  I would have
>> expected '$D' and '$S' everywhere.  What's the significance of '%'
>> vs. '$'?  I know MIR uses '%' for names but must that be the case in
>> these match blocks?
>
> In MIR, '%' and '$' have a semantic difference when used on operands.
> '%foo' is a virtual register named foo but '$foo' is the physical register foo.

Ok, thanks for the explanation.  I have not worked extensively with MIR
yet.

> The main reason I didn't pick something distinct from either
> (e.g. '${foo}') is that I'd like to minimize the need to modify the
> MIR parser to support pattern-specific syntax.

Sure, that makes sense.

>> My understanding is that "root" means the sink here, but later in the
>> "upside-down" example, "root" means the source.  It seems to me the use
>> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
>> specifying the insert point.  Why do users need to know about "root" at
>> all?  Is there some other special meaning attached to it?
>
> It doesn't correspond with any property of the DAG being matched. It's
> the entry point that the combine algorithm uses to begin an attempt to
> match the rule. In DAGCombine terms, it's the first node checked for
> the rule inside the call to DAGCombine::visit(SDNode *).

Got it.  "root" has a specific meaning for a tree/dag and I think I was
getting hung up on that, but I see how it has meaning as the starting
point of a match.  I was going to say maybe we can find another name for
it, but now I can't think of a better one.  :)

                               -David

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev


> On Nov 13, 2018, at 08:01, David Greene <[hidden email]> wrote:
>
> Daniel Sanders via llvm-dev <[hidden email]> writes:
>
>> That's an interesting idea. Certainly tablegenerating InstCombine
>> ought to be possible and sharing code sounds like it ought to be
>> doable. MIR and IR are pretty similar especially after IRTranslator
>> (which is a direct translation) through to the Legalizer (which is the
>> first point target instructions can appear, unless targets make custom
>> passes). From the Legalizer to ISel, there's still likely to be a fair
>> amount of overlap between the two as a lot of the G_* opcodes directly
>> correspond to LLVM-IR instructions. The tricky bit is that the escape
>> hatches into C++ would need to either have Instruction/MachineInstr
>> versions or would need to accept both.
>
> Yes.  I wonder if templates can help with this.  I'm thinking of
> LoopInfo, which is parameterized with BasicBlock/Loop or
> MachineBasicBlock/MachineLoop, as a guide.  I don't know if Instruction
> and MachineInstr have enough APIs in common, though.  Perhaps a
> longer-term effort could be made to develop a common API with the same
> spellings for both, at least enough for dagcombine purposes.

Unfortunately, Instruction is a class-hierarchy based design with specialized getters for each subclass whereas MachineInstr is a single one-size-fits-all class. It's likely to be difficult to unify them with templates. A common API sounds feasible to some degree; certainly the G_* opcodes and the Instruction hierarchy ought to be able to agree on an abstraction that can interact with both, and MachineMemOperand ought to be able to agree with LoadInst/StoreInst/Atomic*Inst.

> Your initial response to Nicolai's suggestion to use TableGen's dag type
> triggered something else in my mind.  The TableGen SelectionDAG stuff is
> really restrictive in a lot of ways because the TableGen dag type is
> used to express actual DAGs and you rightly pointed out that that's too
> unwieldy for dagcombine.  It's unwieldy for isel too, which has many of
> the same issues (multiple outputs, use of things like EXTRACT_SUBREG and
> so on).  It's never handled selecting to multiple instructions very well
> either.  I wonder if it would be possible to express isel patterns using
> TableGen's dag type in the way that Nicolai suggests for dagcombine.  In
> a lot of ways, isel is "just another kind of dagcombine."

I expect it to be able to deal with ISel too. The main difference between the two is that ISel's apply step is required to constrain operands to a register class whereas Combine doesn't need to do that but can choose to. I'm intending to share code between the tablegen passes for Combine and ISel so that it will be the same underlying code generator.
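For example, an ISel rule might look like this in the same style. This is only a sketch: ADDWrr and GPR32 are AArch64-flavoured stand-ins, and the exact spelling of register-class constraints in the apply section is an assumption:
    def : GICombineRule<
      (match (G_ADD s32:$dst, $a, $b)),
      // The apply side selects a target instruction and constrains the
      // operands to a register class, which is the extra step ISel needs.
      (apply (ADDWrr GPR32:$dst, GPR32:$a, GPR32:$b))>;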

The syntax could also work for the Legalizer, and RegBankSelect too. The legalizer is a degenerate case of matching a single instruction's opcode, types, and MachineMemOperand so it's probably not worth doing there, but using it in RegBankSelect to specify the alternative code for each bank and the constraints to apply could be useful. The RegBankSelect pass boils down to lots of rules of the form:
        %2(s32) = ADD %0(s32), %1(s32) => %2:GPR = ADD %0:GPR, %1:GPR
with some contradictory rules like:
        %1(s32) = FNEG %0(s32) => %1:FPR = FNEG %0:FPR
        %1(s32) = FNEG %0(s32) => %1:GPR = XOR %0:GPR, i32 0x80000000
RegBankSelect's job is then to resolve the contradictions in the most globally efficient way. For example, if the input is already in FPRs then the higher latency of an FNEG might beat the cost of transferring it to GPR and using the lower-latency XOR. However, if the result needs to be in a GPR anyway then it's better to copy the input to GPR and use the lower-latency XOR instead.
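To sketch how the FNEG alternatives might be written down (GIRegBankRule and the (alternatives ...) operator are invented here purely for illustration, not part of this proposal):
        def : GIRegBankRule<
          (match [{MIR %1(s32) = FNEG %0(s32) }]),
          // Each alternative carries its own bank assignments; the pass
          // would pick whichever resolves the contradictions most cheaply.
          (alternatives (apply [{MIR %1:FPR = FNEG %0:FPR }]),
                        (apply [{MIR %1:GPR = XOR %0:GPR, i32 0x80000000 }]))>;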

>>> The use of '%' vs. '$' here really threw me for a loop.  I would have
>>> expected '$D' and '$S' everywhere.  What's the significance of '%'
>>> vs. '$'?  I know MIR uses '%' for names but must that be the case in
>>> these match blocks?
>>
>> In MIR, '%' and '$' have a semantic difference when used on operands.
>> '%foo' is a virtual register named foo but '$foo' is the physical register foo.
>
> Ok, thanks for the explanation.  I have not worked extensively with MIR
> yet.
>
>> The main reason I didn't pick something distinct from either
>> (e.g. '${foo}') is that I'd like to minimize the need to modify the
>> MIR parser to support pattern-specific syntax.
>
> Sure, that makes sense.
>
>>> My understanding is that "root" means the sink here, but later in the
>>> "upside-down" example, "root" means the source.  It seems to me the use
>>> of "root" here is a misnomer/confusing.  I liked Nicolai's ideas about
>>> specifying the insert point.  Why do users need to know about "root" at
>>> all?  Is there some other special meaning attached to it?
>>
>> It doesn't correspond with any property of the DAG being matched. It's
>> the entry point that the combine algorithm uses to begin an attempt to
>> match the rule. In DAGCombine terms, it's the first node checked for
>> the rule inside the call to DAGCombine::visit(SDNode *).
>
> Got it.  "root" has a specific meaning for a tree/dag and I think I was
> getting hung up on that, but I see how it has meaning as the starting
> point of a match.  I was going to say maybe we can find another name for
> it, but now I can't think of a better one.  :)

The best I can think of is 'matchpoint' or 'matchfrom'. As mentioned on the thread with Nicolai, we might be able to drop it altogether in favour of a let statement, which can have a longer name that will help to clarify it.

>                               -David


Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
On 13.11.18 00:13, Daniel Sanders wrote:

>> On Nov 10, 2018, at 03:28, Nicolai Hähnle <[hidden email]> wrote:
>>
>> Thank you for the detailed reply! There's a lot to digest :)  Let me
>> try to address most of it.
>>
>>
>> [snip]
>>>> I also think you should have 'ins' and 'outs' separately; after all,
>>>> a predicate may have to do a combined check on two matched registers
>>>> / operands, and produce one or more values for later re-use.
>>>>
>>>> Once you have separate 'ins' and 'outs', the "bool" in there seems a
>>>> bit redundant.
>>> I think there's three kinds values involved in the predicates. The
>>> first is the inputs like these values come from other parts of the
>>> match and it makes sense that they belong in 'ins'. The second is
>>> values that are directly related to the truthiness of the predicate
>>> such as bool, unsigned (register number), MachineInstr*, etc.. These
>>> are the result of the underlying function and there can only be one
>>> of these per predicate. The type can potentially be std::pair or
>>> std::tuple to give multiple results provided a suitable conversion to
>>> bool is provided. The third is metadata that's just carrying
>>> additional information to the apply. These are currently in the 'ins'
>>> section and I agree that they don't really make sense there. I can
>>> pull them out into a separate 'outs' section. This has a flexibility
>>> cost since the user can no longer specify the complete argument order
>>> but that's probably a good thing for readability.
>>> Here's an example of a non-bool result:
>>>     def reg_with_shift : GIMatchPredicate<
>>>          (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>>>       // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
>>> uint64_t &);
>>>       return matchARMShiftedOp2(${A}, ${shift});
>>>     }]>;
>>>     def : GICombineRule<
>>>       (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>>       (match [{MIR %D = G_ADD %S1, %S2 }]),
>>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>> (I've just realized there's a gap in the ability to name metadata
>>> when used in this way. I've gone with 'S2.shift' for now)
>>> This is equivalent to:
>>>     def reg_with_shift : GIMatchPredicate<
>>>          (ins reg:$A), (outs ptr_to_reg:$reg, imm:$shift), bool, [{
>>>       // bool matchARMShiftedOp2(const MachineOperand &,
>>> MachineOperand*&, uint64_t&);
>>>       return matchARMShiftedOp2(${A}, ${reg}, ${shift});
>>>     }]>;
>>>     def : GICombineRule<
>>>       (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>>       (match [{MIR %D = G_ADD %S1, %S2.reg }]),
>>>       (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>> but the first definition is slightly more efficient for most targets
>>> since there's usually a register or two for return values which the
>>> second version spends solely on a redundant bool; it also requires
>>> fewer argument registers, as the second version has to store the
>>> MachineOperand* on the stack to pass it by reference. The difference
>>> is fairly small when considered on an individual rule but should
>>> accumulate on large rulesets or large functions.
>>
>> Okay, I see now where you're coming from, and the goal makes a lot of
>> sense to me. I need some help with the syntax and semantics though :)
>>
>> First, just to clarify, the intention is for matchARMShiftedOp2 to
>> match something like "G_SHL %X, ${imm}", right? Why not express that
>> directly in MIR?
>
> Going by
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHBEAGE.html 
> (which admittedly is old docs but it's unlikely to have changed), it
> needs to be able to match the following:
>
> %0 = G_CONSTANT iN ${imm} # Where imm is any immediate expressible using
> an unsigned 8-bit immediate rotated by an even number of bits
>
> %0 # I.e. just a plain vreg
>
> %0 = G_CONSTANT iN ${imm} # Where imm is 1 to 32
> %2 = G_[AL]SHR %1, %0
>
> %0 = G_CONSTANT iN ${imm} # Where imm is 0 to 31
> %2 = G_SHL %1, %0
>
> %0 = G_CONSTANT iN ${imm} # Where imm is 1 to 31
> %2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
>
> %0 = G_CONSTANT iN 1
> %2 = G_ROR %1, %0 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
> %3 = G_SEXT %2
>
> %2 = G_[AL]SHR %0, %1
>
> %2 = G_SHL %0, %1
>
> %2 = G_ROR %0, %1 # We don't actually have a rotate right at the moment
> so this would have to match the equivalent and/shift/or sequence
>
> I don't think it's strictly necessary to escape into C++ for this case
> (except for checking the immediates have the right value) as there's
> nothing here that can't be done with patterns. Historically, I believe
> the reason targets have taken this route in SelectionDAG is mainly down
> to the fact that SelectionDAG patterns didn't provide a good means to
> factor out common sub-patterns. Until earlier this year, PatFrags really
> only defined abbreviations for common sub-patterns rather than defining
> multiple alternatives. So a target like ARM would have had to duplicate
> patterns support for every operation that supported it. In the case of
> ARM that's 153 (9 sub-patterns * 17 instructions supporting it)
> patterns. By escaping into C++ via ComplexPattern, 17 would do the same
> job with no code duplication.
>
> FWIW, in the long run I would like most cases of this to move to macros
> as the tablegen pass ought to be smart enough to factor out the
> commonality. There will be some that genuinely need the escape hatch but
> I'd like that to be kept to a minimum as the more matching we can
> delegate to tablegen, the more we can do interesting things like measure
> rule coverage, guided fuzzing of rules, profiling the rules, dropping
> low frequency/profit rules from -O1/-O2, proving correctness, etc.
> Conversely the more C++ there is, the more work is required to implement
> those features.
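>
> To make that concrete, here's a sketch of one of the sub-patterns above
> written as a macro. The exact GIMacro syntax is an assumption (it's only
> sketched elsewhere in this thread) and isShiftImmInRange is a
> hypothetical predicate:
>
>    def shifted_op2_shl : GIMacro<
>      (defs def:$result, use:$reg, imm:$imm),
>      (match [{MIR %0 = G_CONSTANT iN %imm
>                   %result = G_SHL %reg, %0 }],
>             // Hypothetical predicate: checks 0 <= imm <= 31.
>             (isShiftImmInRange imm:$imm))>;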

Totally agreed.


>> But for the sake of the argument I'll assume there's a good reason for
>> the escape to C++. The way I understood your original description of
>> GIMatchPredicate, its most generic and versatile form of use is as an
>> explicit entry in the (match) list. Like so:
>>
>>  def reg_with_shift : GIMatchPredicate<
>>      (ins reg:$A), (outs uint64_t:$shift), MachineOperandPtr, [{
>>    // MachineOperand *matchARMShiftedOp2(const MachineOperand &,
>>    //                                    uint64_t &);
>>    return matchARMShiftedOp2(${A}, ${shift});
>>  }]>;
>>  def : GICombineRule<
>>      (defs root:$D, reg:$S1, reg:$S2),
>>      (match [{MIR %D = G_ADD %S1, %tmp }],
>>             (reg_with_shift $tmp, $shift):$S2),
>>      (apply [{MIR %D = ADD %S1, %S2, %shift }])>;
>>
>> I believe that the kind of syntax you showed, in whatever form it
>> eventually ends up, should just be syntactic sugar. It would be
>> de-sugared to essentially what I've shown here, right?
>
> Yes, that's right.
> >> So then the question becomes: what is the sugared syntax, if any?
>>
>> I must say that I really don't like your first example, i.e. this one:
>>
>>  def : GICombineRule<
>>      (defs root:$D, reg:$S1, reg_with_shift:$S2),
>>      (match [{MIR %D = G_ADD %S1, %S2 }]),
>>      (apply [{MIR %D = ADD %S1, %S2, %S2.shift }])>;
>>
>> The syntax very strongly suggests that $S2 will be assigned to the RHS
>> operand of the G_ADD, but there seems to be non-local magic which
>> causes it to be assigned something else instead -- unless I've
>> misunderstood the purpose of the matchARMShiftedOp2?
>
> I see your point here. The matched operand and the result of the
> predicate essentially have the same name at the moment. Something like:
>   (apply [{MIR %D = ADD %S1, %S2.result, %S2.shift }])
> would resolve the ambiguity but then 'S2.result' is a magically
> generated name. We'll need to give the result of the predicate a proper name.
>
> I suppose we could treat the first 'outs' to mean the return type but I
> think we ought to be more explicit than that and keep the return value
> separate from the metadata. Something like:
>   def reg_with_shift : GIMatchPredicate<
>       (ins reg:$A), (outs bool), (metadata_out
> MachineOperandPtr:$result, uint64_t:$shift), [{
>     // bool matchARMShiftedOp2(const MachineOperand &, MachineOperand
> *&, uint64_t &);
>     return matchARMShiftedOp2(${A}, ${result}, ${shift});
>   }]>;
>   def reg_with_shift : GIMatchPredicate<
>       (ins reg:$A), (outs MachineOperandPtr:$result),
> (metadata_out uint64_t:$shift), [{
>     // MachineOperand *matchARMShiftedOp2(const MachineOperand
> &, uint64_t &);
>     return matchARMShiftedOp2(${A}, ${shift});
>   }]>;
> We'd also need something to make using the original %S2 an error as I'm
> sure passing the operand being matched instead of the operand returned
> by the predicate will be a common mistake otherwise:
> let ApplyCannotUseMatchedOperand = 1;
> I'd also be inclined to make it a warning for this to be unset if the
> result has a name.

Is the ability to inline the reg_with_shift into the (defs) worth this
complexity? I'd prefer to keep things simpler and more explicit.

Especially since doing so seems like it would allow getting rid of
(defs) entirely. (I know, I keep harping on this point :))


[snip]

>>>> Let's talk about using DAGs. I think I understand where the
>>>> temptation comes from to describe the MIR using code blocks, but I'm
>>>> also pretty sure that it's a mistake to do so.
>>> The main motivation to move away from DAGs came from limitations in
>>> DAGs and the general ugliness of the original DAG-based definitions.
>>> Once the idea of using MIR came up, we liked the fact that it matched
>>> the existing serialization format which makes it easy to turn a
>>> specific example from the compiler output into a combine rule and
>>> avoid the need for another representation for instructions. We also
>>> liked the way it dealt with the difficult cases,
>>> MachineMemoryOperands, subregs, etc. and also that it wasn't a new
>>> syntax or parser (although it may require modifications to the
>>> existing one). It also looked like it be convenient for a tool like
>>> Alive (https://www.cs.utah.edu/~regehr/papers/pldi15.pdf) although we
>>> didn't really explore that particular thought further.
>>> Ultimately, both representations are DAG's and it comes down to the
>>> convenience of the syntax in specifying the rules we want. MIR is
>>> very convenient because of its familiarity and flexibility.
>>> One of the bigger problems with using tablegen's DAG type is that it
>>> doesn't deal with multi-result instructions very well. Every time
>>> this has been raised on the list w.r.t SelectionDAG the solution has
>>> boiled down to 'use C++ instead' and it would be good to fix that so
>>> that things like UADDO are representable. You can write a rule that
>>> matches something like divmod at the top-level using the 'set' operator:
>>>   (set $D1, $D2, (divmod $A, $B))
>>> but as soon as it's not the top-level, it gets really ugly fast even
>>> using pseudo-nodes:
>>>   (set (outs $D1, $D2), (sext (result (G_DIVMOD $A, $B):$T, 0)),
>>>                         (sext (result $T, 1)))
>>> In this example, outs is a pseudo-node needed to distinguish the
>>> results from the sinks of the DAG, and result is a pseudo-node that
>>> selects one of the multiple results for use by the parent node.
>>>  Meanwhile the MIR for this is:
>>>   %0, %1 = G_DIVMOD %A, %B
>>>   %D1 = G_SEXT %0
>>>   %D2 = G_SEXT %1
>>> which seems much clearer. It becomes a bigger problem when you start
>>> wanting to match target specific instructions as multi-result becomes
>>> more common later in the backend, especially post-isel. It's also a
>>> problem for upside-down matches, inside-out, and multi-root matches
>>> since tablegen's dag type requires a single common root node and can
>>> only draw edges in one direction. If you want to have the root in the
>>> middle then you have to distort the DAG to bring it to the root and
>>> mark up the reversed edges somehow.
>>> Moving the result inside the instruction operator helps with these
>>> problems but requires a list of dags and all the temporaries need to
>>> be named:
>>>   [(G_DIVMOD $T1, $T2, $A, $B),
>>>    (G_SEXT $D1, $T1),
>>>    (G_SEXT $D2, $T2)]
>>> and at that point, we're very nearly at MIR.
>>
>> Yes, that was kind of the point. Expressing MIR, but in a way that is
>> native to TableGen :)
>>
>> I should probably clarify that when I wrote "DAG", I meant that only
>> in the sense of the TableGen 'dag' type, which should probably just be
>> called "S-expression" instead of "DAG".
>>
>> I absolutely believe that a list-of-instructions approach is better
>> than what the SDISel currently uses, for all the reasons you mentioned
>> (multiple defs, multiple roots).
>
> Ah ok. I thought that you were aiming for SDISel-style DAGs. :-)
>
> At first glance, I think everything can fit into this style. For
> example, matching could be something like:
>    (G_DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
>    (DEBUG_VALUE $T1, debug-expr:$DE1)
>    (DEBUG_VALUE $T2, debug-expr:$DE2)
> and the corresponding apply:
>    (DIVMOD $T1, $T2, $A, $B, debug-loc:$DL1)
>    (DEBUG_VALUE $T1, (modify-expr debug-expr:$DE1, ...))
>    (DEBUG_VALUE $T2, (modify-expr debug-expr:$DE2, ...))
> macros would be the same as instructions, etc. I'll work through the
> examples in my notes and see if I can find something reasonable for
> everything in this style.
>
> One potential nit is that it's a little odd for the defs to be on
> the right hand side of the opcode. It's easy enough to change:
>    (set $T1, $T2, (G_DIVMOD $A, $B))
> but I think that's worse to read. I'd be inclined to go for everything
> on the right and just label the defs:
>    (G_DIVMOD def:$T1, def:$T2, use:$A, use:$B, debug-loc:$DL1)

I guess I'm so used to looking at textual assembly that I don't find the
defs on the RHS as grating even without the labeling. But that seems
like a minor point.


>> I simply think using the 'dag' type in TableGen is a much better
>> approach because it composes with other TableGen frontend features
>> (e.g., all the substitution rules "just work", you can
>> programmatically compose a dag/sexpr from multiple pieces using !con,
>> etc.).
>
> That makes sense to me.
>
>>> Another issue we didn't like with DAG is that you also end up needing
>>> a lot of pseudo-nodes like EXTRACT_SUBREG whereas these are natural
>>> parts of the MIR syntax. For example:
>>> (set $d, (EXTRACT_SUBREG (MYTGT_ADD $A, $B), sub_lo))
>>> compared to:
>>> %d:sub_lo = MYTGT_ADD %A, %B
>>> and:
>>> (set $d, (REG_SEQUENCE GPR32, (MYTGT_ADD $A, $B), sub_lo, (MYTGT_ADD
>>> $C, $D), sub_hi))
>>> compared to:
>>> %d:sub_lo = MYTGT_ADD %A, %B
>>> %d:sub_hi = MYTGT_ADD %C, %D
>>
>> How about:
>>
>>  (MYTGT_ADD (sub_lo $d), $A, $B),
>>  (MYTGT_ADD (sub_hi $d), $C, $D),
>>
>> That seems acceptable to me.
>
> That looks good to me too.
>
>>> Along the same lines, I also think that the integrated debug-info is
>>> only really practical in MIR. It's possible to shoe-horn DILocation
>>> in using a pseudo-node like so:
>>>    (set $d, (add (mul $a, $b):$mul, $c):$add) ->
>>>    (set $d, (merged_dilocation (muladd $a, $b, $c), $mul, $add))
>>> but there isn't really a good place to modify the DIExpression used
>>> by DEBUG_VALUE.
>>>> One reason is that it adds yet another parser, which is more
>>>> maintenance burden without buying much.
>>> The expectation is that we can make use of the existing MIR parser
>>> and replace its MachineInstr/MachineOperand/etc. generation code
>>> with pattern-matching generation code. I'm going to be looking into
>>> the implementation of that over the next few weeks.
>>>> The more important reason is that DAGs compose better with the rest
>>>> of TableGen. Consider combine rules defined in multiclasses, for
>>>> example. That is very common all through LLVM. In order to mirror an
>>>> existing selection rule in the AMDGPU backend, we'd likely want to
>>>> have something like this:
>>>>
>>>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>>>                                   Instruction inst> {
>>>>    // This is probably wishful thinking -- would be okay if we had to
>>>>    // split this into two different rules due to the different types
>>>>    // of $dst
>>>>    def : GICombineRule<
>>>>      (defs root:$dst, operand:$src0, operand:$src1),
>>>>      (oneof (ginst i16:$dst, i16:$src0, i16:$src1),
>>>>             (match (ginst i16:$tmp, i16:$src0, i16:$src1),
>>>>                    (G_ZEXT i32:$dst, $tmp))),
>>>>      (inst $dst, $src0, $src1)>;
>>>>
>>>>    // A third rule for zext to 64 bits would go here...
>>>>  }
>>> This is the reason that macros are explicitly imported in the defs
>>> section and the imported instances are renameable. The same thing in
>>> the MIR syntax (and using the changes to variable naming from above)
>>> would be:
>>>  multiclass Arithmetic_i16_GIPats<Instruction ginst,
>>>                                   Instruction inst> {
>>>    def : GICombineRule<
>>>      # There's a corner case for the double definition of $dst here.
>>> I'll come back to that later in the mixing concerns comment
>>>      (defs root:$dst, (import-macro ginst:$ginst, def:$dst,
>>> use:$src0, use:$src1)),
>>>      (match [{ %dst:(s16) = ginst %src0:(s16), %src1:(s16) }]), #
>>> I've corrected the i16 to s16 here. i16 isn't a type in GlobalISel
>>>      (apply /* I'll come back to this bit */)>;
>>>    // Covers s32 and s64
>>>    def : GICombineRule<
>>>      (defs root:$dst, operand:$src0, operand:$src1, (import-macro
>>> ginst:$ginst)),
>>>      (match [{ %0:(s16) = ginst %src0:(s16), %src1:(s16)
>>>                %dst = G_ZEXT %0:(s16) }],
>>>             (isS32OrS64 type:$dst)),
>>>      (apply /* I'll come back to this bit */)>;
>>>  }
>>> We could potentially fold the has G_ZEXT case and no G_ZEXT cases
>>> together with a macro but it seems unnecessary complexity in this
>>> case. It might be more worth it if there's a similar G_SEXT variant too.
>>> The reason for the "I'll come back to this bit" in the apply section
>>> is because you've raised an issue I'd forgotten to deal with. The
>>> opcodes in the apply section should also be parameterizable. My first
>>> thought on that is that there should be a macro-equivalent for apply
>>> such as:
>>>  def : GIExpansion<
>>>    (defs def:$R, use:$S, imm:$IDX),
>>>    (apply (select imm:$IDX,
>>>                (0 [{MIR %R = FOO %S }]),
>>>                (1 [{MIR %R = BAR %S }]),
>>>                (2 [{MIR %0 = BAZ %S
>>>                         %R = BAR %0 }])))>;
>>> or alternatively:
>>>   def : GICombineRule<
>>>     ...,
>>>     ...,
>>>     (apply (create_opcode [{ ${IDX} ? MYTGT_A : MYTGT_B }] ...):$name,
>>>            [{MIR %D = name %s }])>;
>>> The former is more powerful, but the latter is more convenient for
>>> trivial cases
>>
>> I admit that you've lost me a little here :)
>>
>> Okay, I see how importing an instruction as a macro is possible
>> without string concatenation, although I have to say that having to
>> express this explicit import still feels quite a bit more cumbersome
>> than the strawman syntax I've shown using the 'dag' type.
>>
>> Again, the 'dag' type just composes better with the rest of TableGen.
>
> With your reply above, I'm coming around to the dag type. I'll see how
> the examples in my notes end up in that style.

I'm very happy to read that :)


>> On the apply-side though, I don't understand where the difficulty
>> comes from. Each instantiation of the multiclass has a single `inst'
>> which needs to go into the apply rule. That shouldn't be so difficult
>> to express -- at least with the 'dag' type it's trivial.
>
> Yeah, I went off on a tangent a bit :-).
>
> For your example: Yes, there's only one inst for each multiclass
> instantiation so it's simple.
>
> For the tangent I was on, consider:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_LSHR %a, %0
> %3 = G_LSHR %2, %1
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_LSHR %a, %0
> and:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_ASHR %a, %0
> %3 = G_ASHR %2, %1
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_ASHR %a, %0
> and:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = G_SHL %a, %0
> %3 = G_SHL %2, %1
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = G_SHL %a, %0
>
> These rules are very similar so it would be nice to merge them like so:
> %0 = G_CONSTANT iN %x
> %1 = G_CONSTANT iN %y
> %2 = {G_SHL,G_ASHR,G_LSHR} %a, %0
> %3 = <same-shift-as-%2> %2, %1
> =>
> %0 = G_CONSTANT iN %z # z = x+y
> %1 = <same-shift-as-match> %a, %0

Ah, I see what you mean.

My first instinct is to say: you could use a TableGen class or even a
`foreach` as lightweight methods of generating three separate
records/defs and thus three separate rules for that:

   foreach inst = [G_LSHR, G_ASHR, G_SHL] in
   def : GICombineRule<
     ... simply use inst here in the appropriate place ...
     >;

Then again, maybe it would be useful to write it as a single rule if it
allows the TableGen backend to generate better matching code by
explicitly expressing the similarity between the rules.

I'm not sure which is better: writing more but simpler rules, and
relying on the backend to piece things together again and exploit
optimization opportunities, or writing fewer but more complex rules.
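For concreteness, the foreach route might look like this in the dag style. This is a sketch only: the apply side needs an escape to compute the summed shift amount, and constant_fold_add is a hypothetical helper invented for this illustration:

   foreach inst = [G_LSHR, G_ASHR, G_SHL] in
   def : GICombineRule<
     (match (G_CONSTANT $c1, $x),
            (G_CONSTANT $c2, $y),
            (inst $t, $a, $c1),
            (inst $dst, $t, $c2)),
     // Hypothetical helper: builds a new immediate holding x + y.
     (apply (G_CONSTANT $sum, (constant_fold_add $x, $y)),
            (inst $dst, $a, $sum))>;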


>>>> Obviously you can achieve this in your proposal as well using string
>>>> concatenation, but that does seem rather backwards.
>>>>
>>>> Something else that falls out rather nicely from using DAGs is that
>>>> it gives you a natural way to give a name to an _instruction_, to be
>>>> able to pass it to a predicate for extended checks (and as a
>>>> potentially more natural way of preserving debug locations!).
>>> I don't think we need a means to name instructions since instructions
>>> can be trivially found from their results using
>>> MachineOperand::getParent().
>>> As noted above, merging debug locations doesn't fit very well into
>>> DAGs as you have to use a pseudo node to be able to provide both
>>> DILocations on the same result instruction, and there's no real means
>>> to include DEBUG_LOC for temporaries that are eliminated and need to
>>> be rebased from another value.
>>
>> Hmm... I feel like we need more concrete examples for debug info to
>> discuss this. One of the reasons I've brought up naming instructions
>> is precisely because I thought it could simplify transferring DebugLocs.
>>
>> Again, I'm not suggesting the use of a DAG like in SDSel, as I think
>> that's a mistake. I simply mean the 'dag' type of TableGen, which
>> should really be called 'sexpr' or something.
>
> I think we can use the same idea you suggested for subregisters:
>    (DIVMOD $T1, $T2, $A, $B, (merge_debug_loc $DL1, $DL2))

That sounds good!


>>>> I have a more explicit example of this later.
>>>>
>>>>
>>>>> _'Upside-down' matches (i.e. roots at the top) and similar_
>>>>> This one requires algorithm changes which I'd prefer not to discuss
>>>>> in this RFC. Assuming the underlying algorithm gains support for
>>>>> this, this is how the syntax would look:
>>>>> def : GICombineRule<
>>>>>   (defs root:$root, reg:$A),
>>>>>   (match [{MIR %1 = G_LOAD %root
>>>>>                %A = G_SEXT %1 }]),
>>>>>   (apply [{MIR %A = G_SEXTLOAD %root }])>;
>>>>> The only unusual thing about this rule is that the root isn't at
>>>>> the bottom. Instead of starting at a use and matching towards defs,
>>>>> we're starting at the def and matching towards uses. This has some
>>>>> potentially useful properties. The combine algorithm has to choose
>>>>> an insertion point for the replacement code and the traditional
>>>>> choice has been the root of the match. Assuming we keep doing the
>>>>> same thing, 'upside-down' matching like this is able to avoid the
>>>>> checks that the load is safe to move, is non-volatile, has one use,
>>>>> etc. that would be necessary if we moved the G_LOAD down to the
>>>>> G_SEXT. Also, assuming we keep the same broadly bottom-up
>>>>> processing order as the existing Combine algorithm this kind of
>>>>> rule also has relatively lower priority than conventional rules
>>>>> because the root is seen later. This can be useful as
>>>>> (algorithm-dependent) the MIR may be more settled by the time it
>>>>> tries to match.
>>>>> Along the same lines, the syntax can potentially support the root
>>>>> being in the middle of a match. This could be used in a similar way
>>>>> to the upside-down match above to control the insertion point and
>>>>> priority. For example:
>>>>> def : GICombineRule<
>>>>>   (defs root:$root, reg:$A, reg:$B, reg:$C),
>>>>>   (match [{MIR %1(s32) = G_TRUNC %A(s64)
>>>>>                %2(s32) = G_TRUNC %B(s64)
>>>>>                %root = G_ADD %1, %2
>>>>>                %C(s64) = G_SEXT %root }]),
>>>>>   (emit [{MIR %root = G_ADD %A, %B
>>>>>               %C = G_SEXT_INREG %root }])>;
>>>>> Unfortunately, I don't have any concrete examples where this would
>>>>> be useful in comparison to a conventional or upside-down match at
>>>>> the moment. I'm mostly keeping the door open as I can see potential
>>>>> for benefits (mostly w.r.t sinking and hoisting safety around an
>>>>> instr that would be difficult to test for that) given an
>>>>> appropriate rule and also because I'm inclined to say that the
>>>>> tablegen syntax shouldn't be the reason it's not possible. It
>>>>> should be up to the Combine algorithm and the tablegen-erated pass
>>>>> involved in specializing the algorithm for a target.
>>>>
>>>> I'm concerned that the concrete syntax proposed here mixes a bunch
>>>> of concerns that really should be addressed separately:
>>>>
>>>> 1. Defining a mapping between semantically equivalent chunks of code.
>>>> 2. Controlling insertion points as a simple way to keep side effects
>>>> under control in many cases.
>>>> 3. Controlling additional algorithmic aspects of the combine
>>>> algorithm (whether you're matching up or down, say).
>>> I think there may be a misunderstanding with #3. There's no real
>>> concept of matching up or down in the rules; there's only the
>>> starting point for the match and from there whether it's following
>>> the uses or defs isn't something the rule needs to be concerned
>>> about. The algorithm needs to be aware of it (and historically hasn't
>>> been, DAGCombine broadly runs bottom up and expects to match towards
>>> defs and can fail to re-schedule nodes for re-visit in many common
>>> cases) but the algorithm is expected to figure that out for itself.
>>> 'root' definitions are the only place that #2 and #3 come into the
>>> proposed syntax. At the moment, the only requirement the syntax
>>> imposes on the algorithm is that it has a concept of a current
>>> MachineOperand Def* (and by extension instruction) that will be used
>>> as the starting point for the match and that it can choose a valid
>>> insertion point for each instruction produced. The insertion point is
>>> the same position in the current algorithm but that's algorithm
>>> dependent and doesn't have to be the case in the long run. There may
>>> even be more than one for rules that emit multiple instructions in
>>> the apply section. There's currently no means to specify an insertion
>>> point, it's left up to the algorithm to find a place for each
>>> instruction.  That said, I'd be surprised if there was an algorithm
>>> where the current MachineOperand isn't also at least one of the
>>> insertion points.
>>> * The current upstream code is a bit misleading about that and makes
>>> it seem that it's the MachineInstr but I made that mistake before on
>>> ISel so I'm intending to change that.
>>>> Could we keep those separate somehow? E.g., don't distinguish
>>>> between 'root' and 'reg' or 'operand' in the (defs ...) part of the
>>>> rule; have TableGen figure out the roots automatically based on
>>>> walking the graph implied by the matching rule, and default to using
>>>> the sink(s) as roots.
>>> operand isn't really connected to this, it's just a wildcard for a
>>> MachineOperand that doesn't care whether it's a reg, imm, MCExpr, or
>>> something else.
>>> I'm inclined to agree that it would be nice to get 'root' out of the
>>> definition but I don't really see a good way to do it. The sinks
>>> aren't always the right choice and are a particularly bad choice on
>>> combines like the extending loads since that's trying to match all
>>> uses simultaneously and those uses could have any opcode.
>>
>> How about having it as a lettable variable? As in (using my favorite
>> syntax, but it should translate either way):
>>
>>  let MatchRoot = "$tmp" in
>>  def : GICombineRule<
>>      (match (G_LOAD $tmp, $ptr),
>>             (G_SEXT $dst, $tmp)),
>>      (apply (G_SEXTLOAD $dst, $ptr))>;
>>
>> I'm using $tmp instead of $ptr as the root here since you wrote above
>> that it needs to be a Def.
>>
>> Which makes the discussion about the insertion point potentially
>> confusing, since $tmp no longer appears in the (apply). But maybe that
>> doesn't matter.
>
> The insertion point is up to the algorithm anyway so it makes sense for
> it to not be visible.
>
>> If MatchRoot is not set, it'd just default to using a sink -- that
>> should be by far the most common case.
>
> If we change that to all the sinks then we cover the common multi-root
> case too.
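>
> For example (a sketch only, since none of this syntax is final, and
> the opcodes here are arbitrary): in a rule like the following, $d1 and
> $d2 are both sinks, so both would be inferred as roots:
>
>   def : GICombineRule<
>     (defs reg:$d1, reg:$d2, reg:$src),
>     (match (G_SEXT $d1, $src),
>            (G_ZEXT $d2, $src)),
>     (apply ...)>;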
>
>> Also, it'd be nice to get rid of the (defs) part of the syntax
>> entirely for brevity. I believe that should be possible.
>
> I'm not very keen to remove the defs section but I'll see whether it's
> still needed in this style. Even if it's removable, having it is likely
> to reduce the time spent in the tablegen pass, since there's a cost to
> having to deduce information. I also think it's important for error
> checking, since it provides a means to declare which values you expect
> to appear in both the match and apply sections, as opposed to those
> that are only matched.
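>
> To make that concrete, a sketch (reusing the load/sext example from
> earlier; comments inline, modulo the exact syntax):
>
>   def : GICombineRule<
>     (defs reg:$dst, reg:$ptr),        // expected in both match and apply
>     (match (G_LOAD $tmp, $ptr),        // $tmp is only matched
>            (G_SEXT $dst, $tmp)),
>     (apply (G_SEXTLOAD $dst, $ptr))>;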
By reducing the time spent, you mean the time spent in TableGen? That may
well be the case :)

If it's only about making error checking easier, maybe it could be an
optional annotation? Something like:

   def : GICombineRule<
     (match (INST $dst, $src0, $src1),
            reg:$src0, reg:$src1),
     (apply ...)>;

Just spit-balling here admittedly.


>>>> Then add some optional let-able variable in the GICombineRule to
>>>> define non-default rules for the combine algorithm.
>>>>
>>>> Controlling the insertion point to address loads like in your
>>>> example above could be done quite naturally if the MIR is expressed
>>>> in a DAG:
>>>>
>>>>  def : GICombineRule<
>>>>    (match (G_LOAD $tmp, $ptr):$load,
>>>>           (G_SEXT $dst, $tmp)),
>>>>    (apply (insertat $load),
>>>>           (G_SEXTLOAD $dst, $ptr))>;
>>> This isn't quite the same thing as the extending loads combine in the
>>> example. In that combine, it's matching the load and all uses of its
>>> result and choosing a replacement based on the global usage of that
>>> def across the whole function, rewriting conflicting
>>> sext/zext/aext/trunc's to be as cheap as possible (much of that
>>> detail is inside the function, but the code is upstream if you want
>>> to see it). This rule is matching a single use and relying on CSE to
>>> de-dupe the N G_SEXTLOADs that come out of it. To illustrate the
>>> difference, consider:
>>> %0(s8) = G_LOAD %p
>>> %d1(s16) = G_SEXT %0
>>> %d2(s32) = G_SEXT %0
>>> If I apply the above rule as much as possible to this (twice) with
>>> the roots at the sink, I'll get:
>>> %d1(s16) = G_SEXTLOAD %p
>>> %d2(s32) = G_SEXTLOAD %p
>>> whereas if I apply the extending loads combine to it (once), I'll get:
>>> %d2(s32) = G_SEXTLOAD %p
>>> %d1(s16) = G_TRUNC %d2
>>> which will normally be cheaper to execute. In combination with other
>>> rules and CSE, the first version could eventually turn into the
>>> second version but it requires more memory churn and processing time
>>> to get there.
>>
>> You've lost me again, sorry. Where does the G_TRUNC suddenly come
>> from? Is that the rule that uses `Helper.matchCombineExtendingLoads`?
>>
>> In that case, don't worry, I only intended my example to be comparable
>> to the simpler G_LOAD / G_SEXTLOAD rule without C++ escape that you
>> showed in your original mail as well.
>
> Yes, it comes from the Helper.applyCombineExtendingLoads(...), which,
> looking back at the thread, is from the macros section and not the
> upside-down matching section you were commenting on. Sorry about that.
>
>> If, on the other hand, you have a plan to get to the more advanced
>> result of matching multiple G_SEXTs and using G_TRUNC _without_ a C++
>> escape, I'd be _very_ interested to learn more about that!
>
> It's easy enough to match multiple uses in the current rule definitions.
> You just describe each use in the match section and have a predicate
> check that the operands aren't the same. However, C++ is the only way to
> look at all the users of a given value and modify them differently at
> the moment. I've had some vague thoughts about a (for-each-use ...) in
> the match and apply section but I haven't given that much thought yet.
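>
> For illustration, a sketch of a two-use match under the proposed
> syntax (the predicate style is illustrative; I'm assuming ${...}
> expands to a MachineOperand, as elsewhere in the thread):
>
>   def : GICombineRule<
>     (defs root:$d1, root:$d2, reg:$tmp, reg:$ptr),
>     (match (G_LOAD $tmp, $ptr),
>            (G_SEXT $d1, $tmp),
>            (G_SEXT $d2, $tmp),
>            [{ return ${d1}.getReg() != ${d2}.getReg(); }]),
>     (apply ...)>;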

Okay, makes sense.


>>> One of the things I'm planning to look into in the algorithm is
>>> making it smarter about combines involving multiple-uses in the
>>> intermediate operands. DAGCombine is harmfully greedy in this area
>>> and will often duplicate expensive operations if one use is
>>> combinable but another isn't. This has led to the addition of
>>> hasOneUse() checks which solves that but takes things to the opposite
>>> extreme. I think it should be possible to find an algorithm that
>>> makes an intelligent decision for defs with multiple uses.
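>>>
>>> (For illustration: if a rule did want the conservative one-use
>>> behaviour, it could presumably opt in with a C++ fragment predicate;
>>> a sketch, assuming MRI is in scope and ${t} expands to a
>>> MachineOperand:
>>>
>>>   (match (G_ADD $t, $a, $b),
>>>          (G_MUL $dst, $t, $c),
>>>          [{ return MRI.hasOneUse(${t}.getReg()); }])
>>>
>>> but the goal is for the algorithm to make that call instead.)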
>>
>> Absolutely agreed that we need better solutions here.
>>
>> In the AMDGPU backend, we have some rather interesting challenges
>> around those. This is perhaps a bit of a far future tangent, but one
>> of them is that floating point operations can optionally negate their
>> operands; however, this can be at the cost of a larger instruction
>> encoding, and the rules for that are rather complex. So there's a
>> problem of optimizing *where* we insert the negations to improve code
>> size.
>>
>> In a computation that is a tree with a single sink, an optimal
>> solution can be found in linear time using dynamic programming (maybe
>> there's an easier way, but I'm not aware of one). With general DAGs,
>> the problem is of course much harder.
>>
>> The negation is just one example, and it would be pretty awesome if we
>> could use a declarative database of combines to make progress at the
>> very least in local search optimizations.
>
> It certainly seems that something like this ought to be possible. A set
> of combines could find the local minima, and it's also possible for
> combine rules to avoid making changes and instead record the candidate
> places for the negation along with the associated costs, which ought to
> make it possible to reach the optimum for the tree.
>
> Along the same lines, it occurs to me that known-bits analysis is very
> similar and could be done declaratively with rules like:
>   def : GIKnownBitsRule<
>       (defs reg:$dst, reg:$src),
>       (match (G_ZEXT $dst, $src)),
>       (apply (exec [{
>         KnownZero.setBitsFrom(MRI.getType(${src}).getSizeInBits()); }], reg:$src))>;

That's a cool idea.

By the way, did I mention that I *really* like the idea of using ${var}
expansions in the C++ fragments? That's so much nicer than what we have
for the SDIsel :)

Cheers,
Nicolai

--
Learn how the world really is,
but never forget how it ought to be.

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
In reply to this post by David Jones via llvm-dev
Daniel Sanders via llvm-dev <[hidden email]> writes:

> Unfortunately, Instruction is a class-hierarchy-based design with
> specialized getters for each subclass, whereas MachineInstr is a single
> one-size-fits-all class. It's likely to be difficult to unify them
> with templates. A common API sounds feasible to some degree; certainly
> the G_* opcodes and the Instruction hierarchy ought to be able to
> agree on an abstraction that can interact with both, and
> MachineMemOperand ought to be able to agree with
> LoadInst/StoreInst/Atomic*Inst.

Yeah, that's along the lines of what I was thinking.  Long way off, of
course.  :)

> I expect it to be able to deal with ISel too. The main difference
> between the two is that ISel's apply step is required to constrain
> operands to a register class whereas Combine doesn't need to do that
> but can choose to. I'm intending to share code between the tablegen
> passes for Combine and ISel so that it will be the same underlying
> code generator.

Cool.  If combine can choose to enforce constraints, then the mechanism
is there for isel to use.
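As a sketch of what that might look like (purely illustrative; the
opcode and register classes here are AArch64-flavored guesses, and the
class-on-operand constraint syntax is my assumption, not part of the
proposal):

  def : GICombineRule<
    (defs reg:$dst, reg:$a, reg:$b),
    (match (G_ADD $dst, $a, $b)),
    (apply (ADDWrr GPR32:$dst, GPR32:$a, GPR32:$b))>;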

> The syntax could also work for the Legalizer and RegBankAlloc too.

I hadn't thought about the legalizer and things like RegBankAlloc.  That's
neat!  Even though the legalizer is a special case, there may still be
value in making its specification more declarative.

                               -David

Re: [llvm-dev] [RFC] Tablegen-erated GlobalISel Combine Rules

David Jones via llvm-dev
Hi Daniel,

I finally read the proposal.

I have a couple of comments that reinforce what I was thinking
initially: it is too early IMHO to develop a full tablegen backend for
this. I believe we are still missing ways to represent things that
matter (see my comments below), and I am afraid we still don't know all
that needs to be represented.

> It can reach across basic blocks when it's safe to do so. We may need to introduce a means to limit the inter-basic-block reach as I can foresee situations where it may pull expensive operations into a loop if it's not careful about this

I think we would need a syntax for that; in particular, I don't think a
generic heuristic based on frequency can solve the problem on its own
(i.e., what you're hinting at in your RFC).
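
E.g., today the only way I can see to express such a restriction would
be an escape into C++; a sketch (assuming the ${...} operands expand to
MachineOperands, as elsewhere in the thread):

  (match (G_LOAD $tmp, $ptr),
         (G_SEXT $dst, $tmp),
         [{ return ${tmp}.getParent()->getParent() ==
                   ${dst}.getParent()->getParent(); }])

and that is exactly the kind of thing I would want a first-class syntax
for.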

Another thing that we're missing has to do with the number of users.
E.g., should a pattern only be applied if there is only one user? Is it
valid if there is more than one user? If we cannot move something into
the destination block, can we still apply the pattern if we move the
result into the source block, etc.?

The number of users may not be that relevant, but what this gives us is
a notion of profitability (that is just how SDISel does it). This
actually ties back to the inter-basic-block case: is this pattern
profitable if it spans basic blocks with different frequencies?
In other words, I believe we should start to think in terms of how
profitable a pattern is. E.g., the source pattern has some cost and the
destination pattern has some other cost. Applying the pattern should
give us a profitability score, which would be higher when it produces
dead code (i.e., what the oneUse check tries to identify in SDISel).

Following that logic, we may be able to do away with all the
disabling-specific-patterns kind of thing. I.e., if something is not
profitable, it does not get applied.

All in all, the syntax that you're suggesting is reasonable (though IMHO
MIR is going to limit our ability to share more logic with IR, but maybe
it's fine to give up on that goal?), but I feel it actually distracts us
from thinking about the problems we are solving, and thus I am hesitant
to stamp work that would write a new tablegen backend.

That said, again, I am not the one doing the work!

Cheers,
-Quentin
[snip]