[llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev
Hi all,

The goal of this RFC is to make information related to variant scheduling
classes accessible at MC level. This would help tools like llvm-mca
understand/resolve variant scheduling classes.

To achieve this goal, I plan to introduce a new class of scheduling predicates
named MCSchedPredicate. An MCSchedPredicate allows the definition of boolean
expressions with a well-known semantic, that can be used to generate code for
both MachineInstr and MCInst.

The new predicates are designed to be completely optional. Scheduling models
can use a combination of SchedPredicate and MCSchedPredicate to describe
variant reads and writes. Old scheduling predicate definitions would still be
valid. New MCSchedPredicates would behave like normal scheduling predicates.

A bit of background
-------------------

Variant scheduling classes model situations where the instruction profile
depends on the value of certain operands.

For example, modern x86 processors know that a register-register XOR is a
zero-idiom if both operands are the same register. That means the XOR would
be optimized out at the register renaming stage, and no opcode would be issued
to the pipelines. A variant scheduling class can be used to describe this case
(see the example below):

```
def ZeroIdiomWrite : SchedWriteRes<[]> { let Latency = 0; }

def ZeroIdiom : SchedPredicate<[{
    MI->getOpcode() == X86::XORrr &&
    MI->getOperand(0).getReg() == MI->getOperand(1).getReg()
}]>;

def WriteXOR : SchedWriteVariant<[
   SchedVar<ZeroIdiom,   [ZeroIdiomWrite]>,
   SchedVar<NoSchedPred, [WriteALU]>
]>;
```

Problems with the current design
--------------------------------

A SchedPredicate is essentially a custom block of C++ code used by the
SubtargetEmitter to generate a condition through a boolean expression.
A SchedPredicate sees all the definitions that are "captured" by the
`PredicateProlog` (another block of C++ code). It can also access public
members of TargetSchedule.

A common pattern used by the ARM scheduling models to define predicates is:
 - PredicateProlog "captures" the TargetInstrInfo object from the
   TargetSchedule object.
 - Each predicate uses the "captured" TargetInstrInfo object (TII) to call
   helpers exposed by the (target specific) InstrInfo interface.

Note that TargetSchedule and TargetInstrInfo are both CodeGen concepts.

SchedPredicate definitions only work on MachineInstr objects. Therefore, the
C++ code block is not portable (i.e. it doesn't work if the input instruction
is a MCInst).  The `MI` used by the ZeroIdiom definition from the previous
example is a MachineInstr *.

The main problem with this design is that predicates don't have a "portable"
semantic.  A predicate is essentially an opaque block of code, and the
semantic of predicates is unknown to tablegen. Tablegen can only trust the
user, and just "copy-paste" code blocks from the various predicates to an
auto-generated `XXXGenSubtargetInfo::resolveSchedClass()` function.

This limits our ability to reason on predicates. In particular, it makes it
extremely hard (if not impossible) for tools that can only access the MC layer
to reuse predicate definitions to resolve variant scheduling classes.

If instead we expose the semantic of predicates to tablegen, we can then teach
tablegen how to generate an equivalent code-block that works on MCInst.

In the next section I show how I plan to expose the semantic of scheduling
predicates to tablegen. I will then go through a couple of examples describing
how the new predicate syntax can be used, and finally I will describe the
patches required to implement this feature.

A new class of scheduling predicates
------------------------------------

MCSchedPredicate allows the definition of scheduling predicates that have a
well-defined portable semantic. They can be used in place of SchedPredicate to
define SchedReadVariant and SchedWriteVariant definitions in tablegen.

An MCSchedPredicate definition is built on top of an MCPredicate. MCPredicate
definitions can be composed together to form complex boolean expressions.

To better understand how these new predicates work, let's have a look at the
following example.

```
def M3BranchLinkFastPred  : SchedPredicate<[{MI->getOpcode() == AArch64::BLR &&
                                             MI->getOperand(0).isReg() &&
                                             MI->getOperand(0).getReg() !=
                                             AArch64::LR}]>;
```

This tablegen code snippet has been taken from AArch64/AArch64SchedExynosM3.td

Predicate `M3BranchLinkFastPred` can be rewritten using an MCSchedPredicate
definition as follows:

```
def M3BranchLinkFastPred  : MCSchedPredicate<
  CheckAllOf<[
    CheckOpcode<[BLR]>,
    CheckRegOperand<0>,
    CheckNot<CheckRegOperandValue<0, LR>>]>
  >;
```

The MCSchedPredicate uses a `CheckAllOf`, which is a "composition of
predicates", and returns true only if every predicate in the composition
returns true. Note that `CheckAllOf`, `CheckOpcode`, `CheckRegOperand` and
`CheckNot` are all MCPredicate classes.
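
To make the composition semantics concrete, here is a minimal C++ sketch that evaluates the same checks directly on a mock instruction. The `Inst` and `Operand` structs, the lower-case helper names, and the opcode/register numbers are all made up for illustration; none of them are LLVM types or real encodings. The bounds checks are also an addition that the tablegen'd code shown in this RFC does not perform.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for an instruction and its operands (not LLVM types).
struct Operand {
  bool IsReg;
  unsigned Reg;
};
struct Inst {
  unsigned Opcode;
  std::vector<Operand> Operands;
};

// Hypothetical opcode and register numbers, for illustration only.
constexpr unsigned BLR = 1, ADD = 2;
constexpr unsigned X0 = 0, LR = 30;

using Pred = std::function<bool(const Inst &)>;

// True if the opcode is a member of the given set.
Pred checkOpcode(std::vector<unsigned> Opcodes) {
  return [Opcodes](const Inst &I) -> bool {
    for (unsigned Op : Opcodes)
      if (I.Opcode == Op)
        return true;
    return false;
  };
}

// True if operand Idx exists and is a register.
Pred checkRegOperand(unsigned Idx) {
  return [Idx](const Inst &I) -> bool {
    return Idx < I.Operands.size() && I.Operands[Idx].IsReg;
  };
}

// True if operand Idx is the register Reg.
Pred checkRegOperandValue(unsigned Idx, unsigned Reg) {
  return [Idx, Reg](const Inst &I) -> bool {
    return Idx < I.Operands.size() && I.Operands[Idx].IsReg &&
           I.Operands[Idx].Reg == Reg;
  };
}

Pred checkNot(Pred P) {
  return [P](const Inst &I) -> bool { return !P(I); };
}

// True only if every predicate in the composition returns true.
Pred checkAllOf(std::vector<Pred> Ps) {
  return [Ps](const Inst &I) -> bool {
    for (const Pred &P : Ps)
      if (!P(I))
        return false;
    return true;
  };
}

// Mirrors the shape of M3BranchLinkFastPred above.
bool isM3BranchLinkFast(const Inst &I) {
  static const Pred P =
      checkAllOf({checkOpcode({BLR}), checkRegOperand(0),
                  checkNot(checkRegOperandValue(0, LR))});
  return P(I);
}
```

With these mock values, a BLR through X0 satisfies the predicate, while a BLR through LR or any non-BLR opcode does not.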

Each predicate class has a well known semantic. For example, `CheckOpcode` is
only used to check if the opcode of an instruction is part of a set of opcodes.
In this example, CheckOpcode is used to check if the instruction is a BLR.

This new syntax allows the definition of predicates in a declarative way.
These new predicates don't require custom blocks of C++, and can be used to
define conditions without being bound to a particular representation (i.e.
MachineInstr vs MCInst).

It also means that tablegen backends are now able to parse and understand the
logic of each predicate check. But more importantly, tablegen backends gain
the ability to "lower" scheduling predicates into code that works on MCInst too.
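
The lowering idea can be sketched as a tiny text generator: one declarative description, two emitted helpers. This is only an illustration under simplifying assumptions. `AllOf`, `emitHelper`, and the `check*` string builders are hypothetical names, not the PredicateExpander API, and a predicate is flattened here to a list of AND-ed checks. The body can be shared because MachineInstr and MCInst expose the accessors used here (`getOpcode()`, `getOperand(i).isReg()`, `getOperand(i).getReg()`) under the same names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A predicate reduced to a list of already-formatted checks that are AND-ed
// together. Real TableGen records carry more structure than this.
struct AllOf {
  std::vector<std::string> Checks;
};

std::string checkOpcode(const std::string &Target, const std::string &Opc) {
  return "MI.getOpcode() == " + Target + "::" + Opc;
}

std::string checkRegOperand(unsigned Idx) {
  return "MI.getOperand(" + std::to_string(Idx) + ").isReg()";
}

std::string checkRegOperandIsNot(const std::string &Target, unsigned Idx,
                                 const std::string &Reg) {
  return "MI.getOperand(" + std::to_string(Idx) + ").getReg() != " + Target +
         "::" + Reg;
}

// Emit the text of one helper function. Only the parameter type depends on
// the representation we are generating code for.
std::string emitHelper(const std::string &Name, const AllOf &P, bool ForMC) {
  std::string Out = "bool " + Name + "(const ";
  Out += ForMC ? "MCInst" : "MachineInstr";
  Out += " &MI) { return ";
  if (P.Checks.empty())
    Out += "true";
  for (std::size_t I = 0; I < P.Checks.size(); ++I) {
    if (I != 0)
      Out += " && ";
    Out += P.Checks[I];
  }
  Out += "; }";
  return Out;
}

// The M3BranchLinkFastPred example, written once.
AllOf m3BranchLinkFast() {
  return AllOf{{checkOpcode("AArch64", "BLR"), checkRegOperand(0),
                checkRegOperandIsNot("AArch64", 0, "LR")}};
}
```

Calling `emitHelper("isFast", m3BranchLinkFast(), true)` yields a helper taking `const MCInst &`, and passing `false` yields the same body behind a `const MachineInstr &` parameter.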

A more complicated example involving TII method calls.
------------------------------------------------------

This code is taken from the AArch64 Cyclone scheduling model:

```
def WriteZPred : SchedPredicate<[{TII->isGPRZero(*MI)}]>;
def WriteImmZ  : SchedWriteVariant<[
                   SchedVar<WriteZPred, [WriteX]>,
                   SchedVar<NoSchedPred, [WriteImm]>]>;
```

Predicate WriteZPred is used to check if a GPR instruction is a zero-idiom.
The rationale is that zero-idioms have zero latency and don't consume
processor resources.

The predicate logic is defined by method `isGPRZero()`, which is accessible
through the TII object (i.e. a `const AArch64InstrInfo *`).

Below is the definition of `isGPRZero` in AArch64/AArch64InstrInfo.cpp:

```
// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.
bool AArch64InstrInfo::isGPRZero(const MachineInstr &MI) {
  switch (MI.getOpcode()) {
  default:
    break;
  case AArch64::MOVZWi:
  case AArch64::MOVZXi: // movz Rd, #0 (LSL #0)
    if (MI.getOperand(1).isImm() && MI.getOperand(1).getImm() == 0) {
      assert(MI.getDesc().getNumOperands() == 3 &&
             MI.getOperand(2).getImm() == 0 && "invalid MOVZi operands");
      return true;
    }
    break;
  case AArch64::ANDWri: // and Rd, Rzr, #imm
    return MI.getOperand(1).getReg() == AArch64::WZR;
  case AArch64::ANDXri:
    return MI.getOperand(1).getReg() == AArch64::XZR;
  case TargetOpcode::COPY:
    return MI.getOperand(1).getReg() == AArch64::WZR;
  }
  return false;
}
```

That logic can be replaced by the following MCPredicate definitions:

```
def CheckMOVZ : CheckAllOf<[
  CheckOpcode<[MOVZWi, MOVZXi]>,
  CheckNumOperands<3>,
  CheckImmOperand<1>,
  CheckZeroOperand<1>,
  CheckImmOperand<2>,
  CheckZeroOperand<2>
]>;

def CheckANDW : CheckAllOf<[
  CheckOpcode<[ANDWri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

def CheckANDX : CheckAllOf<[
  CheckOpcode<[ANDXri]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, XZR>
]>;

def CheckCOPY : CheckAllOf<[
  CheckPseudo<[COPY]>,
  CheckRegOperand<1>,
  CheckRegOperandValue<1, WZR>
]>;

// Return true if this instruction simply sets its single destination register
// to zero. This is equivalent to a register rename of the zero-register.

def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

TIIPredicate definitions are used to model calls to the target-specific
InstrInfo.

A TIIPredicate definition is treated specially by the InstrInfoEmitter
tablegen backend, which will use it to automatically generate a definition
in the target specific `GenInstrInfo` class.

Basically, we can tell tablegen to generate that definition for us.

Now that the description of IsGPRZero is available in the form of a
MCPredicate, we can modify the original SchedWriteVariant WriteImmZ as follows:

```
def WriteZPred : MCSchedPredicate<IsGPRZero>;

def WriteImmZ : SchedWriteVariant<[
                  SchedVar<WriteZPred, [WriteX]>,
                  SchedVar<NoSchedPred, [WriteImm]>]>;
```

How to resolve scheduling classes from MC
-----------------------------------------

MCSubtargetInfo will gain a new method:

```
  /// Resolve a variant scheduling class for the given MCInst and CPU.
  virtual unsigned
  resolveVariantSchedClass(unsigned SchedClass, const MCInst *MI,
                           unsigned CPUID) const {
    return 0;
  }
```

The SubtargetEmitter is responsible for processing scheduling classes and
generating an override for that method.

This is what the SubtargetEmitter generates for Cyclone and Exynos M3 if we
implement the changes described in the previous sections:

```
unsigned resolveVariantSchedClass(unsigned SchedClass,
    const MCInst *MI, unsigned CPUID) const override {
  switch (SchedClass) {
  case 117: // BLR
    if (CPUID == 5) { // ExynosM3Model
      if ((
          ( MI->getOpcode() == AArch64::BLR )
          && MI->getOperand(0).isReg()
          && MI->getOperand(0).getReg() != AArch64::LR
        ))
        return 934; // M3WriteAB
      if (true)
        return 935; // M3WriteAC
    }
    break;
  case 386: // MOVZWi_MOVZXi
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 387: // ANDWri_ANDXri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  case 695: // ANDWri
    if (CPUID == 3) { // CycloneModel
      if (AArch64_MC::isGPRZero(*MI))
        return 930; // WriteX
      if (true)
        return 962; // WriteImm
    }
    break;
  };
  // Don't know how to resolve this scheduling class.
  return 0;
  }
};
```

Note that this override will become a member of a new tablegen'd class named
AArch64GenMCSubtargetInfo. That class would directly extend MCSubtargetInfo.
Class AArch64GenMCSubtargetInfo is what will get instantiated by method
`Target::createMCSubtargetInfo()`.
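
From the point of view of an MC-level tool, using the hook might look roughly like the sketch below. Everything here is a mock: `MockInst`, `MockSubtargetInfo`, `MockExynosM3`, `Resolution`, and `resolveSchedClass` are hypothetical names, and the opcode/register numbers are invented. The scheduling class IDs and CPUID are copied from the generated code above purely as sample values. The only behaviour taken from this RFC is that a return value of 0 means "don't know how to resolve this scheduling class".

```cpp
#include <cassert>

// Mock instruction holding only what this sketch needs (not an LLVM type).
struct MockInst {
  unsigned Opcode;
  unsigned Reg0; // register held by operand 0
};

constexpr unsigned OpcBLR = 1; // hypothetical opcode number
constexpr unsigned RegLR = 30; // hypothetical register number

// Mock of the base class: the default implementation resolves nothing.
struct MockSubtargetInfo {
  virtual ~MockSubtargetInfo() = default;
  virtual unsigned resolveVariantSchedClass(unsigned /*SchedClass*/,
                                            const MockInst * /*MI*/,
                                            unsigned /*CPUID*/) const {
    return 0;
  }
};

// Mock of a generated override, shaped like the BLR case shown above.
struct MockExynosM3 : MockSubtargetInfo {
  unsigned resolveVariantSchedClass(unsigned SchedClass, const MockInst *MI,
                                    unsigned CPUID) const override {
    if (SchedClass == 117 && CPUID == 5) {
      if (MI->Opcode == OpcBLR && MI->Reg0 != RegLR)
        return 934; // M3WriteAB
      return 935;   // M3WriteAC
    }
    return 0; // Don't know how to resolve this scheduling class.
  }
};

struct Resolution {
  bool Ok;     // false: the tool should report an error
  unsigned ID; // resolved scheduling class, valid only if Ok
};

// Tool-side wrapper: turn the "0 means unresolved" convention into an
// explicit success flag.
Resolution resolveSchedClass(const MockSubtargetInfo &STI, unsigned SchedClass,
                             const MockInst &MI, unsigned CPUID) {
  unsigned ID = STI.resolveVariantSchedClass(SchedClass, &MI, CPUID);
  return Resolution{ID != 0, ID};
}
```

A subtarget that does not override the hook always ends up in the "unresolved" path, which is the case where a tool would have to report an error.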

----
Let's go back to the definition of IsGPRZero using a TIIPredicate.

```
def IsGPRZero : TIIPredicate<"AArch64", "isGPRZero",
  AnyOfMCPredicates<[CheckMOVZ, CheckANDW, CheckANDX, CheckCOPY]>>;
```

This is how the InstrInfoEmitter expands the method in the tablegen'd
class AArch64GenInstrInfo:

```
  static bool isGPRZero(const MachineInstr &MI) {
    return (
      (
        (
          MI.getOpcode() == AArch64::MOVZWi
          || MI.getOpcode() == AArch64::MOVZXi
        )
        && MI.getNumOperands() == 3
        && MI.getOperand(1).isImm()
        && MI.getOperand(1).getImm() == 0
        && MI.getOperand(2).isImm()
        && MI.getOperand(2).getImm() == 0
      )
      || (
        ( MI.getOpcode() == AArch64::ANDWri )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::WZR
      )
      || (
        ( MI.getOpcode() == AArch64::ANDXri )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::XZR
      )
      || (
        ( MI.getOpcode() == TargetOpcode::COPY )
        && MI.getOperand(1).isReg()
        && MI.getOperand(1).getReg() == AArch64::WZR
      )
    );
  }
```

Another variant of function `isGPRZero` is expanded in the AArch64_MC
namespace (see below):

```
#ifdef GET_GENINSTRINFO_MC_DECL
#undef GET_GENINSTRINFO_MC_DECL
namespace llvm {
class MCInst;

namespace AArch64_MC {

bool isGPRZero(const MCInst &MI);

} // end AArch64_MC namespace
} // end llvm namespace
#endif // GET_GENINSTRINFO_MC_DECL

#ifdef GET_GENINSTRINFO_MC_HELPERS
#undef GET_GENINSTRINFO_MC_HELPERS
namespace llvm {
namespace AArch64_MC {

bool isGPRZero(const MCInst &MI) {
  return (
    (
      (
        MI.getOpcode() == AArch64::MOVZWi
        || MI.getOpcode() == AArch64::MOVZXi
      )
      && <...snip...>
    )
  );
}

} // end AArch64_MC namespace
} // end llvm namespace
#endif // GET_GENINSTRINFO_MC_HELPERS
```

Function isGPRZero would live in namespace AArch64_MC.
The declaration of AArch64_MC::isGPRZero has to be made visible to
AArch64MCTargetDesc.h, so that it becomes known to the new
`resolveVariantSchedClass()` method.

As a side note: all this code is guarded by macro definitions. This allows us
to control its expansion (if we decide that we don't want it).


What to do next
---------------
I have a series of three patches ready to be sent upstream for review.

The first patch is mostly NFC (no functional change). It introduces the new
scheduling predicate class in tablegen, and it teaches the InstrInfoEmitter
and the SubtargetEmitter how to expand MCSchedPredicate definitions.
The first patch is up for review here: https://reviews.llvm.org/D46695.

The second patch would teach the SubtargetEmitter how to generate method
resolveVariantSchedClass().

The last patch of the sequence will teach llvm-mca how to use method
`resolveVariantSchedClass()` to resolve variant classes. llvm-mca will generate an error if the variant scheduling class cannot be resolved.

Review https://reviews.llvm.org/D46697 is the union of patch1 and patch2 only.
It is not meant to be reviewed at this stage, since it contains the code
changes related to patch1.

The third patch is available here: https://reviews.llvm.org/D46698.
D46698 requires patch1 and patch2.

Bonus (optional) patches:
 1) [X86] Teach scheduling models how to recognize zero-idioms.
    This would make it easier to review the llvm-mca change.
 2) [X86] Add variant scheduling classes for LEA instructions.
 3) [AArch64] Rewrite the predicates mentioned by this RFC.

People that are interested in seeing how to implement "optional" patch 3 can
have a look at the review here: https://reviews.llvm.org/D46701

Please let me know what you think.

Thanks,
Andrea

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev


On May 10, 2018, at 8:58 AM, Andrea Di Biagio <[hidden email]> wrote:

<snip>

Fantastic writeup! It’s great to see so much progress on fundamental infrastructure.

My time for LLVM code review is extremely limited. Can someone work with Andrea to get these patches in?

-Andy


Re: [llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev
On 10 May 2018 at 21:58, Andrew Trick <[hidden email]> wrote:
> Fantastic writeup! It’s great to see so much progress on fundamental
> infrastructure.
>
> My time for LLVM code review is extremely limited. Can someone work with
> Andrea to get these patches in?

Hi Andrew,

Same here, but this has been a long-standing goal for me, too, so I'll do my best.


--
cheers,
--renato
Re: [llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev
Thanks Andrew and Renato,

One thing I didn't mention, and should probably have made more explicit in my RFC, is that the new predicate framework is extensible.

That means, developers can extend it by adding new Check predicates.
As long as they also teach the PredicateExpander how to do the lowering for those new predicates, then everything should be fine.

--

In the RFC I mentioned how we can use a TIIPredicate to let tablegen auto-generate two versions of function `isGPRZero`:
 1) a version that takes a MachineInstr as input, and that is automatically generated by tablegen into the target-specific instruction info class.
 2) another version (still auto-generated by tablegen) that takes an MCInst as input.

The goal is to help users define the check logic of a predicate. If we use a TIIPredicate, we specify the logic only once, in a declarative way, and then we let tablegen generate code for us.

If, for some reason, a user doesn't want to use this approach, then they can still provide their own implementation for variant 2 (i.e. the version of `isGPRZero` that takes an MCInst as input).

We can then introduce a new MCPredicate as follows:

```
// MCInstVariant and MachineInstrVariant are both function names.
//
// MCInstVariant is the function to call if we want to check properties on MCInst.
// MachineInstrVariant is the function to call if we want to check properties on MachineInstr.

class CheckFunction<string MCInstVariant, string MachineInstrVariant> : MCPredicate {
  string MCInstFn = MCInstVariant;
  string MachineInstrFn = MachineInstrVariant;
}
```

Then we teach the PredicateExpander (in utils/TableGen/PredicateExpander.cpp|.h) how to lower that new predicate.

Here is an example of how this could be done:
```
void PredicateExpander::expandCheckFunction(formatted_raw_ostream &OS, StringRef MCInstVariant, StringRef MachineInstrVariant) {
  if (shouldExpandForMC())
    OS << MCInstVariant;
  else
    OS << MachineInstrVariant;
  OS << "(MI)";
}
```

Basically, if we are generating code for MC, then we expand a call to `MCInstVariant`. Otherwise, we expand a call to `MachineInstrVariant` (both are user-defined functions).
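
To see what that expansion produces without building TableGen, the same logic can be exercised against a standard stream. This is a stand-alone mock, not the real PredicateExpander: `MockPredicateExpander` and `expandToString` are hypothetical names, `std::ostream` replaces `formatted_raw_ostream`, and the function names passed in are just sample strings.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Stand-in for the expander described above (not the actual TableGen class).
class MockPredicateExpander {
  bool ExpandForMC;

public:
  explicit MockPredicateExpander(bool ForMC) : ExpandForMC(ForMC) {}

  bool shouldExpandForMC() const { return ExpandForMC; }

  // Emit a call to whichever user-defined function matches the
  // representation we are generating code for.
  void expandCheckFunction(std::ostream &OS, const std::string &MCInstVariant,
                           const std::string &MachineInstrVariant) const {
    OS << (shouldExpandForMC() ? MCInstVariant : MachineInstrVariant)
       << "(MI)";
  }
};

// Convenience wrapper that returns the expanded text.
std::string expandToString(bool ForMC, const std::string &MCInstVariant,
                           const std::string &MachineInstrVariant) {
  std::ostringstream OS;
  MockPredicateExpander(ForMC).expandCheckFunction(OS, MCInstVariant,
                                                   MachineInstrVariant);
  return OS.str();
}
```

For example, `expandToString(true, "AArch64_MC::isGPRZero", "AArch64InstrInfo::isGPRZero")` produces the text `AArch64_MC::isGPRZero(MI)`, and passing `false` selects the MachineInstr variant instead.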

--

The bottom line is: the framework is extensible.
As long as we tell the PredicateExpander how to lower/expand our new predicates, then we can implement different approaches.

I hope this answers the comments from https://reviews.llvm.org/D46701.

Thanks,
Andrea

Re: [llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev


On May 11, 2018, at 4:26 AM, Andrea Di Biagio <[hidden email]> wrote:

> The goal is to help users define the check logic of a predicate. If we use a TIIPredicate, we specify the logic only once, in a declarative way, and then we let tablegen generate code for us.
>
> If, for some reason, a user doesn't want to use this approach, then they can still provide their own implementation for variant 2 (i.e. the version of `isGPRZero` that takes an MCInst as input).

The important thing is that users can call into target-specific TII entry points (i.e. not declared in TargetInstrInfo as a virtual method). The reason I provided a C++ hook is so that users could do this without learning the tablegen backend. Although if it’s easy to define a new target specific TIIPredicate, that’s fine too.

-Andy

Re: [llvm-dev] [RFC] MC support for variant scheduling classes.

Joel E. Denny via llvm-dev


On Fri, May 11, 2018 at 6:00 PM, Andrew Trick <[hidden email]> wrote:


> The important thing is that users can call into target-specific TII entry points (i.e. not declared in TargetInstrInfo as a virtual method). The reason I provided a C++ hook is so that users could do this without learning the tablegen backend. Although if it’s easy to define a new target specific TIIPredicate, that’s fine too.
>
> -Andy

I agree.
The auto generated target hook is not a virtual method. It is always a static member of the XXXGenInstrInfo class automatically generated by tablegen.

The latest version of patch 1 (https://reviews.llvm.org/D46695) also adds a new `CheckFunctionCall` predicate.
People who don't find it easy to use a TIIPredicate can now use a CheckFunctionCall instead. Overall, I think it is good to have alternatives to TIIPredicate; some TII hooks used for predicates can be very long, and converting them into predicates can be error prone. Spotting logical mistakes in auto-generated code (especially in big code blocks) can be difficult and annoying.
Ideally, TIIPredicate will be used in most cases where the predicate function is small. If the logic gets too complicated, then people can always use a `CheckFunctionCall`.

Cheers
-Andrea
