[llvm-dev] Different SelectionDAGs for same CPU


Josh Sharp via llvm-dev
Hi,
I used two different compilers to compile the same IR for the same custom target.
The LLVM IR is:

define i32 @_Z9test_mathv() #0 {
  %a = alloca i32, align 4
  %1 = load i32, i32* %a, align 4
  ret i32 %1
}

Before instruction selection, the Selection DAGs are the same:

Optimized legalized selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
  t0: ch = EntryToken
    t4: i32,ch = load<(dereferenceable load 4 from %ir.a)> t0, FrameIndex:i32<0>, undef:i32
  t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
  t7: ch = UISD::Ret t6, Register:i32 $r4, t6:1


But after instruction selection, one DAG has one more node than the other:

compiler 1
===== Instruction selection ends:
Selected selection DAG: %bb.0 '_Z9test_mathv:'
SelectionDAG has 8 nodes:
  t0: ch = EntryToken
      t1: i32 = add TargetFrameIndex:i32<0>, TargetConstant:i32<0>
    t4: i32,ch = LDWI<Mem:(dereferenceable load 4 from %ir.a)> t1, t0
  t6: ch,glue = CopyToReg t0, Register:i32 $r4, t4
  t7: ch = JLR Register:i32 $r4, t6, t6:1


compiler 2
===== Instruction selection ends:
Selected selection DAG: BB#0 '_Z9test_mathv:'
SelectionDAG has 7 nodes:
  t0: ch = EntryToken
    t4: i32,ch = LDWI<Mem:LD4[%a](dereferenceable)> TargetFrameIndex:i32<0>, TargetConstant:i32<0>, t0
  t6: ch,glue = CopyToReg t0, Register:i32 %$r4, t4
  t7: ch = JLR Register:i32 %$r4, t6, t6:1

In the first case, node t1 is a separate node whereas in the second case, t1 is inside t4. What difference in implementation could explain this difference in behavior? Where in the code should I look into?
(Note that "LDWI" is an instruction that adds a register and an immediate, then loads the memory contents at the resulting address into a register.)

Thanks.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Different SelectionDAGs for same CPU

Tim Northover via llvm-dev
Hi Josh,

On Tue, 22 Jan 2019 at 04:54, Josh Sharp via llvm-dev
<[hidden email]> wrote:
> In the first case, node t1 is a separate node whereas in the second case, t1 is inside t4. What difference in implementation could explain this difference in behavior?

It looks like someone has added extra code to the second compiler to
fold a stack address calculation into the load operation that accesses
the variable.
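
To see why the fold accounts for exactly one node, here is a minimal C++ sketch that rebuilds both selected DAGs from the dumps as plain data and counts their distinct nodes. It is a toy model only: ToyNode, countNodes, unfoldedDag, foldedDag and checkAgainstDumps are hypothetical names made up for illustration, and none of this is LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy DAG node: a label plus the labels of its operands. Not an LLVM class.
struct ToyNode {
  std::string name;
  std::vector<std::string> operands;
};

// Count distinct nodes, including leaves that only ever appear as operands
// (TargetFrameIndex, TargetConstant, Register).
std::size_t countNodes(const std::vector<ToyNode> &dag) {
  std::set<std::string> seen;
  for (const ToyNode &node : dag) {
    seen.insert(node.name);
    for (const std::string &operand : node.operands)
      seen.insert(operand);
  }
  return seen.size();
}

// Compiler 1: the address is computed by a separate add node (t1).
std::vector<ToyNode> unfoldedDag() {
  return {
      {"t0:EntryToken", {}},
      {"t1:add", {"TargetFrameIndex<0>", "TargetConstant<0>"}},
      {"t4:LDWI", {"t1:add", "t0:EntryToken"}},
      {"t6:CopyToReg", {"t0:EntryToken", "Register:$r4", "t4:LDWI"}},
      {"t7:JLR", {"Register:$r4", "t6:CopyToReg"}},
  };
}

// Compiler 2: base and offset are operands of the load itself.
std::vector<ToyNode> foldedDag() {
  return {
      {"t0:EntryToken", {}},
      {"t4:LDWI",
       {"TargetFrameIndex<0>", "TargetConstant<0>", "t0:EntryToken"}},
      {"t6:CopyToReg", {"t0:EntryToken", "Register:$r4", "t4:LDWI"}},
      {"t7:JLR", {"Register:$r4", "t6:CopyToReg"}},
  };
}

// The counts agree with the "SelectionDAG has N nodes" lines in the dumps.
void checkAgainstDumps() {
  assert(countNodes(unfoldedDag()) == 8);
  assert(countNodes(foldedDag()) == 7);
}
```

The only difference between the two lists is the t1 add node, which matches the 8-versus-7 node counts in the dumps.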

> Where in the code should I look into?

It could be implemented in a couple of places. Most likely,
XYZInstrInfo.td (or some related TableGen file) defines a
ComplexPattern that is used by the LDWI instruction definition. That
ComplexPattern tells pattern matching to call a specific function in
XYZISelDAGToDAG.cpp when deciding what to use for the LDWI operands.
That C++ function is probably what looks for a FrameIndex node and
has been taught that it can be folded into the load.

If you just grep the target's code for FrameIndex or frameindex you
should find it pretty quickly though, even if they used some other
method. There don't tend to be many uses of that particular node.

Cheers.

Tim.

Re: [llvm-dev] Different SelectionDAGs for same CPU

Josh Sharp via llvm-dev
Hi Tim,

>That C++ function is probably what looks for a FrameIndex node and
>has been taught that it can be folded into the load.

How do you teach a function that a node can be folded into an instruction?



Re: [llvm-dev] Different SelectionDAGs for same CPU

Tim Northover via llvm-dev
On Sat, 26 Jan 2019 at 00:15, Josh Sharp <[hidden email]> wrote:
> >That C++ function is probably what looks for a FrameIndex node and
> >has been taught that it can be folded into the load.
>
> How do you teach a function that a node can be folded into an instruction?

Well, if you look at the SelectAddrModeIndexed function in
AArch64ISelDAGToDAG.cpp for example, at the top it checks whether the
address we're selecting is an ISD::FrameIndex; if so, it converts it
into an equivalent TargetFrameIndex (so that LLVM knows it's already
been selected) and makes that the base of the address operand, and
adds a dummy TargetConstant 0 as the offset operand; then it returns
true to indicate it was able to match part of the DAG for that
instruction.
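
The steps described above can be sketched as a small self-contained C++ model. This is a toy under stated assumptions, not the AArch64 code: Opcode, ToyValue, selectAddrFrameIndex and demoSelectAddr are hypothetical stand-ins for the real SelectionDAG types and hook, and only the frame-index case is modelled.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for SelectionDAG concepts; none of these are LLVM types.
enum class Opcode { FrameIndex, TargetFrameIndex, TargetConstant, Other };

struct ToyValue {
  Opcode opcode;
  std::int64_t value; // frame slot number or constant value
};

// Shaped like a ComplexPattern callback: on success, fill in the operands
// the instruction should use and return true; otherwise return false so the
// matcher knows this hook did not handle the address.
bool selectAddrFrameIndex(const ToyValue &addr, ToyValue &base,
                          ToyValue &offset) {
  if (addr.opcode != Opcode::FrameIndex)
    return false; // other address forms are out of scope for this sketch

  // Convert to the "already selected" target form and use it as the base.
  base = {Opcode::TargetFrameIndex, addr.value};
  // A frame index on its own has no displacement, so use a dummy 0 offset.
  offset = {Opcode::TargetConstant, 0};
  return true;
}

// Usage: a frame index is folded into base + 0, anything else is rejected.
void demoSelectAddr() {
  ToyValue base{Opcode::Other, -1};
  ToyValue offset{Opcode::Other, -1};

  ToyValue stackSlot{Opcode::FrameIndex, 0};
  assert(selectAddrFrameIndex(stackSlot, base, offset));
  assert(base.opcode == Opcode::TargetFrameIndex && base.value == 0);
  assert(offset.opcode == Opcode::TargetConstant && offset.value == 0);

  ToyValue unrelated{Opcode::Other, 5};
  assert(!selectAddrFrameIndex(unrelated, base, offset));
}
```

Returning the operands through reference parameters mirrors the way such hooks hand the base and offset back to the generated matcher; a real hook would usually go on to try other address forms instead of returning false straight away.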

Other key things to look at in that particular example are the
am_indexed8 definition, which is where TableGen is taught about that
C++ function (well, actually SelectAddrModeIndexed8, but that just
immediately calls SelectAddrModeIndexed with an extra size argument),
and the definition of LDRB which uses that am_indexed8 in a pattern.

The definitions are quite a maze of multiclass expansions, so I
sometimes find it easier to run llvm-tblgen without a backend (from my
build directory "bin/llvm-tblgen ../llvm/lib/Target/AArch64/AArch64.td
-I ../llvm/include -I ../llvm/lib/Target/AArch64"). That expands
everything so that you can (say) look at all the parts that make up
LDRBui (the key instruction) in one place -- all of its operands and
patterns and bits etc.

Cheers.

Tim.

Re: [llvm-dev] Different SelectionDAGs for same CPU

Josh Sharp via llvm-dev
Tim,

I was able to fold the stack address calculation into the load operation as you said. Is the approach the same if I want to fold any target instruction into any other target instruction? Specifically, I'm trying to get from this

t0: ch = EntryToken
      t8: i32 = MOVRI TargetConstant:i32<0>
    t1: i32,i1,i1,i1,i1 = ADDR TargetFrameIndex:i32<0>, t8
  t3: ch,glue = CopyToReg t0, Register:i32 $r4, t1
  t4: ch = JLR Register:i32 $r4, t3, t3:1

to this

t0: ch = EntryToken
    t1: i32,i1,i1,i1,i1 = ADDR TargetFrameIndex:i32<0>, MOVRI:i32,i1,i1
  t3: ch,glue = CopyToReg t0, Register:i32 $r4, t1
  t4: ch = JLR Register:i32 $r4, t3, t3:1

Thanks.

