[RFC, ARM] expanding RET to CopyToReg;BRIND

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[RFC, ARM] expanding RET to CopyToReg;BRIND

Rafael Espíndola
I have changed the way in which the ARM backend generates a function
return. Instead of expanding a RET to a CopyToReg;RETFLAG, it now
expands into a CopyToReg;BRIND. I haven't commit it yet, but the patch
is attached.

In my opinion the resulting code is easier to understand, but I have
some questions:

Why all backends use RETFLAG?

Why it is named RETFLAG?

Why the Copy that places the result must have a Flag operand? If I
understand correctly, the Flag operand exists in nodes that use a flag
register (cpsr in ARM).

Thanks,
Rafael

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

llvm.patch (3K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [RFC, ARM] expanding RET to CopyToReg;BRIND

Chris Lattner
On Tue, 30 May 2006, [UTF-8] Rafael Esp?ndola wrote:
> I have changed the way in which the ARM backend generates a function
> return. Instead of expanding a RET to a CopyToReg;RETFLAG, it now
> expands into a CopyToReg;BRIND. I haven't commit it yet, but the patch
> is attached.

Ok, I haven't looked at the code, but you're free to do whatever make
sense.

> In my opinion the resulting code is easier to understand, but I have
> some questions:
>
> Why all backends use RETFLAG?

The backends seem to be doing the following:

1. For 'ret void', a "ISD::RET" node is left along and not lowered.  As
    such, it gets directly pattern matched.  On PPC, for example, we have:

// Return void support.
def : Pat<(ret), (BLR)>;

    ... which maps it directly to the PPC "blr" instruction.

2. For 'ret value', the targets custom lower the ISD::RET node into some
    number of CopyToReg nodes (to set up the live outs), then need a node
    to represent the return.  The return node has to be flagged do the
    copies, so that the scheduler doesn't make the copies wander from the
    return.

> Why it is named RETFLAG?

Historical reason.  Originally we didn't have nodes that could
*optionally* have an input flag.  A better design, e.g. on PPC would be to
have a PPCISD::RET node, which takes an optional input flag, and always
lower RET to it.

> Why the Copy that places the result must have a Flag operand? If I
> understand correctly, the Flag operand exists in nodes that use a flag
> register (cpsr in ARM).

Flag in the SelectionDAG stuff is so named because it was originally used
for condition codes.  However, it has since grown to mean "keep these two
nodes always together".  In the case of return, you want the scheduler to
produce code like this (on PPC):

   ...
   R3 = outval_virtreg
   blr

not like this:

   ...
   R3 = outval_virtreg
   ...
   blr

So the copy and blr are flagged together.

Another case where flags are useful are for things like the X86 variable
shift instruction.  There the shift amount is required to be in the CL
register, so we generate code like this:


    CL = shamt_virtreg
    X = shl Y, CL

We don't want the copy and shift to wander apart from each other (e.g. we
don't want another shift to get scheduled in between them), so we flag
them together.  In practice, these copies usually get coallesced away.

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC, ARM] expanding RET to CopyToReg;BRIND

Rafael Espíndola
> > Why it is named RETFLAG?
>
> Historical reason.  Originally we didn't have nodes that could
> *optionally* have an input flag.  A better design, e.g. on PPC would be to
> have a PPCISD::RET node, which takes an optional input flag, and always
> lower RET to it.
I See. I will try to always lower to "(mov)*;bx lr" on ARM.

> Flag in the SelectionDAG stuff is so named because it was originally used
> for condition codes.  However, it has since grown to mean "keep these two
> nodes always together".  In the case of return, you want the scheduler to
> produce code like this (on PPC):
That clarifies a lot! Thanks.

>    ...
>    R3 = outval_virtreg
>    blr
>
> not like this:
>
>    ...
>    R3 = outval_virtreg
>    ...
>    blr
>
> So the copy and blr are flagged together.
>
> Another case where flags are useful are for things like the X86 variable
> shift instruction.  There the shift amount is required to be in the CL
> register, so we generate code like this:
>
>
>     CL = shamt_virtreg
>     X = shl Y, CL
>
> We don't want the copy and shift to wander apart from each other (e.g. we
> don't want another shift to get scheduled in between them), so we flag
> them together.  In practice, these copies usually get coallesced away.
In the second case shl explicitly uses CL. Shouldn't the register
allocator be smart enough to avoid scheduling an instruction that
destroys CL in between them?

In the first case, what do you think about making it possible for an
instruction to optionally depend on a value? That is, make blr depend
on R3 or R3/R4 depending on the type of the return value. Something
like
a = DAG.getNode(ISD::BRIND, MVT::Other, Copy, LR);
a.addUse(PPC::R3)

> -Chris

Thanks,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC, ARM] expanding RET to CopyToReg;BRIND

Chris Lattner
On Wed, 31 May 2006, [UTF-8] Rafael Esp?ndola wrote:
>> We don't want the copy and shift to wander apart from each other (e.g. we
>> don't want another shift to get scheduled in between them), so we flag
>> them together.  In practice, these copies usually get coallesced away.
> In the second case shl explicitly uses CL. Shouldn't the register
> allocator be smart enough to avoid scheduling an instruction that
> destroys CL in between them?

Right, the register allocator sees that.  The problem is the scheduler: we
have to represent the dependence between the copy and the shift in a way
that can be expressed in the dependence dag.  We do this with a flag edge.

> In the first case, what do you think about making it possible for an
> instruction to optionally depend on a value? That is, make blr depend
> on R3 or R3/R4 depending on the type of the return value. Something
> like
> a = DAG.getNode(ISD::BRIND, MVT::Other, Copy, LR);
> a.addUse(PPC::R3)

You can play games like that, but I wouldn't suggest it.  It's better to
just force the copy to be inserted in the right place.  When the register
allocator runs, it does know about register lifetimes and other
constraints, and knows that R3/R4 are live out (as indicated by
MF.addLiveOut(...) ).

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC, ARM] expanding RET to CopyToReg;BRIND

Rafael Espíndola
On 5/31/06, Chris Lattner <[hidden email]> wrote:

> On Wed, 31 May 2006, [UTF-8] Rafael Esp?ndola wrote:
> >> We don't want the copy and shift to wander apart from each other (e.g. we
> >> don't want another shift to get scheduled in between them), so we flag
> >> them together.  In practice, these copies usually get coallesced away.
> > In the second case shl explicitly uses CL. Shouldn't the register
> > allocator be smart enough to avoid scheduling an instruction that
> > destroys CL in between them?
>
> Right, the register allocator sees that.  The problem is the scheduler: we
> have to represent the dependence between the copy and the shift in a way
> that can be expressed in the dependence dag.  We do this with a flag edge.
Sorry. I meant the scheduler. There is a data dependency already. One
instruction defines CL. The other one uses it.

> > In the first case, what do you think about making it possible for an
> > instruction to optionally depend on a value? That is, make blr depend
> > on R3 or R3/R4 depending on the type of the return value. Something
> > like
> > a = DAG.getNode(ISD::BRIND, MVT::Other, Copy, LR);
> > a.addUse(PPC::R3)
>
> You can play games like that, but I wouldn't suggest it.  It's better to
> just force the copy to be inserted in the right place.  When the register
> allocator runs, it does know about register lifetimes and other
> constraints, and knows that R3/R4 are live out (as indicated by
> MF.addLiveOut(...) ).

I am going to use the Flag by now :-)
I was just thinking that the imposed restriction is stronger then
necessary. For example, it might be valid to reorder

R3 = ...
R4 = ...
blr

into
R4 = ..
R3 = ...
blr

> -Chris

Thanks once more,
Rafael
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: [RFC, ARM] expanding RET to CopyToReg;BRIND

Chris Lattner
On Wed, 31 May 2006, [UTF-8] Rafael Esp?ndola wrote:
>> Right, the register allocator sees that.  The problem is the scheduler: we
>> have to represent the dependence between the copy and the shift in a way
>> that can be expressed in the dependence dag.  We do this with a flag edge.

> Sorry. I meant the scheduler. There is a data dependency already. One
> instruction defines CL. The other one uses it.

Right.  The problem is that there is no notion of "next to" and no strict
ordering when building the scheduling graph, so data dependencies must be
explicitly represented in the dag.

>> > In the first case, what do you think about making it possible for an
>> > instruction to optionally depend on a value? That is, make blr depend
>> > on R3 or R3/R4 depending on the type of the return value. Something
>> > like
>> > a = DAG.getNode(ISD::BRIND, MVT::Other, Copy, LR);
>> > a.addUse(PPC::R3)
>>
>> You can play games like that, but I wouldn't suggest it.  It's better to
>> just force the copy to be inserted in the right place.  When the register
>> allocator runs, it does know about register lifetimes and other
>> constraints, and knows that R3/R4 are live out (as indicated by
>> MF.addLiveOut(...) ).
>
> I am going to use the Flag by now :-)
> I was just thinking that the imposed restriction is stronger then
> necessary. For example, it might be valid to reorder
>
> R3 = ...
> R4 = ...
> blr
>
> into
> R4 = ..
> R3 = ...
> blr

Right, but notice that "..." is just a virtual register.  Once the copies
are coallesced away, this basically amounts to saying "R3 must have this
value and R4 must have this value before the blr".  It doesn't constrain
in which order R3/R4 as assigned.

-Chris

>> -Chris
>
> Thanks once more,
> Rafael
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

-Chris

--
http://nondot.org/sabre/
http://llvm.org/
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev