Two Regalloc Enhancements

Two Regalloc Enhancements

dag-7
We have two features for register allocation we'd like to contribute if folks
think they are worthwhile.  We want to get a read on whether they will be
useful to people.

The first feature backschedules reloads during the spilling phase.  As
reloads are generated, we have some very simple code to try to schedule them
as far ahead of the use as possible.
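
As a rough illustration of the backscheduling idea (a minimal Python sketch,
not the actual patch): the `Instr` type, `backschedule_reload`, and the
register and slot names are all hypothetical. The sketch assumes a reload may
be hoisted past any instruction that neither reads nor writes the reload's
destination register and does not store to the spill slot; calls, aliasing,
and other memory effects are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """Hypothetical stand-in for a machine instruction."""
    name: str
    defs: frozenset = frozenset()      # registers written
    uses: frozenset = frozenset()      # registers read
    stores_slot: Optional[int] = None  # stack slot written, if any


def backschedule_reload(block: List[Instr], use_idx: int, reg: str, slot: int) -> int:
    """Return the index in `block` where a reload of `slot` into `reg` can go.

    The reload starts immediately before its use (`use_idx`) and is hoisted
    upward past each instruction that neither reads nor writes `reg` and
    does not store to `slot`.
    """
    pos = use_idx
    while pos > 0:
        above = block[pos - 1]
        if reg in above.defs or reg in above.uses or above.stores_slot == slot:
            break  # the reload cannot safely move above this instruction
        pos -= 1
    return pos


# r1 is spilled to slot 0, reused for another value, then needed again.
block = [
    Instr("spill r1", uses=frozenset({"r1"}), stores_slot=0),
    Instr("add r1, r3", defs=frozenset({"r1"}), uses=frozenset({"r3"})),
    Instr("mul r4, r1", defs=frozenset({"r4"}), uses=frozenset({"r1"})),
    Instr("sub r5, r3", defs=frozenset({"r5"}), uses=frozenset({"r3"})),
    Instr("use r1", uses=frozenset({"r1"})),
]
insert_at = backschedule_reload(block, use_idx=4, reg="r1", slot=0)
block.insert(insert_at, Instr("reload r1", defs=frozenset({"r1"})))
```

In this toy block the reload ends up one instruction earlier than its use,
because `mul r4, r1` still reads the other value held in `r1`. The more
instructions between the last conflicting access and the use, the further
ahead the reload can be issued.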

The second feature modifies linearscan to try to spread register usage out a
bit.  Rather than always grabbing the first free register in the allocatable
list, it remembers the last few registers recently assigned and does not reuse
them unless there are no other registers available.  This tends to help the
backscheduling code by distributing register usage and providing more
scheduling freedom.  It also can induce spilling where none was there before
if the allocator has "just enough" registers.  We haven't noticed any serious
performance problems in practice.
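
A minimal Python sketch of this selection policy, under stated assumptions:
`RoundRobinPicker`, its `history` parameter, and the register names are
hypothetical and are not taken from the linearscan code. It only models the
choice among registers that are already free; interval bookkeeping and
spilling are left to the caller.

```python
from collections import deque


class RoundRobinPicker:
    """Prefer free registers that were not among the last few assigned."""

    def __init__(self, allocatable, history=2):
        self.allocatable = list(allocatable)  # allocation order
        self.recent = deque(maxlen=history)   # last `history` assignments

    def pick(self, free):
        """Return a register from `free`, or None if nothing is free."""
        candidates = [r for r in self.allocatable if r in free]
        if not candidates:
            return None  # the caller would have to spill
        # Avoid recently assigned registers unless nothing else is free.
        fresh = [r for r in candidates if r not in self.recent]
        choice = (fresh or candidates)[0]
        self.recent.append(choice)
        return choice
```

With `history=0` the deque stays empty and the picker degenerates to "first
free register in the allocatable list", which makes it easy to compare the
two policies on the same input.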

With both patches, we have seen performance improvements on some codes.

I know there's some work on post-ra scheduling going on which would probably
supersede the reload backscheduling code.  If that's coming soon, there's
probably not much point in contributing it.  The "round-robin" register
assignment would help any post-ra scheduler.

What's the community's opinion on whether these two features are worth
committing to the public repository?

                                   -Dave
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: Two Regalloc Enhancements

Evan Cheng-2

On Jul 23, 2009, at 12:42 PM, David Greene wrote:

> We have two features for register allocation we'd like to contribute if folks
> think they are worthwhile.  We want to get a read on whether they will be
> useful to people.
>
> The first feature backschedules reloads during the spilling phase.  As
> reloads are generated, we have some very simple code to try to schedule them
> as far ahead of the use as possible.

Ok.

>
> The second feature modifies linearscan to try to spread register usage out a
> bit.  Rather than always grabbing the first free register in the allocatable
> list, it remembers the last few registers recently assigned and does not reuse
> them unless there are no other registers available.  This tends to help the
> backscheduling code by distributing register usage and providing more
> scheduling freedom.  It also can induce spilling where none was there before
> if the allocator has "just enough" registers.  We haven't noticed any serious
> performance problems in practice.

Ok. As with any heuristics change, some tests will benefit, some will  
suffer. I am ok with both sets of changes assuming there are ways to  
control them.

>
> With both patches, we have seen performance improvements on some codes.
>
> I know there's some work on post-ra scheduling going on which would probably
> supersede the reload backscheduling code.  If that's coming soon, there's
> probably not much point in contributing it.  The "round-robin" register
> assignment would help any post-ra scheduler.

Post-ra scheduling has been working for a while. The reason it's not  
turned on for x86 is it's not helping much (1 or 2%) while the compile  
time cost is too high (~9% codegen time). I assume you guys are doing  
your experiments using AMD processors. It could be Intel's uArch is  
just not benefiting from the load scheduling.

Round-robin register assignment probably will help post-ra scheduling.
However, for small functions it may end up increasing the number of
registers used. That can be bad for performance.
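
The effect on small functions can be seen in a toy model. This is a sketch
only: `registers_touched` and its parameters are hypothetical, and it assumes
a run of short live ranges that never overlap, so every register is free at
each assignment.

```python
def registers_touched(num_values, allocatable, history):
    """Assign `num_values` non-overlapping live ranges; return registers used.

    `history` is how many recent assignments to avoid; 0 means plain
    first-free selection.
    """
    recent = []
    used = set()
    for _ in range(num_values):
        # Ranges never overlap, so the whole allocatable list is free here.
        fresh = [r for r in allocatable if r not in recent]
        choice = (fresh or allocatable)[0]
        used.add(choice)
        if history:
            recent = (recent + [choice])[-history:]
    return used


regs = ["r0", "r1", "r2", "r3"]
first_free = registers_touched(5, regs, history=0)   # one register
round_robin = registers_touched(5, regs, history=2)  # three registers
```

In this model first-free selection keeps reusing a single register, while a
history of two spreads the same five values over three. One plausible cost,
not stated in the thread, is that some of the extra registers may be
callee-saved and need saving and restoring in the prologue and epilogue.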

>
> What's the community's opinion on whether these two features are worth
> committing to the public repository?

I welcome the features as long as we can add them as llc-beta first.  
Once we have some more testing across all the platforms, we can then  
decide whether they can be turned on. Is that ok?

Evan


Re: Two Regalloc Enhancements

dag-7
On Thursday 23 July 2009 18:07, Evan Cheng wrote:

> Ok. As with any heuristics change, some tests will benefit, some will
> suffer. I am ok with both sets of changes assuming there are ways to
> control them.

Yep, we have flags.

> Post-ra scheduling has been working for a while. The reason it's not
> turned on for x86 is it's not helping much (1 or 2%) while the compile
> time cost is too high (~9% codegen time). I assume you guys are doing
> your experiments using AMD processors. It could be Intel's uArch is
> just not benefiting from the load scheduling.

Yes, I can imagine there would be differences here.  The memory architectures
are quite different.

> Round-robin register assignment probably will help post-ra scheduling.
> However, for small functions it may end up increasing the number of
> registers used. That can be bad for performance.

Correct.  As I said, we haven't noticed any degradation.  But our code base is
very different from yours.  :)

> > What's the community's opinion on whether these two features are worth
> > committing to the public repository?
>
> I welcome the features as long as we can add them as llc-beta first.
> Once we have some more testing across all the platforms, we can then
> decide whether they can be turned on. Is that ok?

Fine with me.  How do I do this?  Just put them under some flags that default
to false?

                                  -Dave

Re: Two Regalloc Enhancements

Evan Cheng-2

On Jul 23, 2009, at 4:23 PM, David Greene wrote:

> Fine with me.  How do I do this?  Just put them under some flags that default
> to false?

Yep! Thanks.

Evan

Re: Two Regalloc Enhancements

Mark Shannon-2
In reply to this post by dag-7
Hi David,

What effect is there on compile time?


Re: Two Regalloc Enhancements

greened
On Friday 24 July 2009 06:46:24 Mark Shannon wrote:
> Hi David,
>
> What effect is there on compile time?

None.  These are very simple changes.  At most each
reload takes a wee bit longer because the code walks
backward through the BB to schedule.

                                      -Dave


Re: Two Regalloc Enhancements

Alireza.Moshtaghi
In reply to this post by dag-7
These seem to be the exact opposite of what the PIC16 port requires.
However, since the behavior is controllable I don't think they will
break our port.

Just a little background: PIC16 has only one register, so preloading it
is just going to cause too many useless spills.

Regards,
Ali
