predicated execution

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

predicated execution

Dietmar Ebner
hi,

we're trying to come up with a static compiler based on llvm for a 4-
way vliw architecture with full support for predicated execution. a  
distinguishing feature is the omission of flag registers. instead,  
conditions can be paired with a particular instruction within the  
same bundle.

a performance critical issue will be proper use of predicated  
execution. if-conversion can either be performed early in the code  
generation process [1] (exposing larger basic blocks to the  
optimizers) or deferred until code generation is almost
complete [2].

for the time being, i'm planning to go with the second approach and  
have a late optimization pass over the selected machine instructions  
that
a] preliminarily schedules and bundles the selected instructions
b] speculatively executes instructions in the predecessor block if  
there are unused resources
c] converts blocks B into a appropriately predicated version  
(eliminating branches) if it's profitable for the particular  
architecture.

this approach will require only some minor extensions to the existing  
class hierarchy to represent instruction bundles and predicates.  
additionally, it appears to be more reasonable to make the decision  
what to execute conditionally late in the compilation process when we  
can easily account for architectural constraints than perform if-
conversion early and undoing it if necessary. also, previous work [2]  
indicates, that this approach is able to retain almost all  
opportunities for if-conversion present in the input program.

for what i've seen so far (i'm not yet fully familiar with llvm),  
there's almost no existing support for predicated execution. however,  
architectures such as arm and ia64 should show a significant speedup.  
are there already people working on those enhancements or are there  
any short-term plans? also, does anybody have strong objections  
against the approach outlined above? any comments are kindly welcome!


cheers and thanks in advance,

-
dietmar

[1] August et al. : A Framework for Balancing Control Flow and  
Predication
http://portal.acm.org/citation.cfm?id=266800.266810

[2] Snavely et al. : Predicate Analysis and If-Conversion in an  
Itanium Link-Time Optimizer
http://systems.cs.colorado.edu/EPIC2/papers/s3-3-snavely.pdf


---------------------------------------------------------------------
Dietmar Ebner
CD Laboratory - Compilation Techniques for Embedded Processors
Institut fuer Computersprachen  E: [hidden email]
Technische Universitaet Wien    T: (+431) 58801-18598
Argentinierstrasse 8 / E1851    F: (+431) 58801-58521
1040 Wien, Austria              W: www.complang.tuwien.ac.at/cd/ebner



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: predicated execution

Evan Cheng-2

On Mar 7, 2007, at 6:45 AM, Dietmar Ebner wrote:

> hi,
>
> we're trying to come up with a static compiler based on llvm for a 4-
> way vliw architecture with full support for predicated execution. a
> distinguishing feature is the omission of flag registers. instead,
> conditions can be paired with a particular instruction within the
> same bundle.

Ok.

> a performance critical issue will be proper use of predicated
> execution. if-conversion can either be performed early in the code
> generation process [1] (exposing larger basic blocks to the
> optimizers) or deferred until code generation is almost
> complete [2].

Option 1 makes a lot of sense, most predication aware compilers go  
this route. I haven't read the paper. But I am guessing their system  
is rediscovering CFG etc. at link time, reverse compiling assembly  
code into some internal representation? That is an interesting  
approach for a company who is servicing clients who do not provide  
their source code.

If you use LLVM and you build option 1, you can do something like  
option 2 without a lot of horribleness. :-) LLVM does link time  
optimization at bytecode level.

>
> for the time being, i'm planning to go with the second approach and
> have a late optimization pass over the selected machine instructions
> that
> a] preliminarily schedules and bundles the selected instructions
> b] speculatively executes instructions in the predecessor block if
> there are unused resources
> c] converts blocks B into a appropriately predicated version
> (eliminating branches) if it's profitable for the particular
> architecture.

Ok. It doesn't sound like option 2 though (it has nothing to do with  
LTO). Item (b) sounds like meld scheduling, if that what you are  
considering?

>
> this approach will require only some minor extensions to the existing
> class hierarchy to represent instruction bundles and predicates.
> additionally, it appears to be more reasonable to make the decision
> what to execute conditionally late in the compilation process when we
> can easily account for architectural constraints than perform if-
> conversion early and undoing it if necessary. also, previous work [2]
> indicates, that this approach is able to retain almost all
> opportunities for if-conversion present in the input program.

Sounds reasonable especially given the current LLVM infrastructure.  
The current register allocator is not predication aware. Therefor if-
conversion too aggressively would over-subscribe the available  
resources. Once you have the exact instruction count and register  
requirement, you should be able to make a better if-conversion decision.

>
> for what i've seen so far (i'm not yet fully familiar with llvm),
> there's almost no existing support for predicated execution. however,
> architectures such as arm and ia64 should show a significant speedup.
> are there already people working on those enhancements or are there
> any short-term plans? also, does anybody have strong objections
> against the approach outlined above? any comments are kindly welcome!

I don't know enough about your target architecture to critique your  
choice. This isn't the "traditional" flow for a fully predicated VLIW  
model, but it's likely you can get it up and running at reasonable  
time frame.

LLVM does not have much support for predicated execution. You can  
introduce predicate register class, but the backend passes would not  
treat them differently.

However, you are in luck! It's extremely likely we will be starting  
on predication work for the ARM backend in the very near future. ARM  
has an extremely limited predication model (just condition codes), so  
it's not yet clear how much of our work will be applicable to your  
needs. But we'll try to keep our work as target independent as possible.

Evan

>
>
> cheers and thanks in advance,
>
> -
> dietmar
>
> [1] August et al. : A Framework for Balancing Control Flow and
> Predication
> http://portal.acm.org/citation.cfm?id=266800.266810
>
> [2] Snavely et al. : Predicate Analysis and If-Conversion in an
> Itanium Link-Time Optimizer
> http://systems.cs.colorado.edu/EPIC2/papers/s3-3-snavely.pdf
>
>
> ---------------------------------------------------------------------
> Dietmar Ebner
> CD Laboratory - Compilation Techniques for Embedded Processors
> Institut fuer Computersprachen  E: [hidden email]
> Technische Universitaet Wien    T: (+431) 58801-18598
> Argentinierstrasse 8 / E1851    F: (+431) 58801-58521
> 1040 Wien, Austria              W: www.complang.tuwien.ac.at/cd/ebner
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: predicated execution

Dietmar Ebner
Evan,

thanks for your detailed answer!

On Mar 8, 2007, at 10:03 AM, Evan Cheng wrote:

>> a performance critical issue will be proper use of predicated
>> execution. if-conversion can either be performed early in the code
>> generation process [1] (exposing larger basic blocks to the
>> optimizers) or deferred until code generation is almost
>> complete [2].
>
> Option 1 makes a lot of sense, most predication aware compilers go
> this route. I haven't read the paper. But I am guessing their system
> is rediscovering CFG etc. at link time, reverse compiling assembly
> code into some internal representation? That is an interesting
> approach for a company who is servicing clients who do not provide
> their source code.
>
> If you use LLVM and you build option 1, you can do something like
> option 2 without a lot of horribleness. :-) LLVM does link time
> optimization at bytecode level.
right, i agree option 1 might be favorable in many terms. however,  
this would involve major changes in various components (instruction  
selection, register allocation, probably various optimization passes)  
and i don't have the time now to do this. also, it would be very hard  
to predict on the llvm level how code will look like after code  
generation and register allocation, i.e., aggressive if-conversion  
and hyperblock generation will probably require partial reverse if-
conversion to be effective.

>> for the time being, i'm planning to go with the second approach and
>> have a late optimization pass over the selected machine instructions
>> that
>> a] preliminarily schedules and bundles the selected instructions
>> b] speculatively executes instructions in the predecessor block if
>> there are unused resources
>> c] converts blocks B into a appropriately predicated version
>> (eliminating branches) if it's profitable for the particular
>> architecture.
>
> Ok. It doesn't sound like option 2 though (it has nothing to do with
> LTO). Item (b) sounds like meld scheduling, if that what you are
> considering?
not exactly and yes, it has nothing to do with LTO. the idea is just  
to move instructions to the predecessor block (if there is only one),  
if they can execute there (conditionally) for free, i.e., there are  
unused slots/nops that might be replaced with a particular  
instruction. this decreases the size of blocks for (c).

> However, you are in luck! It's extremely likely we will be starting
> on predication work for the ARM backend in the very near future. ARM
> has an extremely limited predication model (just condition codes), so
> it's not yet clear how much of our work will be applicable to your
> needs. But we'll try to keep our work as target independent as  
> possible.
thats what i was hoping. for the time being, i need a fast  
implementation that gets me an idea of the performance that can be  
reached with the existing infrastructure (our target doesn't exist  
yet in silicon but we do have a simulator). i'll report first results  
as they become available.

cheers,

-
dietmar
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev