alias information in codegen

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

alias information in codegen

Dan Gohman-2
There have been a few queries about this recently, and I've done some
work in this area recently, so I'm posting a summary of what the major
outstanding issues are.

* BasicAliasAnalysis, the default AliasAnalysis implementation, doesn't
   understand lowered GEPs, integer arithmetic, or PHIs, and the
   regular codegen process involves passes that lower GEPs.

One way to solve this is to use a different AliasAnalysis  
implementation.
I haven't looked at it in detail, but I'd guess that the Andersen's
pass handles all of these constructs reasonably well. And it has the
advantage of being much more capable than the BasicAliasAnalysis.

Another alternative is to develop a SCEV-based alias analysis. The SCEV
framework currently has the opposite problem; it understands integer
arithmetic and PHIs quite well, but doesn't currently understand GEPs.
I know it can be made to understand GEPs though, and it's on my TODO
list to implement this.

* There are still a variety of places in SelectionDAG creation that
   don't preserve SVOperand/SVOffset (as well as alignment and  
volatile).

These places need to be found and fixed. This is pretty straight-
forward,
and the places that need changing can be found by inserting some
strategic assert(SVOperand)'s.

  * The DAGCombiner's alias-analysis handling isn't ideal.

This is the -combiner-alias-analysis and -combiner-global-alias-analysis
options. They basically work, though they aren't turned on by default
currently. The algorithm used wants some scrutiny as well.

  * MachineInstrs currently don't reliably record information about
    memory accesses.

This is being addressed by MemOperands. However, currently there is a
problem; the current code misses memory references in anonymous
patterns, like this in x86:

def : Pat<(zextloadi64i32 addr:$src),
           (SUBREG_TO_REG (i64 0), (MOV32rm addr:$src),  
x86_subreg_32bit)>;

I can provide more details about what's going on here if anyone's
interested.

However, for people just interested in post-regalloc scheduling and
VLIW packing and similar things, MemOperands aren't the only approach.
A potentially better way to do this would be to extend MachineInstrs
to preserve the chain dependencies from the SelectionDAG.

This would be more efficient, as the SelectionDAG already needs a full
dependence graph, so there's no need to make any further alias queries.
There would be some details to resolve, such as how to handle loads
and stores inserted after instruction selection. Also, this depends on
the SelectionDAG having good dependence information, but improving
that would be beneficial for pre-regalloc scheduling as well.

Dan

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Chris Lattner
On Apr 3, 2008, at 1:00 PM, Dan Gohman wrote:
> * BasicAliasAnalysis, the default AliasAnalysis implementation,  
> doesn't
>   understand lowered GEPs, integer arithmetic, or PHIs, and the
>   regular codegen process involves passes that lower GEPs.

Sure.

> One way to solve this is to use a different AliasAnalysis
> implementation.
> I haven't looked at it in detail, but I'd guess that the Andersen's
> pass handles all of these constructs reasonably well. And it has the
> advantage of being much more capable than the BasicAliasAnalysis.

I haven't tried, but I seriously doubt it.  Most AA's specifically  
just look at pointers and interprocedural ones are much more  
interested in this than local ones: this reduces the number of values  
to track, speeding up analysis and reducing memory use.

>
> Another alternative is to develop a SCEV-based alias analysis. The  
> SCEV
> framework currently has the opposite problem; it understands integer
> arithmetic and PHIs quite well, but doesn't currently understand GEPs.
> I know it can be made to understand GEPs though, and it's on my TODO
> list to implement this.

This could be interesting, but you're basically attacking the  
dependence analysis problem here.  Dependence analysis can be used to  
disambiguate pointers (and basicaa does a simple form of this) but  
that is orthogonal to the rest of the problem.

I think it would be better by starting to make LSR just not throw away  
pointer information when it is trivial to preserve it.  This is  
obviously not always the case, but it is VERY VERY VERY often the  
case, so we should see how far that gets us.  If nothing else, it  
makes the IR sent into the codegen simpler and smaller.

> * There are still a variety of places in SelectionDAG creation that
>   don't preserve SVOperand/SVOffset (as well as alignment and
> volatile).
>
> These places need to be found and fixed. This is pretty straight-
> forward,
> and the places that need changing can be found by inserting some
> strategic assert(SVOperand)'s.

This is also a serious issue that can manifest as bugs already, so it  
is a correctness issue, not a performance issue.

>  * The DAGCombiner's alias-analysis handling isn't ideal.
>
> This is the -combiner-alias-analysis and -combiner-global-alias-
> analysis
> options. They basically work, though they aren't turned on by default
> currently. The algorithm used wants some scrutiny as well.

Yep, it would be very interesting to rewrite the combiner aa stuff to  
be more scalable.  We've seen significant improvements with it, and it  
currently just does trivial base+offset disambiguation.  It would be  
much more interesting to throw full AliasAnalysis info into it. :)

-Chris
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Florian Brandner-3
In reply to this post by Dan Gohman-2
On Thursday 03 April 2008 22:00:34 Dan Gohman wrote:
> However, for people just interested in post-regalloc scheduling and
> VLIW packing and similar things, MemOperands aren't the only approach.
> A potentially better way to do this would be to extend MachineInstrs
> to preserve the chain dependencies from the SelectionDAG.

the selection dag may already contain unnecessary dependencies, even with
alias analysis turned on. it is built after codegenprepare and thus the
lowered GEPs cause imprecise results already at this point.

florian



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Duncan Sands
In reply to this post by Dan Gohman-2
Hi,

> * There are still a variety of places in SelectionDAG creation that
>    don't preserve SVOperand/SVOffset (as well as alignment and  
> volatile).
>
> These places need to be found and fixed. This is pretty straight-
> forward,
> and the places that need changing can be found by inserting some
> strategic assert(SVOperand)'s.

I've seen at least one place that passed through the original
SVOperand and SVOffset, but incorrectly forget to adjust SVOffset.
The assert you suggest wouldn't catch this.  There really need
to be some new or better methods to make it hard to get this kind
of thing wrong (currently it is hard to get it right!).

Ciao,

Duncan.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Dan Gohman-2
On Fri, April 4, 2008 2:24 am, Duncan Sands wrote:

> Hi,
>
>> * There are still a variety of places in SelectionDAG creation that
>>    don't preserve SVOperand/SVOffset (as well as alignment and
>> volatile).
>>
>> These places need to be found and fixed. This is pretty straight-
>> forward,
>> and the places that need changing can be found by inserting some
>> strategic assert(SVOperand)'s.
>
> I've seen at least one place that passed through the original
> SVOperand and SVOffset, but incorrectly forget to adjust SVOffset.
> The assert you suggest wouldn't catch this.  There really need
> to be some new or better methods to make it hard to get this kind
> of thing wrong (currently it is hard to get it right!).

I agree that it's tricky to get these right. Do you have any
ideas for how the values might be verified?

One thing we can work on is simplifying SelectionDAG::getLoad and
getStore, or maybe providing better alternative convenience methods.
Right now they're quite chatty.

Dan


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Dan Gohman-2
In reply to this post by Florian Brandner-3
On Fri, April 4, 2008 12:03 am, Florian Brandner wrote:
> On Thursday 03 April 2008 22:00:34 Dan Gohman wrote:
>> However, for people just interested in post-regalloc scheduling and
>> VLIW packing and similar things, MemOperands aren't the only approach.
>> A potentially better way to do this would be to extend MachineInstrs
>> to preserve the chain dependencies from the SelectionDAG.
>
> the selection dag may already contain unnecessary dependencies, even with
> alias analysis turned on. it is built after codegenprepare and thus the
> lowered GEPs cause imprecise results already at this point.

Right; this idea depends on the other items being addressed,
giving the SelectiondDAG better dependence information, so it's
a somewhat longer-term strategy. But it has the nice property that
improvements to the dependence information can benefit both
pre-regalloc and post-regalloc scheduling.

Dan


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Daniel Berlin
In reply to this post by Chris Lattner
On Thu, Apr 3, 2008 at 10:33 PM, Chris Lattner <[hidden email]> wrote:

> On Apr 3, 2008, at 1:00 PM, Dan Gohman wrote:
>  > * BasicAliasAnalysis, the default AliasAnalysis implementation,
>  > doesn't
>  >   understand lowered GEPs, integer arithmetic, or PHIs, and the
>  >   regular codegen process involves passes that lower GEPs.
>
>  Sure.
>
>
>  > One way to solve this is to use a different AliasAnalysis
>  > implementation.
>  > I haven't looked at it in detail, but I'd guess that the Andersen's
>  > pass handles all of these constructs reasonably well. And it has the
>  > advantage of being much more capable than the BasicAliasAnalysis.
>
>  I haven't tried, but I seriously doubt it.  Most AA's specifically
>  just look at pointers and interprocedural ones are much more
>  interested in this than local ones: this reduces the number of values
>  to track, speeding up analysis and reducing memory use.
>

I agree with Chris here completely..
It will help with GEP's eventually (field sensitive versions are being
worked on), but otherwise, it isn't going to help you here.
Trying to do pointer analysis on very lowered forms is both expensive
and hard.  It is like trying to build a bookshelf out of mashed
potatoes.
:)
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Dan Gohman-2
In reply to this post by Chris Lattner
On Thu, April 3, 2008 7:33 pm, Chris Lattner wrote:

> On Apr 3, 2008, at 1:00 PM, Dan Gohman wrote:
>> * BasicAliasAnalysis, the default AliasAnalysis implementation,
>> doesn't
>>   understand lowered GEPs, integer arithmetic, or PHIs, and the
>>   regular codegen process involves passes that lower GEPs.
>
> Sure.
>
>> One way to solve this is to use a different AliasAnalysis
>> implementation.
>> I haven't looked at it in detail, but I'd guess that the Andersen's
>> pass handles all of these constructs reasonably well. And it has the
>> advantage of being much more capable than the BasicAliasAnalysis.
>
> I haven't tried, but I seriously doubt it.  Most AA's specifically
> just look at pointers and interprocedural ones are much more
> interested in this than local ones: this reduces the number of values
> to track, speeding up analysis and reducing memory use.

I just tried it, and it did work. Perhaps this means that the
current -anders-aa pass is doing more work than it should.

>> Another alternative is to develop a SCEV-based alias analysis. The
>> SCEV
>> framework currently has the opposite problem; it understands integer
>> arithmetic and PHIs quite well, but doesn't currently understand GEPs.
>> I know it can be made to understand GEPs though, and it's on my TODO
>> list to implement this.
>
> This could be interesting, but you're basically attacking the
> dependence analysis problem here.  Dependence analysis can be used to
> disambiguate pointers (and basicaa does a simple form of this) but
> that is orthogonal to the rest of the problem.

I was thinking of starting with something very simple here: an
AliasAnalysis implementation where alias queries are implemented by
computing a SCEV for each pointer and doing a getMinusSCEV with them.
If the result is a constant, it can be handled. If the result is a
subtraction of two simpler values, they can be handed off to the
next analysis in the chain.

> I think it would be better by starting to make LSR just not throw away
> pointer information when it is trivial to preserve it.  This is
> obviously not always the case, but it is VERY VERY VERY often the
> case, so we should see how far that gets us.  If nothing else, it
> makes the IR sent into the codegen simpler and smaller.

I agree this would be useful.

Dan


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Daniel Berlin
On Mon, Apr 7, 2008 at 11:00 AM, Dan Gohman <[hidden email]> wrote:

> On Thu, April 3, 2008 7:33 pm, Chris Lattner wrote:
>  > On Apr 3, 2008, at 1:00 PM, Dan Gohman wrote:
>  >> * BasicAliasAnalysis, the default AliasAnalysis implementation,
>  >> doesn't
>  >>   understand lowered GEPs, integer arithmetic, or PHIs, and the
>  >>   regular codegen process involves passes that lower GEPs.
>  >
>  > Sure.
>  >
>  >> One way to solve this is to use a different AliasAnalysis
>  >> implementation.
>  >> I haven't looked at it in detail, but I'd guess that the Andersen's
>  >> pass handles all of these constructs reasonably well. And it has the
>  >> advantage of being much more capable than the BasicAliasAnalysis.
>  >
>  > I haven't tried, but I seriously doubt it.  Most AA's specifically
>  > just look at pointers and interprocedural ones are much more
>  > interested in this than local ones: this reduces the number of values
>  > to track, speeding up analysis and reducing memory use.
>
>  I just tried it, and it did work. Perhaps this means that the
>  current -anders-aa pass is doing more work than it should.
>

It's more likely that something is broken and needs fixing, to be honest.
IE it's probably getting wrong answers.

The anders-aa is in good shape solving algorithm wise, but needs a lot
of testing to get the bugs out of the constraints it generates.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: alias information in codegen

Pertti Kellomäki-2
In reply to this post by Daniel Berlin
Daniel Berlin wrote:
> Trying to do pointer analysis on very lowered forms is both expensive
> and hard.

This is actually something that has been bothering me. There
is a lot of existing work on parallelizing Fortran compilers,
and the dependence analysis there focuses on figuring out
whether two references to the same array alias or not. The
analysis methods basically solve systems of equations formed
from the index expressions in the array references.

It would seem that this kind of analysis should be done
way before converting to anything like the LLVM IR. Are there
any plans or ideas as to how such alias information could be
propagated e.g. from clang to LLVM?
--
Pertti
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev