[llvm-dev] Basic block merging

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
Under certain circumstances, my compiler outputs basic blocks having the same function:

bb_97:                                            ; preds = %bb_1
  %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
  %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
  %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
  %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
  %480 = bitcast i8** %479 to %LMtop.I0.ARType**
  store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
  br label %bb_106

bb_98:                                            ; preds = %bb_1
  %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
  %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
  %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
  %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
  %485 = bitcast i8** %484 to %LMtop.I0.ARType**
  store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
  br label %bb_106

These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.

I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.

I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.

Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?


_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
<[hidden email]> wrote:

>
> Under certain circumstances, my compiler outputs basic blocks having the same function:
>
> bb_97:                                            ; preds = %bb_1
>   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
>   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
>   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
>   %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
>   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
>   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
>   br label %bb_106
>
> bb_98:                                            ; preds = %bb_1
>   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
>   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
>   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
>   %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
>   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
>   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
>   br label %bb_106
>
> These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.
>
> I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.
>
> I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.
>
> Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?
I see two problems here (neither unsolvable): 1) I believe the
register allocator is working over a full function, and for this merge
the registers and stack/frame offsets have to be the same. 2) if the
block very small the icache might not work as well, and on some arches
long jumps are more expensive than short jumps and require trunks.

Basically, if this was to be done, the blocks that are merged would
have to be turned into functions (with the fastcc calling convention).

-Shawn Landden
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
Am Mi., 29. Mai 2019 um 13:31 Uhr schrieb Shawn Landden via llvm-dev
<[hidden email]>:

>
> On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
> <[hidden email]> wrote:
> >
> > Under certain circumstances, my compiler outputs basic blocks having the same function:
> >
> > bb_97:                                            ; preds = %bb_1
> >   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> >   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
> >   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
> >   %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
> >   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
> >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
> >   br label %bb_106
> >
> > bb_98:                                            ; preds = %bb_1
> >   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> >   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
> >   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
> >   %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
> >   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
> >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
> >   br label %bb_106
> >
> > These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.
> >
> > I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.
> >
> > I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.
> >
> > Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?
> I see two problems here (neither unsolvable): 1) I believe the
> register allocator is working over a full function, and for this merge
> the registers and stack/frame offsets have to be the same. 2) if the
> block very small the icache might not work as well, and on some arches
> long jumps are more expensive than short jumps and require trunks.

I understand this request differently. This is on LLVM-IR, so register
allocation would happen on the merged block. I-Cache utilization
should improve since there is less code in the merged blocked compared
to the two original blocks.

I could imagine an algorithm that checks whether the two blocks are
isomorphic and have the same jump targets. Then merge the two blocks
(including adding PHIs if the blocks have different predecessors) by
RAUW one block with this other.

In the example, both blocks have the same predecessor, i.e. it will be
a conditional branch. In this case the transformation (after having
determined that the blocks are isomorphic) would change it to an
unconditional branch to one of the blocks. The other one becomes dead
code.

The question is, whether the improvement justifies the additional cost
of identifying isomorphic blocks.

Michael
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
On Wed, May 29, 2019 at 1:58 PM Michael Kruse <[hidden email]> wrote:

>
> Am Mi., 29. Mai 2019 um 13:31 Uhr schrieb Shawn Landden via llvm-dev
> <[hidden email]>:
> >
> > On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
> > <[hidden email]> wrote:
> > >
> > > Under certain circumstances, my compiler outputs basic blocks having the same function:
> > >
> > > bb_97:                                            ; preds = %bb_1
> > >   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> > >   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
> > >   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
> > >   %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
> > >   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
> > >   br label %bb_106
> > >
> > > bb_98:                                            ; preds = %bb_1
> > >   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> > >   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
> > >   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
> > >   %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
> > >   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
> > >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
> > >   br label %bb_106
> > >
> > > These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.
> > >
> > > I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.
> > >
> > > I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.
> > >
> > > Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?
> > I see two problems here (neither unsolvable): 1) I believe the
> > register allocator is working over a full function, and for this merge
> > the registers and stack/frame offsets have to be the same. 2) if the
> > block very small the icache might not work as well, and on some arches
> > long jumps are more expensive than short jumps and require trunks.
>
> I understand this request differently. This is on LLVM-IR, so register
> allocation would happen on the merged block. I-Cache utilization
> should improve since there is less code in the merged blocked compared
> to the two original blocks.
>
> I could imagine an algorithm that checks whether the two blocks are
> isomorphic and have the same jump targets. Then merge the two blocks
> (including adding PHIs if the blocks have different predecessors) by
> RAUW one block with this other.
>
> In the example, both blocks have the same predecessor, i.e. it will be
> a conditional branch. In this case the transformation (after having
> determined that the blocks are isomorphic) would change it to an
> unconditional branch to one of the blocks. The other one becomes dead
> code.
Oh I was thinking very differently: similar blocks in different
functions (which is reasonable, but more complicated).
That LLVM can't merge equivalent blocks in the same function is
certainly is a bug.
>
> The question is, whether the improvement justifies the additional cost
> of identifying isomorphic blocks.
Could the merge-functions pass be somehow modified to also do this? in
a way that would require code duplication, with the common code
refactored out.

-Shawn Landden
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
In reply to this post by Son Tuan VU via llvm-dev
SimplifyCFG has some code for doing some merging. For example SinkCommonCodeFromPredecessors

~Craig


On Wed, May 29, 2019 at 11:58 AM Michael Kruse via llvm-dev <[hidden email]> wrote:
Am Mi., 29. Mai 2019 um 13:31 Uhr schrieb Shawn Landden via llvm-dev
<[hidden email]>:
>
> On Wed, May 29, 2019 at 10:49 AM David Jones via llvm-dev
> <[hidden email]> wrote:
> >
> > Under certain circumstances, my compiler outputs basic blocks having the same function:
> >
> > bb_97:                                            ; preds = %bb_1
> >   %476 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> >   %477 = bitcast i8** %476 to %LBstd.Cprocess.CRType**
> >   %478 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %477, align 8
> >   %479 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %478, i64 0, i32 3
> >   %480 = bitcast i8** %479 to %LMtop.I0.ARType**
> >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %480, align 8
> >   br label %bb_106
> >
> > bb_98:                                            ; preds = %bb_1
> >   %481 = getelementptr inbounds %LMtop.I0.ARType, %LMtop.I0.ARType* %0, i64 0, i32 6
> >   %482 = bitcast i8** %481 to %LBstd.Cprocess.CRType**
> >   %483 = load %LBstd.Cprocess.CRType*, %LBstd.Cprocess.CRType** %482, align 8
> >   %484 = getelementptr inbounds %LBstd.Cprocess.CRType, %LBstd.Cprocess.CRType* %483, i64 0, i32 3
> >   %485 = bitcast i8** %484 to %LMtop.I0.ARType**
> >   store %LMtop.I0.ARType* %0, %LMtop.I0.ARType** %485, align 8
> >   br label %bb_106
> >
> > These blocks have provably the same function - the instructions of one correspond perfectly to instructions in the other.
> >
> > I would have expected some optimization to detect this, and merge the blocks by forwarding all jumps to one, to jump to the other.
> >
> > I notice a "merge functions" pass, which does this if two functions can be proven to have identical effect.
> >
> > Is there a pass that merges blocks? Is it enabled by default?  If so, then is there something non-obvious in the code that would be defeating this optimization?
> I see two problems here (neither unsolvable): 1) I believe the
> register allocator is working over a full function, and for this merge
> the registers and stack/frame offsets have to be the same. 2) if the
> block very small the icache might not work as well, and on some arches
> long jumps are more expensive than short jumps and require trunks.

I understand this request differently. This is on LLVM-IR, so register
allocation would happen on the merged block. I-Cache utilization
should improve since there is less code in the merged blocked compared
to the two original blocks.

I could imagine an algorithm that checks whether the two blocks are
isomorphic and have the same jump targets. Then merge the two blocks
(including adding PHIs if the blocks have different predecessors) by
RAUW one block with this other.

In the example, both blocks have the same predecessor, i.e. it will be
a conditional branch. In this case the transformation (after having
determined that the blocks are isomorphic) would change it to an
unconditional branch to one of the blocks. The other one becomes dead
code.

The question is, whether the improvement justifies the additional cost
of identifying isomorphic blocks.

Michael
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Basic block merging

Son Tuan VU via llvm-dev
In reply to this post by Son Tuan VU via llvm-dev
Am Mi., 29. Mai 2019 um 15:02 Uhr schrieb Shawn Landden <[hidden email]>:
> > The question is, whether the improvement justifies the additional cost
> > of identifying isomorphic blocks.
> Could the merge-functions pass be somehow modified to also do this? in
> a way that would require code duplication, with the common code
> refactored out.

There is FunctionComparator::cmpBasicBlocks to determine block
isomorphism. Looks like it requires instructions to be in the exact
same order.

Michael
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev