[llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev
Hi all,

LLVM optimization pass gives an error "Terminator found in the middle of a basic block!" since basic block IR may have multiple "ret" instructions. It seems LLVM does not accept multiple return in a basic block by default. 

Is there a specific optimization or pass that I can enable to remove unreachable codes in basic blocks?

Best,

Aaron


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev


> On 25 May 2018, at 01:46, Aaron via llvm-dev <[hidden email]> wrote:
>
> Hi all,
>
> LLVM optimization pass gives an error "Terminator found in the middle of a basic block!" since basic block IR may have multiple "ret" instructions. It seems LLVM does not accept multiple return in a basic block by default.
>

Yes, if you’re inserting terminators in a basic block at the IR level, you’re going to have to split it into two different blocks yourself. Essentially you’re encountering a legaliser failure here — it’s an invariant of a BasicBlock that there are no terminator instructions in the body, and the last instruction should be a terminator.

> Is there a specific optimization or pass that I can enable to remove unreachable codes in basic blocks?
>

If you’re writing your own pass, you can do this yourself — after you insert the return instruction, just delete the rest of the instructions in the basic block after that instruction.

Just curious, how are you getting return instructions in the middle of a basic block?

> Best,
>
> Aaron
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- Dean

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev
Hi Dean,

Thanks for your reply.

That's exactly what I am doing, but I was looking for a default optimization or pass implementation if there was.

I used BasicBlock::splitBasicBlock() but it puts "br" end of original basic block. I tried to delete the br instruction by using eraseFromParent() but it didn't work.

I had to rewrite my own splitBasicBlock() by modifying the original one. Just removed the br instruction creation lines.

>> Just curious, how are you getting return instructions in the middle of a basic block?
My code generation pass allows multiple return instruction generation since that simplifies front-end significantly.

Here is the code if anyone would like to use it:

void CodeGenPass::DeleteDeadCode(llvm::BasicBlock * basicBlock)
{
    for (auto it = basicBlock->begin(); it != basicBlock->end(); ++it)
    {
        // Split after first return instruction.
        if (it->getOpcode() == llvm::Instruction::Ret)
        {
            ++it;
            // Split only if there is a following instruction.
            if (it != basicBlock->getInstList().end())
            {
                auto deadCodeBlock = SplitBasicBlock(basicBlock, it);
                deadCodeBlock->eraseFromParent();
            }
            return;
        }
    }
}


llvm::BasicBlock * CodeGenPass::SplitBasicBlock(llvm::BasicBlock * basicBlock, llvm::BasicBlock::iterator it)
{
    assert(basicBlock->getTerminator() && "Block must have terminator instruction.");
    assert(it != basicBlock->getInstList().end() && "Can't split block since there is no following instruction in the basic block.");

    auto newBlock = llvm::BasicBlock::Create(basicBlock->getContext(), "splitedBlock", basicBlock->getParent(), basicBlock->getNextNode());

    // Move all of the instructions from original block into new block.
    newBlock->getInstList().splice(newBlock->end(), basicBlock->getInstList(), it, basicBlock->end());

    // Now we must loop through all of the successors of the New block (which
    // _were_ the successors of the 'this' block), and update any PHI nodes in
    // successors.  If there were PHI nodes in the successors, then they need to
    // know that incoming branches will be from New, not from Old.
    //
    for (llvm::succ_iterator I = llvm::succ_begin(newBlock), E = llvm::succ_end(newBlock); I != E; ++I)
    {
        // Loop over any phi nodes in the basic block, updating the BB field of
        // incoming values...
        llvm::BasicBlock *Successor = *I;
        for (auto &PN : Successor->phis())
        {
            int Idx = PN.getBasicBlockIndex(basicBlock);
            while (Idx != -1)
            {
                PN.setIncomingBlock((unsigned)Idx, newBlock);
                Idx = PN.getBasicBlockIndex(basicBlock);
            }
        }
    }

    return newBlock;
}


Best,
 
Aaron


On Thu, May 24, 2018 at 9:22 AM, Dean Michael Berris <[hidden email]> wrote:


> On 25 May 2018, at 01:46, Aaron via llvm-dev <[hidden email]> wrote:
>
> Hi all,
>
> LLVM optimization pass gives an error "Terminator found in the middle of a basic block!" since basic block IR may have multiple "ret" instructions. It seems LLVM does not accept multiple return in a basic block by default.
>

Yes, if you’re inserting terminators in a basic block at the IR level, you’re going to have to split it into two different blocks yourself. Essentially you’re encountering a legaliser failure here — it’s an invariant of a BasicBlock that there are no terminator instructions in the body, and the last instruction should be a terminator.

> Is there a specific optimization or pass that I can enable to remove unreachable codes in basic blocks?
>

If you’re writing your own pass, you can do this yourself — after you insert the return instruction, just delete the rest of the instructions in the basic block after that instruction.

Just curious, how are you getting return instructions in the middle of a basic block?

> Best,
>
> Aaron
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- Dean



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev

> On 25 May 2018, at 03:53, Aaron <[hidden email]> wrote:
>
> Hi Dean,
>
> Thanks for your reply.
>
> That's exactly what I am doing, but I was looking for a default optimization or pass implementation if there was.
>

There’s a dead code elimination pass, but it works with a control flow graph (it removes basic blocks that are unreachable after constant propagation and other passes earlier on have run).

> I used BasicBlock::splitBasicBlock() but it puts "br" end of original basic block. I tried to delete the br instruction by using eraseFromParent() but it didn't work.
>

This is working as intended. All LLVM BasicBlocks *must* have a terminator as the last instruction.

> I had to rewrite my own splitBasicBlock() by modifying the original one. Just removed the br instruction creation lines.
>

It sounds like it would have just been easier to replace the branch instruction into a ret, then remove all successors of the original br instruction. There’s nothing special here, this is how it’s supposed to be done.

> >> Just curious, how are you getting return instructions in the middle of a basic block?
> My code generation pass allows multiple return instruction generation since that simplifies front-end significantly.
>

I’m just wondering why not have a ‘br’ to an exit basic block instead of ‘ret’ mid-stream of instructions. That way you don’t need to do any special post-processing while you’re emitting the LLVM IR, you end up with valid basic blocks all the time, and you can leverage all the other passes that come with LLVM already.

That at least sounds simpler to me.

I’m not sure whether you can have non-entry blocks with no predecessors in LLVM (need to look up the langref about that) but I can imagine it’s a fairly useful construct to support. e.g. something like (pseudo-IR):

.entry:
  <this is the entry block>
  <…>
  br .exit

.bb0:
  <this is by definition unreachable>
  <…>
  br .exit

.exit:
  ret

This structure makes it easier for dead code elimination to determine that .bb0 is unreachable from .entry and can elide it safely. From your front-end, you can have a single ‘.exit’ block in a function and let the optimisations do the basic-block merging later on.

Have you considered this approach instead?

-- Dean

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev
> I’m just wondering why not have a ‘br’ to an exit basic block instead of ‘ret’ mid-stream of instructions.
> Have you considered this approach instead?

Thanks for bringing this up.

Yes. In fact, I tried that approach/pattern first. Simply, you create default exit block and a local return variable (to track return value) per function, but it requires extra flags and variables to track function exit block and return value anytime during code generation (which my approach does not need to do these). 

When generating code, every time you see a return instruction, you store return value into that local variable (if it returns a value) and create branch instruction to function's exit block. Then function can generate return instruction by using local return variable or just ret void.

entry:
  <this is the entry block>
  <alloc local return variable>
  <…>
  <store return value into local return variable, if returns a value>
  br .exit

.bb0:
  <this is by definition unreachable>
  <…>
  <store return value into local return variables, if returns a value>
  br .exit

.exit:
  ret <local return value> OR ret void.


But this approach still does not resolve the issue I had. Front-end still can create multiple "br" instruction per block. Now we hit the original problem: we can't create multiple terminator instructions in a block.

One or the other pattern could have advantages based on language spec or the way you implement front-end. Who knows, I may want to return back to this approach in the future if creates better advantages. I'm constantly considering alternatives. The simplest / cleanest design wins.

Best,

Aaron


On Fri, May 25, 2018 at 1:41 AM, Dean Michael Berris <[hidden email]> wrote:

> On 25 May 2018, at 03:53, Aaron <[hidden email]> wrote:
>
> Hi Dean,
>
> Thanks for your reply.
>
> That's exactly what I am doing, but I was looking for a default optimization or pass implementation if there was.
>

There’s a dead code elimination pass, but it works with a control flow graph (it removes basic blocks that are unreachable after constant propagation and other passes earlier on have run).

> I used BasicBlock::splitBasicBlock() but it puts "br" end of original basic block. I tried to delete the br instruction by using eraseFromParent() but it didn't work.
>

This is working as intended. All LLVM BasicBlocks *must* have a terminator as the last instruction.

> I had to rewrite my own splitBasicBlock() by modifying the original one. Just removed the br instruction creation lines.
>

It sounds like it would have just been easier to replace the branch instruction into a ret, then remove all successors of the original br instruction. There’s nothing special here, this is how it’s supposed to be done.

> >> Just curious, how are you getting return instructions in the middle of a basic block?
> My code generation pass allows multiple return instruction generation since that simplifies front-end significantly.
>

I’m just wondering why not have a ‘br’ to an exit basic block instead of ‘ret’ mid-stream of instructions. That way you don’t need to do any special post-processing while you’re emitting the LLVM IR, you end up with valid basic blocks all the time, and you can leverage all the other passes that come with LLVM already.

That at least sounds simpler to me.

I’m not sure whether you can have non-entry blocks with no predecessors in LLVM (need to look up the langref about that) but I can imagine it’s a fairly useful construct to support. e.g. something like (pseudo-IR):

.entry:
  <this is the entry block>
  <…>
  br .exit

.bb0:
  <this is by definition unreachable>
  <…>
  br .exit

.exit:
  ret

This structure makes it easier for dead code elimination to determine that .bb0 is unreachable from .entry and can elide it safely. From your front-end, you can have a single ‘.exit’ block in a function and let the optimisations do the basic-block merging later on.

Have you considered this approach instead?

-- Dean



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] LLVM Pass To Remove Dead Code In A Basic Block

U.Mutlu via llvm-dev


> On 26 May 2018, at 07:24, Aaron <[hidden email]> wrote:
>
> > I’m just wondering why not have a ‘br’ to an exit basic block instead of ‘ret’ mid-stream of instructions.
> > Have you considered this approach instead?
>
> Thanks for bringing this up.
>
> Yes. In fact, I tried that approach/pattern first. Simply, you create default exit block and a local return variable (to track return value) per function, but it requires extra flags and variables to track function exit block and return value anytime during code generation (which my approach does not need to do these).
>
> When generating code, every time you see a return instruction, you store return value into that local variable (if it returns a value) and create branch instruction to function's exit block. Then function can generate return instruction by using local return variable or just ret void.
>
> entry:
>   <this is the entry block>
>   <alloc local return variable>
>   <…>
>   <store return value into local return variable, if returns a value>
>   br .exit
>
> .bb0:
>   <this is by definition unreachable>
>   <…>
>   <store return value into local return variables, if returns a value>
>   br .exit
>
> .exit:
>   ret <local return value> OR ret void.
>
>
> But this approach still does not resolve the issue I had. Front-end still can create multiple "br" instruction per block. Now we hit the original problem: we can't create multiple terminator instructions in a block.
>

The way I’ve seen this done, is to use a phi node in the exit block which consolidates all the potential values of the result, with all the potential predecessors contributing to the value marked out.

https://llvm.org/docs/LangRef.html#phi-instruction

That makes it clear that the value being returned can come only specifically from specific basic blocks.

This works really well when you’ve internalised the SSA form that LLVM models.

> One or the other pattern could have advantages based on language spec or the way you implement front-end. Who knows, I may want to return back to this approach in the future if creates better advantages. I'm constantly considering alternatives. The simplest / cleanest design wins.
>

If you’re looking to use LLVM most effectively, I think it would make sense to see how some of the other front-ends (and the various tutorials out there for using LLVM effectively) do it. In this case, from your front-end it would make sense to identify more granular basic blocks if you can do that. It would also greatly help with the LLVM passes which assume a certain structure.

Cheers

-- Dean

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev