[llvm-dev] LLVM Block is not the basic block

[llvm-dev] LLVM Block is not the basic block

Muhui Jiang via llvm-dev
Hi

I am using the LLVM function pass to help me to do code analysis. I use 


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Re: [llvm-dev] LLVM Block is not the basic block

Muhui Jiang via llvm-dev
Hi 

Sorry, the previous email was sent before I finished it, by mistake. Please read this one instead.

I am using an LLVM function pass to do code analysis. However, I found that the blocks LLVM identifies ignore function calls: a call does not end a block.

For example, the IR below should not be a basic block.


  %call17 = call i32* @__errno_location() #14, !dbg !1384
  %18 = load i32, i32* %call17, align 4, !dbg !1384
  %19 = load i8*, i8** %dest_dirname, align 4, !dbg !1386
  call void (i32, i32, i8*, ...) @error(i32 1, i32 %18, i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str.389, i32 0, i32 0), i8* %19), !dbg !1387
  br label %if.end18, !dbg !1388

The corresponding binary is below:

.text:0001A530                 BL      __errno_location
.text:0001A534                 LDR     R1, [R0]        ; errnum
.text:0001A538                 LDR     R3, [SP,#0x100+var_100]
.text:0001A53C                 LDR     R2, =aS_1       ; "%s"
.text:0001A540                 MOV     R0, #1          ; status
.text:0001A544                 BL      error
.text:0001A548                 B       loc_1A54C

Here you can see that it obviously should not be a basic block, because it calls two functions! So the control flow graph LLVM generates is not the real control flow graph either, right? Does anyone know why, or have any suggestions?

Regards
Muhui


Re: [llvm-dev] LLVM Block is not the basic block

Dean Michael Berris via llvm-dev


> On 29 May 2018, at 22:40, Muhui Jiang via llvm-dev <[hidden email]> wrote:
>
> Here you can see that it obviously should not be a basic block, because it calls two functions! So the control flow graph LLVM generates is not the real control flow graph either, right? Does anyone know why, or have any suggestions?
>

In LLVM, basic blocks can contain function calls. This allows partial and/or full code inlining.

The control flow graph (CFG) referred to in LLVM passes only includes the LLVM basic blocks inside a function. In LLVM, only tail-call exits are considered terminator instructions (and thus will delineate the basic block boundaries).

If you want to see function-call graphs, you may want to look at CallGraphSCCPass instead of FunctionPass.

http://llvm.org/docs/WritingAnLLVMPass.html#the-callgraphsccpass-class
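
To make the difference between the two graphs concrete, here is a minimal sketch in Python. It is a toy model, not the LLVM API: the module layout, the caller name `copy_file`, the block name `if.then`, and the helpers `cfg_edges` and `call_graph` are all hypothetical. The CFG only has edges between blocks inside one function, while the call graph has edges between functions.

```python
# Toy model (not LLVM's API): a module maps function names to blocks,
# and each block is a list of (opcode, operand) pairs.
module = {
    "copy_file": {  # hypothetical caller containing the block from the post
        "if.then": [
            ("call", "__errno_location"),
            ("load", "%call17"),
            ("load", "%dest_dirname"),
            ("call", "error"),
            ("br", "if.end18"),
        ],
        "if.end18": [("ret", None)],
    },
}


def cfg_edges(blocks):
    """Intra-function edges: only the last instruction of a block creates them."""
    edges = set()
    for name, insts in blocks.items():
        opcode, target = insts[-1]
        if opcode == "br":
            edges.add((name, target))
    return edges


def call_graph(mod):
    """Inter-function edges: one per (caller, callee) pair, wherever the call sits."""
    edges = set()
    for fn, blocks in mod.items():
        for insts in blocks.values():
            for opcode, operand in insts:
                if opcode == "call":
                    edges.add((fn, operand))
    return edges


print(sorted(cfg_edges(module["copy_file"])))
print(sorted(call_graph(module)))
```

In this model the two calls contribute nothing to the CFG and two edges to the call graph, which is the split between the two pass kinds described above.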

Cheers

-- Dean

Re: [llvm-dev] LLVM Block is not the basic block

Muhui Jiang via llvm-dev
Hi Dean

Thank you very much for your very quick reply. I am still a little confused; my questions are below.

In LLVM, basic blocks can contain function calls. This allows partial and/or full code inlining.
=======
So the reason why basic blocks can contain function calls is because of code inlining?

The control flow graph (CFG) referred to in LLVM passes only includes the LLVM basic blocks inside a function. In LLVM, only tail-call exits are considered terminator instructions (and thus will delineate the basic block boundaries).
=======
If so, the control flow graph generated with opt -dot-cfg is not the real control flow graph: because LLVM does not treat a function call as a jump, the graph is not accurate. Is that a fair way to put it? Or is there an option that would split an LLVM basic block into several real basic blocks?

If you want to see function-call graphs, you may want to look at CallGraphSCCPass instead of FunctionPass.
=======
What does "function-call graphs" mean? Does it mean the call graph? Many thanks.

Regards
Muhui

Re: [llvm-dev] LLVM Block is not the basic block

Krzysztof Parzyszek via llvm-dev
On 5/29/2018 7:59 AM, Muhui Jiang via llvm-dev wrote:
> So the reason why basic blocks can contain function calls is because of
> code inlining?

Not really. Basic blocks are units of code in the context of a specific
function. When execution reaches a function call, it leaves the current
function for some time, but then it returns to the instruction following
the call. From the perspective of the function containing the call, the
call is simply another (potentially complicated and long-running)
instruction.

Generally speaking, a basic block ends at instruction A if the
instruction following A is not guaranteed to execute after A has
executed. From the point of view of a given function, a call to another
function returns to the caller, so logically there is no reason
to terminate the basic block at a call. There are of course
complications, like calls that can throw exceptions, or calls that don't
return, but the general idea is that calls do return.
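
The rule above can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not LLVM code: the tuple layout, the
`TERMINATORS` set and the `partition` helper are hypothetical. A block is
closed only after an instruction whose successor is not guaranteed to
run next, and a call is not in that set.

```python
# Toy instruction stream (not real LLVM IR): (label_or_None, opcode, targets).
TERMINATORS = {"br", "ret"}  # opcodes after which fall-through is not guaranteed


def partition(insts, ends_block=lambda op: op in TERMINATORS):
    """Split a linear instruction list into basic blocks.

    A block is closed after any instruction for which ends_block(opcode)
    is true, and a labelled instruction always starts a new block.
    """
    blocks, current = [], []
    for label, opcode, targets in insts:
        if label is not None and current:
            blocks.append(current)
            current = []
        current.append(opcode)
        if ends_block(opcode):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


stream = [
    ("if.then", "call", ()),      # call @__errno_location
    (None, "load", ()),
    (None, "load", ()),
    (None, "call", ()),           # call @error
    (None, "br", ("if.end18",)),
    ("if.end18", "ret", ()),
]

print(partition(stream))
```

Passing a different `ends_block` predicate, for example one that also
returns true for "call", gives the finer partition that a binary-level
tool might expect.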

-Krzysztof

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Re: [llvm-dev] LLVM Block is not the basic block

Muhui Jiang via llvm-dev
Hi Krzysztof

I see, and I agree with your explanation.

However, as you know, some state-of-the-art binary analysis tools such as angr will accept this kind of design from LLVM. There are also some non-returning functions. Does LLVM take these into account? Do you have any ideas on how I could create blocks that cannot contain function calls in LLVM IR?

Regards
Muhui

Re: [llvm-dev] LLVM Block is not the basic block

Krzysztof Parzyszek via llvm-dev
It doesn't seem that LLVM treats non-returning calls in any special way
in the LLVM IR.

You could probably write your own pass that splits basic blocks at each
call, but there are optimizations that would merge the pieces back
together. If you do the splitting at the right moment, it may work, but
it really depends on your specific application.
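
As a rough illustration of that idea, here is a toy Python sketch. It
models instructions as plain strings and is not an LLVM pass; the helper
names `split_at_calls` and `merge_fallthrough` are hypothetical. In real
IR every piece would also need to end in a terminator, so a real split
would insert unconditional branches between the pieces.

```python
def split_at_calls(block):
    """Cut a block after every call, so no piece has a call in its middle."""
    pieces, current = [], []
    for inst in block:
        current.append(inst)
        if inst.startswith("call"):
            pieces.append(current)
            current = []
    if current:
        pieces.append(current)
    return pieces


def merge_fallthrough(pieces):
    """Model of a later clean-up: pieces that simply fall through into a
    single successor with a single predecessor are glued back together."""
    merged = []
    for piece in pieces:
        merged.extend(piece)
    return merged


block = [
    "call @__errno_location",
    "load %call17",
    "load %dest_dirname",
    "call @error",
    "br %if.end18",
]

pieces = split_at_calls(block)
print(pieces)
print(merge_fallthrough(pieces) == block)
```

The round trip shows why the timing matters: any clean-up that runs
after the split can undo it.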

-Krzysztof

Re: [llvm-dev] LLVM Block is not the basic block

Tim Northover via llvm-dev
On 29 May 2018 at 13:40, Muhui Jiang via llvm-dev
<[hidden email]> wrote:
> Here you can see that it obviously should not be a basic block, because it
> calls two functions! So the control flow graph LLVM generates is not the real
> control flow graph either, right? Does anyone know why, or have any suggestions?

The graph is designed to model all issues relevant to the compiler.
Mostly this revolves around making sure values are still live when
needed and that phi instructions work properly to merge different
values in from different paths.

In the case of function calls though, execution always returns to
precisely the next instruction with some registers defined (the return
value), some clobbered, and some memory affected. There's no real
reason to distinguish that from any similar instruction as far as the
compiler is concerned.

The case where a call can affect control-flow is when it might throw
an exception, and that's represented in LLVM IR by a different
"invoke" instruction that does describe the possible return locations.

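The difference between the two instructions can be sketched with a toy
Python model. The dictionary layout, the callee name `may_throw`, the
labels `cont` and `lpad`, and the `successors` helper are hypothetical
and not part of any LLVM API. Only a block's final instruction
contributes CFG edges, and an invoke names both places where execution
may continue.

```python
def successors(terminator):
    """Return the labels a block's final instruction can transfer control to."""
    opcode = terminator["op"]
    if opcode == "br":
        return list(terminator["targets"])
    if opcode == "invoke":
        # An invoke names both continuation points explicitly.
        return [terminator["normal"], terminator["unwind"]]
    if opcode in ("ret", "unreachable"):
        return []
    raise ValueError(f"{opcode!r} is not a terminator in this toy model")


plain_br = {"op": "br", "targets": ["if.end18"]}
may_throw = {"op": "invoke", "callee": "may_throw", "normal": "cont", "unwind": "lpad"}

print(successors(plain_br))
print(successors(may_throw))
```

A plain "call" is rejected by `successors` here because, in this model
as in the description above, it never ends a block.
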
Not to pile issues on your plate, but you should also be aware that
CodeGen can synthesize calls in some cases. For example, accessing a
"thread_local" variable on Apple platforms will generate a call to
__tlv_get_address that is completely invisible in the IR. Memcpy calls
may also appear out of nowhere on most platforms.

To get something close to the CPU-level control flow graph you'd
probably have to run a very late MachineFunction pass that looked not
only at the presented basic blocks, but also checked individual
instructions for isCall[*].
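
A rough Python sketch of that kind of scan, run over the ARM listing
from the original post rather than over real MachineFunction data. The
mnemonic sets and the `cpu_level_blocks` helper are assumptions made for
illustration; the membership test stands in for the isCall check.

```python
CALL_MNEMONICS = {"BL", "BLX"}   # assumption: ARM branch-with-link forms
BRANCH_MNEMONICS = {"B", "BX"}   # assumption: plain branches end a block too

listing = """\
.text:0001A530 BL __errno_location
.text:0001A534 LDR R1, [R0]
.text:0001A538 LDR R3, [SP,#0x100+var_100]
.text:0001A53C LDR R2, =aS_1
.text:0001A540 MOV R0, #1
.text:0001A544 BL error
.text:0001A548 B loc_1A54C
"""


def cpu_level_blocks(text):
    """Group addresses into blocks, ending a block after each call or branch."""
    blocks, current = [], []
    for line in text.splitlines():
        address, mnemonic = line.split()[:2]
        current.append(address.split(":")[1])
        if mnemonic in CALL_MNEMONICS or mnemonic in BRANCH_MNEMONICS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


for block in cpu_level_blocks(listing):
    print(block[0], "..", block[-1], f"({len(block)} instructions)")
```

On this listing the single IR-level block becomes three pieces: one
ending at each BL and one for the final branch.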

Cheers.

Tim.

[*] And even that would be an abstraction. IMO there's a pretty broad,
grey spectrum when it comes to "real" control flow. Depending on what
you're doing you may or may not include segfaults, floating-point
exceptions, and even asynchronous interrupts as control flow.