[llvm-dev] IR to binary address mapping


[llvm-dev] IR to binary address mapping

Muhui Jiang via llvm-dev
Hi

I know that LLVM provides debug APIs that expose source-level information, for example the source line and column number of every IR instruction.
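
For reference, the line and column mentioned here are what `Instruction::getDebugLoc()` exposes inside a pass, and they also appear in textual IR as `!dbg` attachments that point at `!DILocation` nodes. A minimal Python sketch that recovers them by plain text matching, assuming a hypothetical IR fragment (`SAMPLE_IR` and the helper `source_locations` are invented for illustration and are not part of LLVM):

```python
import re

# Hypothetical fragment of textual LLVM IR built with debug info (invented sample).
SAMPLE_IR = """\
define i32 @add(i32 %a, i32 %b) !dbg !7 {
entry:
  %sum = add nsw i32 %a, %b, !dbg !12
  ret i32 %sum, !dbg !13
}
!12 = !DILocation(line: 3, column: 12, scope: !7)
!13 = !DILocation(line: 3, column: 3, scope: !7)
"""

# Matches metadata definitions such as: !12 = !DILocation(line: 3, column: 12, ...
DILOCATION = re.compile(r"^!(\d+) = !DILocation\(line: (\d+), column: (\d+)")
# Matches indented instructions that end with a debug attachment: ..., !dbg !12
DBG_ATTACHMENT = re.compile(r"^\s+(.*?), !dbg !(\d+)\s*$")


def source_locations(ir_text):
    """Return [(instruction_text, line, column)] for instructions carrying !dbg."""
    locations = {}
    for line in ir_text.splitlines():
        match = DILOCATION.match(line)
        if match:
            node, src_line, src_col = match.groups()
            locations[node] = (int(src_line), int(src_col))

    result = []
    for line in ir_text.splitlines():
        match = DBG_ATTACHMENT.match(line)
        if match and match.group(2) in locations:
            result.append((match.group(1), *locations[match.group(2)]))
    return result


for instruction, src_line, src_col in source_locations(SAMPLE_IR):
    print(f"{instruction!r} -> line {src_line}, column {src_col}")
```

This only shows where the IR-to-source half of the mapping lives; it says nothing yet about binary addresses.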

However, is there any way to get a mapping from IR instructions to binary addresses directly? I don't want to use the DWARF line table as a bridge. The binary is generated by clang and LLVM, so I think there must be some information about the mapping between LLVM IR and the target binary addresses. Does anyone have suggestions? Many thanks.

Regards
Muhui

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] IR to binary address mapping

mayuyu.io via llvm-dev
In theory that's not exactly possible or accurate. Because of backend operations such as instruction legalization, one IR instruction may be emitted as multiple assembly instructions, for example.

Zhang

> On June 12, 2018, at 22:30, Muhui Jiang via llvm-dev <[hidden email]> wrote:
>
> Hi
>
> I know that LLVM provides debug APIs that expose source-level information, for example the source line and column number of every IR instruction.
>
> However, is there any way to get a mapping from IR instructions to binary addresses directly? I don't want to use the DWARF line table as a bridge. The binary is generated by clang and LLVM, so I think there must be some information about the mapping between LLVM IR and the target binary addresses. Does anyone have suggestions? Many thanks.
>
> Regards
> Muhui

Re: [llvm-dev] IR to binary address mapping

David Blaikie via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev
After the code is generated all you could really use would be debug info in the line table. There was for a time an LLVM pass ("DebugIR") for creating debug info/line table entries from LLVM textual IR source that'd give you the correspondence between the textual IR and the resulting machine code - but it's dead/removed - some folks might be trying to resurrect it somewhere, but I've not been keeping track.
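
The idea behind such a pass can be sketched roughly like this: treat the .ll file itself as the "source", so each instruction's debug line is its own line number in that file, and then read the ordinary line table. The Python below is not the removed pass and does not call LLVM; the IR text, the addresses, and the helper name `ir_to_addresses` are all invented for illustration.

```python
# Hypothetical textual IR. Line numbers are those of the .ll file itself.
SAMPLE_LL = """\
define i32 @scale(i32 %x) {
entry:
  %mul = mul nsw i32 %x, 10
  %div = sdiv i32 %mul, 3
  ret i32 %div
}
"""

# Hypothetical line-table rows (address, .ll line), as a tool would read them
# if the .ll file had been used as the debug-info source. Addresses are invented.
SAMPLE_LINE_TABLE = [
    (0x1000, 3),
    (0x1003, 4),
    (0x1009, 4),  # a single IR instruction may expand to several machine instructions
    (0x1010, 5),
]


def ir_to_addresses(ll_text, line_table):
    """Map each .ll line holding an instruction to (instruction_text, [addresses])."""
    by_line = {}
    for address, line in line_table:
        by_line.setdefault(line, []).append(address)

    mapping = {}
    for number, text in enumerate(ll_text.splitlines(), start=1):
        # In this sample, instructions are the indented lines; 'define', block
        # labels, and the closing brace start in column zero and are skipped.
        if text.startswith("  ") and text.strip():
            mapping[number] = (text.strip(), sorted(by_line.get(number, [])))
    return mapping


for number, (instruction, addresses) in ir_to_addresses(SAMPLE_LL, SAMPLE_LINE_TABLE).items():
    print(number, instruction, [hex(a) for a in addresses])
```

Note that the result is a list of addresses per instruction, which is the one-to-many relationship raised earlier in the thread.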

On Tue, Jun 12, 2018 at 7:30 AM Muhui Jiang via llvm-dev <[hidden email]> wrote:
> Hi
>
> I know that LLVM provides debug APIs that expose source-level information, for example the source line and column number of every IR instruction.
>
> However, is there any way to get a mapping from IR instructions to binary addresses directly? I don't want to use the DWARF line table as a bridge. The binary is generated by clang and LLVM, so I think there must be some information about the mapping between LLVM IR and the target binary addresses. Does anyone have suggestions? Many thanks.
>
> Regards
> Muhui

Re: [llvm-dev] IR to binary address mapping

Muhui Jiang via llvm-dev
In reply to this post by mayuyu.io via llvm-dev
Hi

However, the frontend may also perform various transformations on the source code, and one line and column number can map to more than one binary address. Why can't LLVM IR be mapped in the same way?

Regards
Muhui

2018-06-12 23:18 GMT+08:00 mayuyu.io <[hidden email]>:
> In theory that's not exactly possible or accurate. Because of backend operations such as instruction legalization, one IR instruction may be emitted as multiple assembly instructions, for example.
>
> Zhang


Re: [llvm-dev] IR to binary address mapping

Muhui Jiang via llvm-dev
In reply to this post by David Blaikie via llvm-dev
Hi

If I understand correctly, you mean that LLVM once had a "DebugIR" pass that could map textual IR to binary code, but it was later removed. Is that right? If so, I will search for the related information. Thank you very much.

Regards
Muhui

2018-06-13 4:27 GMT+08:00 David Blaikie <[hidden email]>:
> After the code is generated all you could really use would be debug info in the line table. There was for a time an LLVM pass ("DebugIR") for creating debug info/line table entries from LLVM textual IR source that'd give you the correspondence between the textual IR and the resulting machine code - but it's dead/removed - some folks might be trying to resurrect it somewhere, but I've not been keeping track.


Re: [llvm-dev] IR to binary address mapping

Paul Robinson via llvm-dev
In reply to this post by Muhui Jiang via llvm-dev

We preserve the source line/column for two reasons.  First, so that any compiler diagnostic messages can point to a source location that caused the diagnostic; second, because debugging information in the final binary wants to be able to map machine instruction addresses back to source locations.  There is never any need for the end-user to map machine instruction addresses back to IR instructions, so we don't maintain any information that could produce such a mapping.
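
To make the second use concrete: mapping a machine address back to a source location is a range lookup in the line table. A minimal Python sketch, assuming an already decoded and address-sorted table (the rows, `END_ADDRESS`, and the helper `lookup` are invented for illustration and are not read from a real binary):

```python
import bisect

# Hypothetical line-table rows sorted by address: (start_address, line, column).
TABLE = [
    (0x401000, 10, 5),
    (0x401008, 12, 9),
    (0x401010, 13, 1),
]
END_ADDRESS = 0x401018  # hypothetical end of the address range the table covers


def lookup(address):
    """Return (line, column) for the row whose range contains address, else None."""
    if not TABLE[0][0] <= address < END_ADDRESS:
        return None
    starts = [row[0] for row in TABLE]
    # Each row covers addresses from its start up to the next row's start.
    index = bisect.bisect_right(starts, address) - 1
    _, line, column = TABLE[index]
    return line, column


print(lookup(0x40100A))
print(lookup(0x402000))
```

Nothing in this lookup refers to an IR instruction, which is the point being made: the table is keyed on source positions only.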

--paulr

 

From: llvm-dev [mailto:[hidden email]] On Behalf Of Muhui Jiang via llvm-dev
Sent: Wednesday, June 13, 2018 3:09 AM
To: mayuyu.io
Cc: llvm-dev
Subject: Re: [llvm-dev] IR to binary address mapping

> Hi
>
> However, the frontend may also perform various transformations on the source code, and one line and column number can map to more than one binary address. Why can't LLVM IR be mapped in the same way?
>
> Regards
> Muhui

 


Re: [llvm-dev] IR to binary address mapping

Muhui Jiang via llvm-dev
Hi Paul

Thanks for your comments. Suppose I can generate the control flow graph with an LLVM pass or with a built-in opt option such as '-dot-cfg'. That control flow graph is at the LLVM IR level, but I would like a control flow graph at the binary level. That is why I want to map the IR to binary addresses.

As far as I know, the usual approach is to use the debug information to map the IR to source code and then use the DWARF line table to map to binary addresses. However, I have come across many problems. For example, the DWARF line table's information is not complete, and sometimes the line number is even zero. Besides, one line and column number can map to more than one binary address. So I may need a mapping from IR to binary to get a binary-level control flow graph. Do you have any comments or solutions? Many thanks.
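
The two problems described here can be reproduced on a toy table. In DWARF, line 0 marks code that cannot be attributed to any source line, and nothing stops several rows from sharing one (line, column) pair. A minimal Python sketch, assuming already decoded rows (the data and the helper `addresses_by_position` are invented for illustration):

```python
from collections import defaultdict

# Hypothetical decoded line-table rows: (address, line, column). Invented data.
ROWS = [
    (0x401000, 10, 5),
    (0x401004, 0, 0),   # line 0: not attributable to any source line
    (0x401008, 12, 9),
    (0x401010, 10, 5),  # same source position as the first row, second address
]


def addresses_by_position(rows):
    """Group addresses by (line, column) and collect line-0 rows separately."""
    usable = defaultdict(list)
    unattributed = []
    for address, line, column in rows:
        if line == 0:
            unattributed.append(address)
        else:
            usable[(line, column)].append(address)
    return dict(usable), unattributed


positions, unattributed = addresses_by_position(ROWS)
for position, addresses in positions.items():
    print(position, [hex(a) for a in addresses])
print("line 0:", [hex(a) for a in unattributed])
```

Any IR-to-address mapping that goes through source positions inherits both effects: some addresses have no position at all, and some positions have several addresses.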

Regards
Muhui



2018-06-13 23:05 GMT+08:00 <[hidden email]>:

> We preserve the source line/column for two reasons.  First, so that any compiler diagnostic messages can point to a source location that caused the diagnostic; second, because debugging information in the final binary wants to be able to map machine instruction addresses back to source locations.  There is never any need for the end-user to map machine instruction addresses back to IR instructions, so we don't maintain any information that could produce such a mapping.
>
> --paulr

 


Re: [llvm-dev] IR to binary address mapping

Paul Robinson via llvm-dev

I can imagine a machine-IR pass that could construct a CFG shortly before starting the AsmPrinter phase; I am pretty sure there is nothing that would affect the CFG after that point.  I don't know whether such an analysis exists currently.  I also don't know whether it would be straightforward to map an LLVM IR CFG to the machine-IR CFG; maybe not, as there are certainly machine-IR passes that do things like splitting and merging blocks.

Deriving final binary addresses for blocks in the CFG would require that you track or insert labels, and then examine the final binary to determine the addresses for each of those labels.  I am unclear how you would correlate these values with the CFG, however.
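
The "examine the final binary" step could look roughly like the Python sketch below, which reads symbol-table output in the three-column `nm` style (address, type, name) and keeps only the block labels. The `bb_` naming scheme, the sample symbols, and the helper `block_addresses` are assumptions made for illustration; the thread does not say how such labels would be named or emitted.

```python
# Hypothetical `nm`-style output. The bb_<function>_<block> label names are an
# assumed convention, and all addresses are invented.
SAMPLE_NM = """\
0000000000401130 T main
0000000000401130 t bb_main_entry
0000000000401142 t bb_main_then
000000000040114e t bb_main_exit
                 U printf
0000000000401160 T helper
"""


def block_addresses(nm_text, prefix="bb_"):
    """Return {block_name: address} for symbols whose name starts with prefix."""
    blocks = {}
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols have no address column
        address, _kind, name = parts
        if name.startswith(prefix):
            blocks[name[len(prefix):]] = int(address, 16)
    return blocks


for name, address in block_addresses(SAMPLE_NM).items():
    print(name, hex(address))
```

This gives label addresses only. Correlating them with an IR-level CFG is still the open problem described above, since blocks may have been split or merged before the labels were placed.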

--paulr

 

From: Muhui Jiang [mailto:[hidden email]]
Sent: Wednesday, June 13, 2018 11:12 AM
To: Robinson, Paul
Cc: mayuyu.io; llvm-dev
Subject: Re: [llvm-dev] IR to binary address mapping

> Hi Paul
>
> Thanks for your comments. Suppose I can generate the control flow graph with an LLVM pass or with a built-in opt option such as '-dot-cfg'. That control flow graph is at the LLVM IR level, but I would like a control flow graph at the binary level. That is why I want to map the IR to binary addresses.
>
> As far as I know, the usual approach is to use the debug information to map the IR to source code and then use the DWARF line table to map to binary addresses. However, I have come across many problems. For example, the DWARF line table's information is not complete, and sometimes the line number is even zero. Besides, one line and column number can map to more than one binary address. So I may need a mapping from IR to binary to get a binary-level control flow graph. Do you have any comments or solutions? Many thanks.
>
> Regards
> Muhui

 

 

 


Re: [llvm-dev] IR to binary address mapping

Muhui Jiang via llvm-dev

Hi 

Since it's midnight in my time zone, I will reply briefly from my phone now and may give you more information tomorrow.

First, thanks for your comments. I will search to see whether anything like that exists at the machine-IR level.

Second, to derive a basic block's binary address, I first use an LLVM pass to get the source line and column number of every instruction inside the block. Then I query the DWARF line table to get the binary addresses. At first I used the first instruction's address as the block's start address, but I found some exceptions. Then I tried taking the lowest address among all the instructions as the block's address; however, there are still some exceptions. So I don't know how to generate a precise binary-level control flow graph.
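
The two heuristics can be compared side by side on toy data. This Python sketch assumes that, for each IR basic block, the addresses returned by the line table are already collected in IR instruction order; the block names, addresses, and helper names are invented for illustration.

```python
# Hypothetical per-block data: addresses the line table returned for each
# block's instructions, listed in IR order. Invented data.
BLOCK_INSTRUCTION_ADDRESSES = {
    "entry": [0x401130, 0x401134, 0x401138],
    "loop": [0x401150, 0x401140, 0x401144],  # first IR instruction was placed later
    "exit": [0x401160, 0x401130],            # shares a source position with "entry"
}


def start_by_first_instruction(addresses):
    """Heuristic 1: the block starts where its first IR instruction landed."""
    return addresses[0]


def start_by_lowest_address(addresses):
    """Heuristic 2: the block starts at the lowest address of any instruction."""
    return min(addresses)


def disagreements(blocks):
    """Return {block: (first_instruction_start, lowest_start)} where they differ."""
    result = {}
    for name, addresses in blocks.items():
        first = start_by_first_instruction(addresses)
        lowest = start_by_lowest_address(addresses)
        if first != lowest:
            result[name] = (first, lowest)
    return result


for name, (first, lowest) in disagreements(BLOCK_INSTRUCTION_ADDRESSES).items():
    print(name, hex(first), hex(lowest))
```

In this made-up data the first heuristic is wrong for "loop" once instructions are reordered, while the second gives "exit" the same start address as "entry" because a shared source position pulls in an address from another block. Both kinds of exception follow from going through source positions rather than from a bug in either rule.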

Regards
Muhui

<[hidden email]> wrote on Wednesday, June 13, 2018 at 11:53 PM:

> I can imagine a machine-IR pass that could construct a CFG shortly before starting the AsmPrinter phase; I am pretty sure there is nothing that would affect the CFG after that point.  I don't know whether such an analysis exists currently.  I also don't know whether it would be straightforward to map an LLVM IR CFG to the machine-IR CFG; maybe not, as there are certainly machine-IR passes that do things like splitting and merging blocks.
>
> Deriving final binary addresses for blocks in the CFG would require that you track or insert labels, and then examine the final binary to determine the addresses for each of those labels.  I am unclear how you would correlate these values with the CFG, however.
>
> --paulr

 
