[llvm-dev] Function start address

[llvm-dev] Function start address

U.Mutlu via llvm-dev
Hi

I am using an LLVM pass combined with DWARF debug information to get the start address of every function. My steps are as follows:

First, I wrote a function pass to get the start line of each function; this part is finished.

Then, using the start line of each function, I look up that line in the DWARF line table, which I dump with llvm-dwarfdump -debug-line.

However, the start lines of about one third of the functions are not found in the line table, so I cannot get their start addresses. I know that the mapping between source locations and binary addresses is not bijective. I am using the -O1 optimization level, and I know that some information may legitimately be lost because of optimization. But I don't think DWARF should miss the start addresses of so many functions. Am I right? Any comments and suggestions are welcome. Many thanks.

Regards
Muhui

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
Is there a particular reason you're using debug info to achieve this (and if so, why the line table)? You could query the object/executable file's symbol table to find all the functions in an object or executable, and the address each one starts at. Or, if you do need debug info for some reason, you could look in debug_info rather than the line table: find the DW_TAG_subprogram for each function and look at its DW_AT_low_pc.
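
As a rough illustration of the debug_info approach suggested above, the following Python sketch scans text in the general shape printed by llvm-dwarfdump --debug-info and collects DW_AT_low_pc for each DW_TAG_subprogram. The sample dump, the function names in it, and the helper names (parse_dies, subprogram_start_addresses) are assumptions made for this example; the exact dump layout differs between LLVM versions, so the regular expressions may need adjusting.

```python
import re

# Illustrative text only: a hand-written sample in the general shape of
# `llvm-dwarfdump --debug-info` output (an assumption, not real tool output).
SAMPLE_DUMP = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name  ("example.c")

0x0000002a:   DW_TAG_subprogram
                DW_AT_low_pc  (0x0000000000401000)
                DW_AT_high_pc  (0x0000000000401014)
                DW_AT_name  ("f")

0x00000043:   DW_TAG_subprogram
                DW_AT_name  ("inlined_only")
                DW_AT_inline  (DW_INL_inlined)
"""

# A DIE starts with its offset followed by the tag name.
DIE_HEADER = re.compile(r"^0x[0-9a-fA-F]+:\s+(DW_TAG_\w+)")
# An attribute line is indented; its value is the last parenthesised group.
ATTRIBUTE = re.compile(r"^\s+(DW_AT_\w+)\s.*?\((.*)\)\s*$")


def parse_dies(dump_text):
    """Split the dump into a list of dicts, one per DIE."""
    dies = []
    for line in dump_text.splitlines():
        header = DIE_HEADER.match(line)
        if header:
            dies.append({"tag": header.group(1)})
            continue
        attr = ATTRIBUTE.match(line)
        if attr and dies:
            dies[-1][attr.group(1)] = attr.group(2)
    return dies


def subprogram_start_addresses(dump_text):
    """Map function name -> DW_AT_low_pc for subprograms that have both."""
    starts = {}
    for die in parse_dies(dump_text):
        if die["tag"] != "DW_TAG_subprogram":
            continue
        if "DW_AT_low_pc" not in die or "DW_AT_name" not in die:
            # Declarations and inlined-only (abstract) entries carry no
            # address; entries that take their name from another DIE are
            # also skipped by this simple sketch.
            continue
        starts[die["DW_AT_name"].strip('"')] = int(die["DW_AT_low_pc"], 16)
    return starts


if __name__ == "__main__":
    for name, address in subprogram_start_addresses(SAMPLE_DUMP).items():
        print(f"{name}: {address:#x}")
```

Entries without DW_AT_low_pc, such as the inlined-only one in the sample, are skipped rather than reported with a made-up address.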


Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev

[Re-sending with llvm-dev included this time]

Hi Muhui,

Are the functions emitted to the final binary?  If a function is not used, there might not be any object code for it in the final binary.  Naturally there would be no entry in the line table in this case.

If the function does exist in the binary, it is entirely possible (I think) to have no instruction specifically associated with the function definition's source line, even though other instructions are associated with other lines in the function.  I (or someone) would need to look at a specific example before being able to say one way or the other if that is what you are running into.

 

Have you considered building a static array of function addresses?  If you used weak references it would not interfere with optimizing away entire functions, which I mentioned above.  Or would that be too intrusive into your use case?  Apologies if this suggestion has come up before.

--paulr

Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
Hi Paulr

Thanks for your very useful and quick reply. Below is my response.

Are the functions emitted to the final binary?  If a function is not used, there might not be any object code for it in the final binary.  Naturally there would be no entry in the line table in this case.

==============

Yes, I think so. At least, I used IDA Pro to analyze the binary, and I am fairly sure that many functions are present in the binary even though their start lines do not appear in the DWARF line table.


If the function does exist in the binary, it is entirely possible (I think) to have no instruction specifically associated with the function definition's source line, even though other instructions are associated with other lines in the function.  I (or someone) would need to look at a specific example before being able to say one way or the other if that is what you are running into.

==============

I don't understand what the sentence "no instruction specifically associated with the function definition's source line" means. Could you please explain further? If you are interested, I would be glad to give you a specific example. Please tell me whether you need it.


Have you considered building a static array of function addresses?  If you used weak references it would not interfere with optimizing away entire functions, which I mentioned above.  Or would that be too intrusive into your use case?  Apologies if this suggestion has come up before.

==============

To be honest, no. I am using LLVM IR to do the code analysis, and the function start address is just one part of the whole evaluation. I would prefer to combine everything into one tool based on LLVM IR and DWARF debug information, so I would rather not obtain the function addresses from a static table. Thank you very much.


Regards

Muhui

Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
Hi 

Actually, no particular reason. I just thought this might be a solution, so I used this kind of method. Querying the symbol table would be a good choice, but I prefer to use LLVM and DWARF information. I am sorry that I am not familiar with debug_info, but thanks for your suggestion; I would like to try to solve it with debug_info. From your comments, it seems it should work.

By the way, I am still curious why the DWARF line table would lose the start-address information for so many functions. It would be great if you have any comments on this problem. Many thanks.

Regards
Muhui


Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
Hi

I tried to grep for "DW_TAG_subprogram" in the debug_info. However, I noticed that the number of entries I found is still smaller than the number of functions I found in the LLVM IR. Have you seen this before? Many thanks.

Regards
Muhui


Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev

Hi Muhui,

 

If the function does exist in the binary, it is entirely possible (I think) to have no instruction specifically associated with the function definition's source line, even though other instructions are associated with other lines in the function.  I (or someone) would need to look at a specific example before being able to say one way or the other if that is what you are running into.

==============

I don't understand what the sentence "no instruction specifically associated with the function definition's source line" means. Could you please explain further? If you are interested, I would be glad to give you a specific example. Please tell me whether you need it.

 

I tried a simple example like this:

int f(int a, int b)
{
  return a + b;
}

 

In the IR, it looks like the function definition is given as being on line 1 (the "int f" part), however in the DWARF line table the first instruction is associated with line 2 (the opening brace).  That's what I meant by "no instruction specifically associated with the function definition's source line"; even though the function is defined (according to C/C++) on line 1, there are no instructions for that line; the first instruction of the function is instead associated with the line where the scope starts.

 

I see that there is a different attribute in the IR metadata, called scopeLine, which is 2.  That might be what you want.
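
The mismatch described above can be sketched in Python. The line-table rows below are hypothetical values for the small example function, not real llvm-dwarfdump output, and the helper names (lowest_address_for_line, function_start) are made up for illustration. The sketch shows a lookup by declaration line finding nothing, while a lookup by scopeLine finds the first instruction.

```python
# Hypothetical line-table rows as (address, source line) pairs for the
# example function: line 1 is "int f(...)", line 2 is "{", line 3 is the
# return statement, line 4 is "}". Note that no row refers to line 1.
LINE_TABLE = [
    (0x401000, 2),
    (0x401004, 3),
    (0x401010, 4),
]


def lowest_address_for_line(line_table, line):
    """Return the lowest address attributed to `line`, or None if absent."""
    addresses = [address for address, row_line in line_table if row_line == line]
    return min(addresses) if addresses else None


def function_start(line_table, decl_line, scope_line):
    """Try the declaration line first, then fall back to the scope line."""
    for candidate in (decl_line, scope_line):
        address = lowest_address_for_line(line_table, candidate)
        if address is not None:
            return address
    return None


if __name__ == "__main__":
    # The declaration line (1) has no row, so a lookup by it alone fails.
    print(lowest_address_for_line(LINE_TABLE, 1))
    # Falling back to the scope line (2) finds the first instruction.
    print(hex(function_start(LINE_TABLE, decl_line=1, scope_line=2)))
```

This is only a sketch of the lookup logic. With optimization enabled, the first row of a function may carry yet another line, so a line-based lookup remains a heuristic compared with reading the function's low_pc or its symbol-table entry.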

 

Have you considered building a static array of function addresses?  If you used weak references it would not interfere with optimizing away entire functions, which I mentioned above.  Or would that be too intrusive into your use case?  Apologies if this suggestion has come up before.

==============

To be honest, no. I am using LLVM IR to do the code analysis, and the function start address is just one part of the whole evaluation. I would prefer to combine everything into one tool based on LLVM IR and DWARF debug information, so I would rather not obtain the function addresses from a static table. Thank you very much.

 

Hmm. I am curious what sort of analysis would be done within the IR, while still depending on the address of the functions in the final object file.  The IR is long gone by the time the compiler is emitting the final instruction stream.

 

Hope this helps,

--paulr

Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev

Hi Muhui,

 

I tried to grep for "DW_TAG_subprogram" in the debug_info. However, I noticed that the number of entries I found is still smaller than the number of functions I found in the LLVM IR. Have you seen this before? Many thanks.

 

The only explanation that comes to mind is that the functions are not in the final binary object file. However, previously you said you believed they were present. If that is the case, please provide us with an example source file and compiler command line to help diagnose the behavior.

 

Thanks,

--paulr

Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
Hi Paulr

I think I have found the reason. I use -save-temps to save the LLVM IR at compile time.

As you know, there are four different stages, and each stage maps to one file per binary. They are:

*.preopt.bc
*.internalize.bc
*.opt.bc
*.precodegen.bc

My LLVM pass was running on *.preopt.bc, so I got 376 functions. However, when I run the same pass on *.precodegen.bc, I get 266 functions, which matches the number in the symbol table. My mistake was not considering which bitcode file I should run the pass on. Thanks for your suggestions.
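
The comparison described above (functions seen by a pass versus functions in the symbol table) can be sketched in Python. The nm-style listing, the function names, and the helper names (defined_functions, missing_from_binary) are hypothetical; the sketch only assumes the usual three-column "address type name" layout that nm and llvm-nm print for defined symbols.

```python
# Hypothetical nm-style listing; 'T'/'t' mark global/local symbols in the
# text section, 'U' marks undefined symbols, 'B' marks zero-initialized data.
NM_OUTPUT = """\
0000000000401000 T main
0000000000401020 t helper
                 U printf
0000000000404028 B counter
"""

# Hypothetical function names collected by a pass run on an early
# (pre-optimization) bitcode file.
IR_FUNCTIONS = ["main", "helper", "unused_static"]


def defined_functions(nm_text):
    """Return {name: address} for symbols defined in the text section."""
    symbols = {}
    for line in nm_text.splitlines():
        parts = line.split()
        # Undefined symbols have no address column, so they yield two fields.
        if len(parts) == 3 and parts[1] in ("T", "t"):
            symbols[parts[2]] = int(parts[0], 16)
    return symbols


def missing_from_binary(ir_functions, symbols):
    """Names present in the IR but absent from the binary's symbol table."""
    return sorted(set(ir_functions) - set(symbols))


if __name__ == "__main__":
    symbols = defined_functions(NM_OUTPUT)
    print(len(IR_FUNCTIONS), "functions in IR,", len(symbols), "in symbol table")
    print("not emitted:", missing_from_binary(IR_FUNCTIONS, symbols))
```

Functions that appear only in the early bitcode, like the hypothetical unused_static here, are the ones removed or inlined before code generation, which is consistent with the drop from 376 to 266 reported above.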

Regards
Muhui



Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
Hi

I would like to ask one more thing. From this experience, I realize that I do not understand the generation of LLVM IR very well.

I am using autotools (configure, make) to build the binaries, with LLVMgold.so and the -save-temps option to save the LLVM IR. If you have any other good suggestions for keeping the LLVM IR, especially when building with autotools, please tell me. Many thanks.

Regards
Muhui


Re: [llvm-dev] Function start address

U.Mutlu via llvm-dev
Hi

There is one more thing I would like to confirm. It seems that the DWARF debug info only contains the functions inside the .text section, not those in the .plt section. Am I right? You can check the attached file.

Regards
Muhui


Attachment: new_cp (459K)