[llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
I've recently implemented profile guided section layout in llvm + lld using the Call-Chain Clustering (C³) heuristic from https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf . In the programs I've tested it on I've gotten from 0% to 5% performance improvement over standard PGO with zero cases of slowdowns and up to 15% reduction in ITLB misses.


There are three parts to this implementation.

The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.
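(Purely for illustration, a rough sketch of the shape that collection step could take; this is not the actual patch, and the GetBFI accessor and the "CG Profile" flag name are made up for the example:)

// Sketch: sum the profile count of every call site per (caller, callee)
// pair and record the edges as module flags metadata.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include <map>
using namespace llvm;

static void addCallGraphProfile(
    Module &M, function_ref<BlockFrequencyInfo &(Function &)> GetBFI) {
  std::map<std::pair<Function *, Function *>, uint64_t> Counts;
  for (Function &F : M) {
    if (F.isDeclaration())
      continue;
    BlockFrequencyInfo &BFI = GetBFI(F);
    for (BasicBlock &BB : F)
      for (Instruction &I : BB)
        if (auto *CI = dyn_cast<CallInst>(&I))
          if (Function *Callee = CI->getCalledFunction())
            if (auto Count = BFI.getBlockProfileCount(&BB))
              Counts[{&F, Callee}] += *Count;
  }
  // One {caller, callee, count} tuple per edge, appended as a module flag.
  LLVMContext &Ctx = M.getContext();
  SmallVector<Metadata *, 16> Edges;
  for (auto &KV : Counts) {
    Metadata *Vals[] = {
        MDString::get(Ctx, KV.first.first->getName()),
        MDString::get(Ctx, KV.first.second->getName()),
        ConstantAsMetadata::get(
            ConstantInt::get(Type::getInt64Ty(Ctx), KV.second))};
    Edges.push_back(MDTuple::get(Ctx, Vals));
  }
  M.addModuleFlag(Module::AppendUnique, "CG Profile",
                  MDTuple::get(Ctx, Edges));
}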

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
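(Hypothetical example of the dumped text, one edge per line, names made up:

main compress_block 182000
compress_block emit_literals 65310
emit_literals flush_bits 20011)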

The last part is in lld. It reads the .note.llvm.callgraph data from each object file and merges them into a single table. It then builds a call graph based on the profile data then iteratively merges the hottest call edges using the C³ heuristic as long as it would not create a cluster larger than the page size. All clusters are then sorted by a density metric to further improve locality.
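(A condensed standalone sketch of that merging step, written from the paper rather than taken from the lld patch; the 4096-byte page size and the data layout here are assumptions:)

// Sketch of Call-Chain Clustering (C3): every section starts in its own
// cluster, edges are visited hottest-first, and the callee's cluster is
// appended to the caller's unless the result would exceed a page. The
// surviving clusters are finally ordered by density (weight / size).
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Edge { int Caller, Callee; uint64_t Weight; };

struct Cluster {
  std::vector<int> Sections;   // section indices, in layout order
  uint64_t Size = 0;           // total byte size of the cluster
  uint64_t Weight = 0;         // total profile weight of merged edges
  double density() const { return Size ? double(Weight) / Size : 0.0; }
};

std::vector<Cluster> clusterSections(const std::vector<uint64_t> &SectionSizes,
                                     std::vector<Edge> Edges,
                                     uint64_t PageSize = 4096) {
  std::vector<Cluster> Clusters(SectionSizes.size());
  std::vector<int> Owner(SectionSizes.size()); // section -> owning cluster
  for (size_t I = 0; I < SectionSizes.size(); ++I) {
    Clusters[I].Sections = {int(I)};
    Clusters[I].Size = SectionSizes[I];
    Owner[I] = int(I);
  }
  // Hottest edges first.
  std::sort(Edges.begin(), Edges.end(),
            [](const Edge &A, const Edge &B) { return A.Weight > B.Weight; });
  for (const Edge &E : Edges) {
    int From = Owner[E.Caller], To = Owner[E.Callee];
    if (From == To)
      continue;
    Cluster &Dst = Clusters[From], &Src = Clusters[To];
    if (Dst.Size + Src.Size > PageSize)
      continue; // don't grow a cluster beyond a page
    for (int S : Src.Sections) // append the callee's cluster to the caller's
      Owner[S] = From;
    Dst.Sections.insert(Dst.Sections.end(), Src.Sections.begin(),
                        Src.Sections.end());
    Dst.Size += Src.Size;
    Dst.Weight += Src.Weight + E.Weight;
    Src.Sections.clear();
    Src.Size = Src.Weight = 0;
  }
  // Drop emptied clusters and sort the rest by decreasing density.
  Clusters.erase(std::remove_if(Clusters.begin(), Clusters.end(),
                                [](const Cluster &C) {
                                  return C.Sections.empty();
                                }),
                 Clusters.end());
  std::sort(Clusters.begin(), Clusters.end(),
            [](const Cluster &A, const Cluster &B) {
              return A.density() > B.density();
            });
  return Clusters;
}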


There are a couple of issues with the current implementation, but they shouldn't be too hard to fix.

The .note.llvm.callgraph sections add about 10% to the size of each object file. This can be fixed by reusing the symbol string table instead of duplicating all the strings.

The ordering algorithm is O(n^2) in the number of call graph edges. This is because it currently uses a trivial algorithm; it can be fixed by doing something more intelligent (e.g., processing the edges in a single pass over a sorted list).

It doesn't currently work for LTO as the llvm pass needs to be run after all inlining decisions have been made and LTO codegen has to be done with -ffunction-sections.

It's only implemented for ELF.


I've attached the llvm and lld patches and would appreciate feedback on the overall design, the method of threading the profile data to the linker, and suggestions for an improved ordering algorithm. There's no need for a detailed code review, as I'll be making changes, adding tests, and posting proper patches for review.

- Michael Spencer

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Attachment: pgcl-lld.diff (12K)
Attachment: pgcl-llvm.diff (11K)

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
> The first is a new llvm pass which uses branch frequency info to get
> counts for each call instruction and then adds a module flags
> metadata table of function -> function edges along with their counts.
>
> The second takes the module flags metadata and writes it into a
> .note.llvm.callgraph section in the object file. This currently just
> dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting
that information directly from the profile? The profile should contain
call sites and their sample counts and you could match these up with
relocations (calls) in the section?

> It doesn't currently work for LTO as the llvm pass needs to be run
> after all inlining decisions have been made and LTO codegen has to be
> done with -ffunction-sections.
So this is just an implementation issue, right? You can make LTO run
with -ffunction-sections (by setting
TargetOptions.FunctionSections=true) and insert your pass in the
appropriate place in the pipeline.
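(For reference, with the new LTO API that's just a flag on the codegen config; a minimal sketch, the surrounding setup is assumed:)

// Sketch: emit every function into its own section during LTO codegen so
// the linker is free to reorder them.
#include "llvm/LTO/Config.h"

void enableFunctionSections(llvm::lto::Config &Conf) {
  Conf.Options.FunctionSections = true; // llvm::TargetOptions::FunctionSections
}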

Thanks,
Tobias

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

I did this using IR PGO instead of sample PGO so the profile data can only be applied in the same place in the pipeline it is generated. Even for sample based this would be complicated as the linker would actually need to generate machine basic blocks from sections to be able to accurately match sample counts to relocations, as there may be cold calls in hot functions.

It may be useful however for the linker to directly accept an externally generated call graph profile. The current approach can actually do this by embedding it into an extra object file.
 


It doesn't currently work for LTO as the llvm pass needs to be run after all inlining decisions have been made and LTO codegen has to be done with -ffunction-sections.
So this is just an implementation issue, right? You can make LTO run with -ffunction-sections (by setting TargetOptions.FunctionSections=true) and insert your pass in the appropriate place in the pipeline.

Yeah, just an implementation issue. Just need to build the pass pipeline differently for LTO and add a way to do -ffunction-sections in lld.

- Michael Spencer
 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
Great! Cc'ing Sriraman who implemented something like this several years back on our gcc google branch. We may want to utilize your work in the gold-plugin as well.


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

I did this using IR PGO instead of sample PGO so the profile data can only be applied in the same place in the pipeline it is generated. Even for sample based this would be complicated as the linker would actually need to generate machine basic blocks from sections to be able to accurately match sample counts to relocations, as there may be cold calls in hot functions.

It may be useful however for the linker to directly accept an externally generated call graph profile. The current approach can actually do this by embedding it into an extra object file.

Also, doing this in the LLVM IR is best IMO because we can share the work between different linkers.
 
 


It doesn't currently work for LTO as the llvm pass needs to be run after all inlining decisions have been made and LTO codegen has to be done with -ffunction-sections.
So this is just an implementation issue, right? You can make LTO run with -ffunction-sections (by setting TargetOptions.FunctionSections=true) and insert your pass in the appropriate place in the pipeline.

Yeah, just an implementation issue. Just need to build the pass pipeline differently for LTO and add a way to do -ffunction-sections in lld.

Yeah, I implemented support to pass that down for the gold-plugin (r284140); I guess you need to do something similar for lld.

--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

The main reason is that IPO transformations such as inlining and cloning will change the hotness of functions, so the original profile cannot be used directly for the purpose of function layout. There is similar support in the Gold plugin for Google GCC.

David


 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev


On Thu, Jun 15, 2017 at 11:09 AM, Xinliang David Li via llvm-dev <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

The main reason is that IPO transformations such as inlining and cloning will change the hotness of functions, so the original profile cannot be used directly for the purpose of function layout. There is similar support in the Gold plugin for Google GCC.

Will this cause issues with ThinLTO? E.g. the thinlto backends are doing inlining of imported functions. Do we have a mechanism for those decisions to be reflected in a global profile for the linker to look at?

In theory the thinlto backends can keep a history of their IPO decisions in metadata or something and emit a section for the linker to aggregate and reconstruct an accurate global profile, but that seems relatively invasive. 

-- Sean Silva
 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev


On Thu, Jun 15, 2017 at 2:30 PM, Sean Silva <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 11:09 AM, Xinliang David Li via llvm-dev <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

The main reason is that IPO transformations such as inlining and cloning will change the hotness of functions, so the original profile cannot be used directly for the purpose of function layout. There is similar support in the Gold plugin for Google GCC.

Will this cause issues with ThinLTO? E.g. the thinlto backends are doing inlining of imported functions. Do we have a mechanism for those decisions to be reflected in a global profile for the linker to look at?

In theory the thinlto backends can keep a history of their IPO decisions in metadata or something and emit a section for the linker to aggregate and reconstruct an accurate global profile, but that seems relatively invasive. 


Yes, it will cause problems; this is also a known issue with GCC's LIPO. We have an intern working on that problem :)

David
 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev


On Thu, Jun 15, 2017 at 2:33 PM, Xinliang David Li <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 2:30 PM, Sean Silva <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 11:09 AM, Xinliang David Li via llvm-dev <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

The main reason is that IPO transformations such as inlining and cloning will change the hotness of functions, so the original profile cannot be used directly for the purpose of function layout. There is similar support in the Gold plugin for Google GCC.

Will this cause issues with ThinLTO? E.g. the thinlto backends are doing inlining of imported functions. Do we have a mechanism for those decisions to be reflected in a global profile for the linker to look at?

In theory the thinlto backends can keep a history of their IPO decisions in metadata or something and emit a section for the linker to aggregate and reconstruct an accurate global profile, but that seems relatively invasive. 


Yes, it will cause problems; this is also a known issue with GCC's LIPO. We have an intern working on that problem :)

Nice! What approach is being used to solve it?

-- Sean Silva
 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev


On Thu, Jun 15, 2017 at 2:39 PM, Sean Silva <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 2:33 PM, Xinliang David Li <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 2:30 PM, Sean Silva <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 11:09 AM, Xinliang David Li via llvm-dev <[hidden email]> wrote:


On Thu, Jun 15, 2017 at 10:55 AM, Michael Spencer via llvm-dev <[hidden email]> wrote:
On Thu, Jun 15, 2017 at 10:08 AM, Tobias Edler von Koch <[hidden email]> wrote:
Hi Michael,

This is cool stuff, thanks for sharing!

On 06/15/2017 11:51 AM, Michael Spencer via llvm-dev wrote:
The first is a new llvm pass which uses branch frequency info to get counts for each call instruction and then adds a module flags metadata table of function -> function edges along with their counts.

The second takes the module flags metadata and writes it into a .note.llvm.callgraph section in the object file. This currently just dumps it as text, but could save space by reusing the string table.
Have you considered reading the profile in the linker and extracting that information directly from the profile? The profile should contain call sites and their sample counts and you could match these up with relocations (calls) in the section?

The main reason is that IPO transformations such as inlining and cloning will change the hotness of functions, so the original profile cannot be used directly for the purpose of function layout. There is similar support in the Gold plugin for Google GCC.

Will this cause issues with ThinLTO? E.g. the thinlto backends are doing inlining of imported functions. Do we have a mechanism for those decisions to be reflected in a global profile for the linker to look at?

In theory the thinlto backends can keep a history of their IPO decisions in metadata or something and emit a section for the linker to aggregate and reconstruct an accurate global profile, but that seems relatively invasive. 


Yes, it will cause problems; this is also a known issue with GCC's LIPO. We have an intern working on that problem :)

Nice! What approach is being used to solve it?



The method and results will be shared at some point. One approach is what you mentioned, which uses metadata and passes it to the linker plugin. Another way to be explored is to let the thin-link phase make very quick cross-module inlining decisions globally and teach the backend inliner to honor the results. The global inline decisions are not intended to replace the backend inline analysis. The global analysis can, for instance, 1) expose linker GC opportunities; 2) focus on hot functions with few callsites or very few really hot callsites; 3) identify the single module that has the most calls to a function and assign that module to be the final info updater; 4) decide the post-link hotness for those functions whose decisions are made at thin-link time.

David

 

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev

Michael,

 

I have tried this patch, adapted to our own linker, and it works great. I would like to see this implemented on master.

Please let me know if I can do anything to facilitate this.

 

Sergei

 

From: llvm-dev [mailto:[hidden email]] On Behalf Of Xinliang David Li via llvm-dev
Sent: Thursday, June 15, 2017 5:38 PM
To: Sean Silva <[hidden email]>
Cc: LLVMDev <[hidden email]>; [hidden email]
Subject: Re: [llvm-dev] [RFC] Profile guided section layout

 

 

 


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
Michael Spencer via llvm-dev <[hidden email]> writes:

> I've recently implemented profile guided section layout in llvm + lld using
> the Call-Chain Clustering (C³) heuristic from
> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> . In the programs I've tested it on I've gotten from 0% to 5% performance
> improvement over standard PGO with zero cases of slowdowns and up to 15%
> reduction in ITLB misses.
>
>
> There are three parts to this implementation.
>
> The first is a new llvm pass which uses branch frequency info to get counts
> for each call instruction and then adds a module flags metadata table of
> function -> function edges along with their counts.
>
> The second takes the module flags metadata and writes it into a
> .note.llvm.callgraph section in the object file. This currently just dumps
> it as text, but could save space by reusing the string table.
>
> The last part is in lld. It reads the .note.llvm.callgraph data from each
> object file and merges them into a single table. It then builds a call
> graph based on the profile data then iteratively merges the hottest call
> edges using the C³ heuristic as long as it would not create a cluster
> larger than the page size. All clusters are then sorted by a density metric
> to further improve locality.

Since the branch frequency info is in an llvm-specific format, it makes
sense for llvm to read it instead of expecting lld to do it again. Since
.o files are how the compiler talks to the linker, it also makes sense
for llvm to record the required information there.

In the same way, since the linker is the first place with global
knowledge, it makes sense for it to be the one that implements a section
ordering heuristic instead of just being told by some other tool, which
would complicate the build.

However, do we need to start with instrumentation? The original paper
uses sampling with good results and current intel cpus can record every
branch in a program.

I would propose starting with just an lld patch that reads the call
graph from a file. The format would be very similar to what you propose,
just weight,caller,callee.
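Something along these lines, with made-up names:

5000,frobnicate_inputs,hash_chunk
1200,main,frobnicate_inputs
800,hash_chunk,mix_bytes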

In another patch we can then look at instrumentation: why it is more
convenient for some uses and what performance advantage it might have.

I have written a small tool that uses intel_bts and 'perf script' to
construct the callgraph. I am giving it a try with your lld patch and
will hopefully post results today.

Cheers,
Rafael

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
Hi Rafael,

On 07/31/2017 04:20 PM, Rafael Avila de Espindola via llvm-dev wrote:
> However, do we need to start with instrumentation? The original paper
> uses sampling with good results and current intel cpus can record every
> branch in a program.
>
> I would propose starting with just an lld patch that reads the call
> graph from a file. The format would be very similar to what you propose,
> just weight,caller,callee.

The advantage of the proposed approach (weighted callgraph section) is
that it's completely transparent: it works regardless of the particular
profiling methodology (as long as there's !perf metadata when the pass
runs). For this reason, it fits neatly into an *existing* PGO-based
build flow. I only need to add 1 compiler flag to enable it. That's a
big plus.

On the other hand, I could see how your idea (callgraph input file for
linker) would be useful in situations where I just want to do section
layout but no PGO in the compiler... and of course for testing of the
linker's sorting algorithm.

So there's merits in both, but for my use cases Michael's original
approach is the most practical.

Tobias

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
Tobias Edler von Koch <[hidden email]> writes:

> Hi Rafael,
>
> On 07/31/2017 04:20 PM, Rafael Avila de Espindola via llvm-dev wrote:
>> However, do we need to start with instrumentation? The original paper
>> uses sampling with good results and current intel cpus can record every
>> branch in a program.
>>
>> I would propose starting with just an lld patch that reads the call
>> graph from a file. The format would be very similar to what you propose,
>> just weight,caller,callee.
>
> The advantage of the proposed approach (weighted callgraph section) is
> that it's completely transparent: it works regardless of the particular
> profiling methodology (as long as there's !perf metadata when the pass
> runs). For this reason, it fits neatly into an *existing* PGO-based
> build flow. I only need to add 1 compiler flag to enable it. That's a
> big plus.
>
> On the other hand, I could see how your idea (callgraph input file for
> linker) would be useful in situations where I just want to do section
> layout but no PGO in the compiler... and of course for testing of the
> linker's sorting algorithm.
>
> So there's merits in both, but for my use cases Michael's original
> approach is the most practical.

Yes, I must stress that I am not proposing having the option of reading
the callgraph from another file *instead* of reading it from .o
files. Just that doing it first decouples most of the lld patch from the
llvm changes and can be useful in cases where only samples are available.

Cheers,
Rafael

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
A rebased version of the lld patch is attached.

Cheers,
Rafael



Attachment: lld.diff (12K)

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
On 07/31/2017 05:11 PM, Rafael Avila de Espindola wrote:

>
>> The advantage of the proposed approach (weighted callgraph section) is
>> that it's completely transparent: it works regardless of the particular
>> profiling methodology (as long as there's !perf metadata when the pass
>> runs). For this reason, it fits neatly into an *existing* PGO-based
>> build flow. I only need to add 1 compiler flag to enable it. That's a
>> big plus.
>>
>> On the other hand, I could see how your idea (callgraph input file for
>> linker) would be useful in situations where I just want to do section
>> layout but no PGO in the compiler... and of course for testing of the
>> linker's sorting algorithm.
>>
>> So there's merits in both, but for my use cases Michael's original
>> approach is the most practical.
> Yes, I must stress that I am not proposing having the option of reading
> the callgraph from another file *instead* of reading it from .o
> files. Just that doing it first decouples most of the lld patch from the
> llvm changes and can be useful in cases where only samples are available.

Absolutely!

Michael/Rafael, would you mind uploading the patches to Phabricator so
people can review them there? I think there is enough interest in this
feature to move towards merging it in tree.

Thanks,
Tobias

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
I'm definitely interested in seeing this move forward (Phabricator
will make the review easy). Quick question: why is the option phrased
as a negative (i.e. --disable instead of --enable)?

Thanks,

--
Davide


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev


On Jul 31, 2017 2:21 PM, "Rafael Avila de Espindola via llvm-dev" <[hidden email]> wrote:
Michael Spencer via llvm-dev <[hidden email]> writes:

> I've recently implemented profile guided section layout in llvm + lld using
> the Call-Chain Clustering (C³) heuristic from
> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> . In the programs I've tested it on I've gotten from 0% to 5% performance
> improvement over standard PGO with zero cases of slowdowns and up to 15%
> reduction in ITLB misses.
>
>
> There are three parts to this implementation.
>
> The first is a new llvm pass which uses branch frequency info to get counts
> for each call instruction and then adds a module flags metadata table of
> function -> function edges along with their counts.
>
> The second takes the module flags metadata and writes it into a
> .note.llvm.callgraph section in the object file. This currently just dumps
> it as text, but could save space by reusing the string table.
>
> The last part is in lld. It reads the .note.llvm.callgraph data from each
> object file and merges them into a single table. It then builds a call
> graph based on the profile data then iteratively merges the hottest call
> edges using the C³ heuristic as long as it would not create a cluster
> larger than the page size. All clusters are then sorted by a density metric
> to further improve locality.

Since the branch frequency info is in an llvm-specific format, it makes
sense for llvm to read it instead of expecting lld to do it again. Since
.o files are how the compiler talks to the linker, it also makes sense
for llvm to record the required information there.

In the same way, since the linker is the first place with global
knowledge, it makes sense for it to be the one that implements a section
ordering heuristic instead of just being told by some other tool, which
would complicate the build.

However, do we need to start with instrumentation? The original paper
uses sampling with good results and current intel cpus can record every
branch in a program.

Both sample and instrumentation are turned into the same profile metadata in the IR which is what feeds BranchFrequencyInfo. So it should be naturally agnostic to which method is used.

-- Sean Silva



Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
I updated the patch to read a call graph from a text file.

I tested it with the attached call.txt from lld linking chromium.
Unfortunately the resulting lld doesn't seem any faster. One thing I
noticed is that the most used symbols seem to be at the end of the
file.

In any case, can you add tests and send the lld patch for review?

Thanks,
Rafael




Attachment: lld.diff (11K)
Attachment: calls.txt (593K)

Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
On Mon, Jul 31, 2017 at 2:20 PM, Rafael Avila de Espindola <[hidden email]> wrote:
Michael Spencer via llvm-dev <[hidden email]> writes:

> I've recently implemented profile guided section layout in llvm + lld using
> the Call-Chain Clustering (C³) heuristic from
> https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
> . In the programs I've tested it on I've gotten from 0% to 5% performance
> improvement over standard PGO with zero cases of slowdowns and up to 15%
> reduction in ITLB misses.
>
>
> There are three parts to this implementation.
>
> The first is a new llvm pass which uses branch frequency info to get counts
> for each call instruction and then adds a module flags metadata table of
> function -> function edges along with their counts.
>
> The second takes the module flags metadata and writes it into a
> .note.llvm.callgraph section in the object file. This currently just dumps
> it as text, but could save space by reusing the string table.
>
> The last part is in lld. It reads the .note.llvm.callgraph data from each
> object file and merges them into a single table. It then builds a call
> graph based on the profile data then iteratively merges the hottest call
> edges using the C³ heuristic as long as it would not create a cluster
> larger than the page size. All clusters are then sorted by a density metric
> to further improve locality.

Since the branch frequency info is in an llvm-specific format, it makes
sense for llvm to read it instead of expecting lld to do it again. Since
.o files are how the compiler talks to the linker, it also makes sense
for llvm to record the required information there.

In the same way, since the linker is the first place with global
knowledge, it makes sense for it to be the one that implements a section
ordering heuristic instead of just being told by some other tool, which
would complicate the build.

However, do we need to start with instrumentation? The original paper
uses sampling with good results and current intel cpus can record every
branch in a program.

I would propose starting with just an lld patch that reads the call
graph from a file. The format would be very similar to what you propose,
just weight,caller,callee.

In another patch we can then look at instrumentation: why it is more
convenient for some uses and what performance advantage it might have.

I have written a small tool that uses intel_bts and 'perf script' to
construct the callgraph. I am giving it a try with your lld patch and
will hopefully post results today.

Cheers,
Rafael

I'm fine with approaching this from lld first, as the input reading is the simplest part of this. I started from instrumentation-based PGO because I started with a target program that already had the infrastructure set up to do it.

- Michael Spencer


Re: [llvm-dev] [RFC] Profile guided section layout

Alex Bradbury via llvm-dev
In reply to this post by Alex Bradbury via llvm-dev
On Mon, Jul 31, 2017 at 5:26 PM, Rafael Espíndola <[hidden email]> wrote:
I updated the patch to read a call graph from a text file.

I tested it with the attached call.txt from lld linking chromium.
Unfortunately the resulting lld doesn't seem any faster. One thing I
noticed is that the most used symbols seem to be at the end of the
file.

Not all programs will get faster as it really depends on the working set size of the code. While lld is kinda large, if you're not running LTO then the instruction working set is pretty small.


In any case, can you add tests and send the lld patch for review?

Yep, should have that tomorrow.

- Michael Spencer
 
