[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
43 messages Options
123
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi Teresa

Thanks for the proposal. Serializing out the summary in a readable format is very help for debugging and development. Some comments inline.

On Apr 24, 2018, at 7:43 AM, Teresa Johnson via llvm-dev <[hidden email]> wrote:

Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary
========================================

Background
-----------------

ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.

There are two ways to generate a bitcode file containing summary records for a module:
  1. Compile with “clang -c -flto=thin”
  2. Build from LLVM assembly using “opt -module-summary”
Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.

Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.

Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.

Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.

Can you elaborate what the reason for pushback for YAML support? I want to know what concern people have for YAML format so we can address them in LLVM assembly. I could not find much context reading through the review and mailing list.


Goals:

  1. Define LLVM assembly format for summary index
  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.
  3. Implement printing and parsing of summary index LLVM assembly

Proposed LLVM Assembly Format
----------------------------------------------

There are several top level data structures within the ModuleSummaryIndex:
  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).
  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).
  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)

I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).

I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.

Is there any reason or downside for just using metadata for summary? We can just stream summary related metadata into summary block in bitcode.


Consider the following example:

extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:

^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).

The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.

Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.

Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):

define  dso_local i32 @main() #0 !dbg !17 ^3 {

I don't have any real preference regarding the syntax. Tagging the summary for the IR definition is nice and it increases the readability but it might also has problem. Summary is currently standalone and IR definition doesn't really hold a reference to the summary. You have to lookup through GUID. 
If you make summary to be tightly coupled with IR, should we verify the state of the summary as part of IR verifier? I guess it is related to your concern in the end of the email.


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

Peter - is that correct and does that sound ok?

Issues when Parsing of Summaries from Assembly
--------------------------------------------------------------------

When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.

Things to consider are the behavior when:
  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:
    1. recompute summary and throw away summary in the assembly file
    2. ignore -module-summary and build the summary from the LLVM assembly
    3. give an error
    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.
My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).

I prefer a). d) can be achieved with a different pass.


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.
    2. Auto-upgrade, by silently creating conservative values for the new summary entries.
I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.

Assembly file doesn't need to be compatible. For the older LLVM assembly file don't contains new summary field, we can just error out. The story is different for bitcode file and they should be auto upgraded for compatibility. It is also necessary to debug summary. Otherwise, if you llvm-dis older bitcode, you will get regenerated summary info.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.

I would prefer to serialize them as them are in the error state if possible. We can rely on a verifier pass to catch the errors so you have the option to skip the verifier to see the wrong summary when debugging. This aligns with LLVM IR as well.

Steven



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi Steven,

Thanks for your comments! Replies inline.

Teresa

On Wed, Apr 25, 2018 at 1:13 PM Steven Wu <[hidden email]> wrote:
Hi Teresa

Thanks for the proposal. Serializing out the summary in a readable format is very help for debugging and development. Some comments inline.

On Apr 24, 2018, at 7:43 AM, Teresa Johnson via llvm-dev <[hidden email]> wrote:

Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary
========================================

Background
-----------------

ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.

There are two ways to generate a bitcode file containing summary records for a module:
  1. Compile with “clang -c -flto=thin”
  2. Build from LLVM assembly using “opt -module-summary”
Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.

Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.

Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.

Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.

Can you elaborate what the reason for pushback for YAML support? I want to know what concern people have for YAML format so we can address them in LLVM assembly. I could not find much context reading through the review and mailing list.

The comments were on the RFC ("[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format") and the discussion on https://reviews.llvm.org/D34080 (the latter contains a more abbreviated version of the discussion on the RFC, so that's a good place to look), were essentially that we should prioritize a round-trippable assembly format, and doing YAML (or any other dumper format) is going in the wrong direction from that.



Goals:

  1. Define LLVM assembly format for summary index
  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.
  3. Implement printing and parsing of summary index LLVM assembly

Proposed LLVM Assembly Format
----------------------------------------------

There are several top level data structures within the ModuleSummaryIndex:
  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).
  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).
  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)

I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).

I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.

Is there any reason or downside for just using metadata for summary? We can just stream summary related metadata into summary block in bitcode.

I assume you mean use the metadata printing format, i.e. "!"? I considered that but it seemed a little odd to me since this isn't in fact metadata. We could presumably use summary-specific tags that would indicate to the parser that it is not in fact metadata but rather summary, but it seemed cleaner to me to use a separate format.
 


Consider the following example:

extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:

^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).

The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.

Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.

Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):

define  dso_local i32 @main() #0 !dbg !17 ^3 {

I don't have any real preference regarding the syntax. Tagging the summary for the IR definition is nice and it increases the readability but it might also has problem. Summary is currently standalone and IR definition doesn't really hold a reference to the summary. You have to lookup through GUID. 
If you make summary to be tightly coupled with IR, should we verify the state of the summary as part of IR verifier? I guess it is related to your concern in the end of the email.

Right - tagging the IR is something we can do in the future, assuming we make the index available to the Module structure when printing, but there isn't currently a reference to it. Presumably it would be optional, i.e. if the summary index is available while printing, lookup the global's entry via its GUID, which is easily accessed from the GlobalValue, and print that.

Regarding verification, I tend to think that it should be a separately available option (as discussed below in your comments).



For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

Peter - is that correct and does that sound ok?

Issues when Parsing of Summaries from Assembly
--------------------------------------------------------------------

When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.

Things to consider are the behavior when:
  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:
    1. recompute summary and throw away summary in the assembly file
    2. ignore -module-summary and build the summary from the LLVM assembly
    3. give an error
    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.
My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).

I prefer a). d) can be achieved with a different pass.

Great, agreed.
 


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.
    2. Auto-upgrade, by silently creating conservative values for the new summary entries.
I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.

Assembly file doesn't need to be compatible. For the older LLVM assembly file don't contains new summary field, we can just error out. The story is different for bitcode file and they should be auto upgraded for compatibility. It is also necessary to debug summary. Otherwise, if you llvm-dis older bitcode, you will get regenerated summary info.

Right, the bitcode summary format is auto-upgraded. I suppose there are not so many test .ll files containing summary that it wouldn't be too difficult to require that they be upgraded by patches that introduce new fields. I could go either way here.
 


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.

I would prefer to serialize them as them are in the error state if possible. We can rely on a verifier pass to catch the errors so you have the option to skip the verifier to see the wrong summary when debugging. This aligns with LLVM IR as well.

So, deserialize it into bitcode (assuming that the summary in assembly isn't so corrupted that we can't), then rely on a new summary verifier that runs afterwards (possibly optional) - did I summarize that correctly?

Teresa
 

Steven



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated). Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev
Hi Teresa,

Awesome to see - looking forward to it!

On Tue, Apr 24, 2018 at 7:44 AM Teresa Johnson via llvm-dev <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”


As an aside - I seem to recall that at least internally at Google some kind of summary-only bitcode files are used (so that the whole bitcode file (especially in builds with debug info) doesn't have to be shipped to the node doing the summary merging). How are those summary-only files produced? Is that upstream? Or done in a more low-level way (like an objcopy, llvm-* tool invocation done as a post-processing step, etc)?
 
  1. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}


Syntax seems pretty good to me!

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file


What happens currently if you run `opt -module-summary` on a bitcode file that already contains a summary? I feel like the behavior should be the same when run on a textual IR file containing a summary, probably?
 
    1. ignore -module-summary and build the summary from the LLVM assembly

    2. give an error

    3. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:


Same thoughts would apply here for "what do we do in the bitcode case" - with the option to not support old/difficult textual IR. If there are easy/obvious defaults, I'd say it's probably worth baking those in (& baking them in even for the existing fields we know about, to make it easier to write more terse test cases that don't have to verbosily/redundantly specify lots of default values?) to the parsing/loading logic?
 
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.


I'd be OK with the summary being validated by the IR validator (same way other properties of IR are validated & even simple things like if you use the wrong IR type to refer to an IR value, you get a parse error, etc) - which, I realize, would make it feel like the textual summary was entirely redundant (except in cases of standalone summaries - which I imagine will be the common case in tests, because the summary processing should be tested in isolation (except for testing things like this validation logic itself, etc)).

- Dave
 


--
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:(408)%20460-2413" value="+14084602413" target="_blank">408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list. Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

Teresa



Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev
Hi David,
Thanks for the comments, replies below.
Teresa

On Mon, Apr 30, 2018 at 11:52 AM David Blaikie <[hidden email]> wrote:
Hi Teresa,

Awesome to see - looking forward to it!

On Tue, Apr 24, 2018 at 7:44 AM Teresa Johnson via llvm-dev <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”


As an aside - I seem to recall that at least internally at Google some kind of summary-only bitcode files are used (so that the whole bitcode file (especially in builds with debug info) doesn't have to be shipped to the node doing the summary merging). How are those summary-only files produced? Is that upstream? Or done in a more low-level way (like an objcopy, llvm-* tool invocation done as a post-processing step, etc)?

This is done upstream, under a special clang option that can be given in addition to -flto=thin, so that the compile step emits both the full IR+summary (for the distributed backends) as well as a minimized bitcode file with summary (for the thin link). Note that the distributed backends don't actually need the summary with the IR (as it gets all the info it needs from the combined summary index written out by the thin link), so we could theoretically improve this to suppress the summary write for that first file under that option.

 
  1. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}


Syntax seems pretty good to me!

Great!
 

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file


What happens currently if you run `opt -module-summary` on a bitcode file that already contains a summary? I feel like the behavior should be the same when run on a textual IR file containing a summary, probably?

We rebuild the summary. Note that this in part is due to the fact mentioned above that we have separate readers for the Module IR and the summary. The opt tool does not even read the summary if present. We currently only read the summary during the thin link (when building the combined index for analysis), and in the distributed backends where we read the combined summary index file emitted for that file by the distributed thin link.

 
    1. ignore -module-summary and build the summary from the LLVM assembly

    2. give an error

    3. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:


Same thoughts would apply here for "what do we do in the bitcode case" - with the option to not support old/difficult textual IR. If there are easy/obvious defaults, I'd say it's probably worth baking those in (& baking them in even for the existing fields we know about, to make it easier to write more terse test cases that don't have to verbosily/redundantly specify lots of default values?) to the parsing/loading logic?

So we do emit an index version in the bitcode, and auto-upgrade in a conservative manner anything that wasn't emitted prior. We could presumably serialize out the version number and handle auto-upgrading from textual assembly the same way (as the version is bumped beyond the current version at least). If we want to allow omission of some fields for test simplicity, we could do a similar thing and apply conservative values where possible for omitted fields (e.g. the flags). That seems fine to me, in which case I don't think we need a version number. Although this has implications for the validator, see below.

 
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.


I'd be OK with the summary being validated by the IR validator (same way other properties of IR are validated & even simple things like if you use the wrong IR type to refer to an IR value, you get a parse error, etc) - which, I realize, would make it feel like the textual summary was entirely redundant

It is redundant when the IR is also available, which relates to Peter and others' objections to serializing this back in. An issue with validation would be if we allowed omission of some fields and/or auto-upgrading as discussed above. The applied conservative values might very well not match the recomputed values. But as I mentioned here we may just want to validate for glaring errors like required info - i.e. I think we should require that every GV has an associated summary entry.

(except in cases of standalone summaries - which I imagine will be the common case in tests, because the summary processing should be tested in isolation (except for testing things like this validation logic itself, etc)).

Yes, I suspect the biggest usage in tests would be a standalone combined summary file that we can use to test the application of the thin link optimizations on a single IR file in the LTO backend pipeline. I.e the input to the test would be one module IR assembly file (no summary) and one combined index assembly file, it would run just the ThinLTO backend pipeline, and check the resulting IR via llvm-dis to ensure the optimization is applied effectively.



- Dave
 


--
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:(408)%20460-2413" value="+14084602413" target="_blank">408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa



Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Cheers,

-- 
Mehdi




Le mar. 1 mai 2018 à 11:10, Peter Collingbourne <[hidden email]> a écrit :


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa



Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

Thanks,
Teresa



Cheers,

-- 
Mehdi




Le mar. 1 mai 2018 à 11:10, Peter Collingbourne <[hidden email]> a écrit :


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa



Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev


On May 1, 2018, at 4:50 PM, Teresa Johnson via llvm-dev <[hidden email]> wrote:

Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

I totally agree with Mehdi’s concern. It is probably much easier to treat module summary as analysis result, rather than part of the module. Making it part of IR might be overkill just to fix the debugging and readability of the summary. 


 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

I would +1 for making this an analysis pass. How about a ModuleSummaryAnalysis that knows how to serialize its result into bitcode and YAML format? We can also add a module flag to indicate if the module should contain module summary or not.

* For thinLTO compilation, bitcode writer runs the analysis pass if the module flag is set, and emit module summary into bitcode.
* When reading bitcode with module summary, the summary is parsed into ModuleSummaryAnalysis result. 
* If there are any transform invalidates the analysis, module summary will be recomputed automatically when writing bitcode, otherwise, it will serialize out the same result as input (probably with some auto upgrade).
* For debugging, ModuleSummaryAnalysis can be serialized out to YAML
* For testing, ModuleSummaryAnalysis result can be created from YAML

I suggested YAML but it can be any other format, including LLVM assembly. For readability, I would prefer YAML.

Thanks

Steven


Thanks,
Teresa



Cheers,

-- 
Mehdi




Le mar. 1 mai 2018 à 11:10, Peter Collingbourne <[hidden email]> a écrit :


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary
========================================

Background
-----------------

ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.

There are two ways to generate a bitcode file containing summary records for a module:
  1. Compile with “clang -c -flto=thin”
  2. Build from LLVM assembly using “opt -module-summary”
Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.

Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.

Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.

Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.

Goals:

  1. Define LLVM assembly format for summary index
  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.
  3. Implement printing and parsing of summary index LLVM assembly

Proposed LLVM Assembly Format
----------------------------------------------

There are several top level data structures within the ModuleSummaryIndex:
  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).
  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).
  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)

I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).

I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.

Consider the following example:

extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:

^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).

The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.

Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.

Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):

define  dso_local i32 @main() #0 !dbg !17 ^3 {

For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

Peter - is that correct and does that sound ok?

I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa


Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly
--------------------------------------------------------------------

When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.

Things to consider are the behavior when:
  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:
    1. recompute summary and throw away summary in the assembly file
    2. ignore -module-summary and build the summary from the LLVM assembly
    3. give an error
    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.
My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).

  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.
    2. Auto-upgrade, by silently creating conservative values for the new summary entries.
I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.

  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter



-- 
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:408-460-2413" class="">408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev


Le mar. 1 mai 2018 à 16:50, Teresa Johnson <[hidden email]> a écrit :
Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

Sounds good to me.

That would make .ll files quite convenient during debugging I think? We could disassemble, manually change summaries, and re-assemble a bitcode file before running the (thin-)link again.

Thanks,

-- 
Mehdi



 

Thanks,
Teresa



Cheers,

-- 
Mehdi




Le mar. 1 mai 2018 à 11:10, Peter Collingbourne <[hidden email]> a écrit :


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa



Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter


--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev


On Thu, May 3, 2018 at 2:16 PM, Steven Wu <[hidden email]> wrote:


On May 1, 2018, at 4:50 PM, Teresa Johnson via llvm-dev <[hidden email]> wrote:

Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

I totally agree with Mehdi’s concern. It is probably much easier to treat module summary as analysis result, rather than part of the module. Making it part of IR might be overkill just to fix the debugging and readability of the summary. 


 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

I would +1 for making this an analysis pass. How about a ModuleSummaryAnalysis that knows how to serialize its result into bitcode and YAML format? We can also add a module flag to indicate if the module should contain module summary or not.

* For thinLTO compilation, bitcode writer runs the analysis pass if the module flag is set, and emit module summary into bitcode.
* When reading bitcode with module summary, the summary is parsed into ModuleSummaryAnalysis result. 

Note the reason we don't do this now is that we don't ever need to use a the summary at the same time as when we use the IR. We only read the summary for the thin link (no IR involved), or for distributed ThinLTO build backends (but only the combined index is read, the summary attached to the module is not read as it is not needed at all). Which is why I suggested the simplification in my proposal above of only reading the summary from assembly where we currently read the bitcode summary (which is not when we normally read the IR).

Also, note as I mentioned above that the default path may split the IR into regular LTO and ThinLTO partitions, which is why we stopped invoking it as analysis pass on the default path, and instead invoke the index builder directly for each partition. Figuring out how to split a single module summary analysis result into two parts is not something I think we should block this on.


* If there are any transform invalidates the analysis, module summary will be recomputed automatically when writing bitcode, otherwise, it will serialize out the same result as input (probably with some auto upgrade).
* For debugging, ModuleSummaryAnalysis can be serialized out to YAML
* For testing, ModuleSummaryAnalysis result can be created from YAML

I suggested YAML but it can be any other format, including LLVM assembly. For readability, I would prefer YAML.

That is similar to the original proposal from last year, but there was not agreement from other upstream maintainers to dump as something other than assembly that could be serialized in.

That's why I'm proposing the version above - serialize in as assembly exactly where we consume the bitcode version of the module summary index today. The advantage is that it doesn't require analysis pass work and figuring out how to make the analysis pass work with split LTO partitions. At the same time, it doesn't preclude doing that in the future either. I think my proposal enables testing of the ThinLTO pipeline via textual assembly, which is one of the main goals of being able to serialize it back in.

Teresa
 
Thanks

Steven


Thanks,
Teresa



Cheers,

-- 
Mehdi




Le mar. 1 mai 2018 à 11:10, Peter Collingbourne <[hidden email]> a écrit :


On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <[hidden email]> wrote:


On Mon, Apr 30, 2018 at 10:21 AM Peter Collingbourne <[hidden email]> wrote:
On Mon, Apr 30, 2018 at 8:32 AM, Teresa Johnson <[hidden email]> wrote:
Hi Peter,
Thanks for your comments, replies below.
Teresa

On Wed, Apr 25, 2018 at 2:08 PM Peter Collingbourne <[hidden email]> wrote:
Hi Teresa,

Thanks for sending this proposal out.

I would again like to register my disagreement with the whole idea of writing summaries in LLVM assembly format. In my view it is clear that this is not the right direction, as it only invites additional complexity and more ways for things to go wrong for no real benefit. However, I don't have the energy to argue that point any further, so I won't stand in the way here.

I assume you are most concerned with the re-assembly/deserialization of the summary. My main goal is to get this dumped into a text format, and I went this route since the last dumper RFC was blocked with the LLVM assembly direction pushed.

Yes, my main concern is with the deserialization. My view is that it should only be allowed for combined summaries -- allowing it for per-module is unnecessary as it creates the possibility of things gettting out of sync. Given that, we don't actually need an assembly representation and we can use whichever format is most convenient. But given the opposition to this viewpoint I am willing to concede getting the format (IMHO) right in favour of something that is acceptable to others, just so that we can test things in a more reasonable way.


On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary
========================================

Background
-----------------

ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.

There are two ways to generate a bitcode file containing summary records for a module:
  1. Compile with “clang -c -flto=thin”
  2. Build from LLVM assembly using “opt -module-summary”
Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.

Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.

Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.

Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.

Goals:

  1. Define LLVM assembly format for summary index
  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.
  3. Implement printing and parsing of summary index LLVM assembly

Proposed LLVM Assembly Format
----------------------------------------------

There are several top level data structures within the ModuleSummaryIndex:
  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).
  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).
  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)

I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).

I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.

Consider the following example:

extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:

^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).

The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.

Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.

Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):

define  dso_local i32 @main() #0 !dbg !17 ^3 {

For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

Peter - is that correct and does that sound ok?

I don't think that would work because the purpose of the top-level TypeIdMap is to contain resolutions for each type identifier, and per-module summaries do not contain resolutions (only the combined summary does). What that means in practice is that we would not be able to recover and write out a type identifier name for per-module summaries as part of ^1 in your example (well, we could in principle, because the name is stored somewhere in the function's IR, but that could get complicated).

Ah ok. I guess the top-level map then is generated by the regular LTO portion of the link (since it presumably requires IR during the Thin Link to get into the combined summary)?

Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt passes in the regular LTO part of the link, e.g. here:


Probably the easiest thing to do is to keep the type identifiers as GUIDs in the function summaries and write out the mapping of type identifiers as a top-level entity.

To confirm, you mean during the compile step create a top-level entity that maps GUID -> identifier?

I mean that you could represent this with something like:

^typeids = {^1, ^2, ^3}
^1 = typeid: {identifier: typeid1, ...}
^2 = typeid: {identifier: typeid2, ...}
^3 = typeid: {identifier: typeid3, ...}

There's no need to store the GUIDs here because they can be computed from the type identifiers. The GUIDs would only be stored in the typeTests (etc.) fields in each function summary.

I suppose we don't need to store the GUIDs at the top level in the in-memory summary. But I think it would be good to emit the GUIDs in the typeid assembly entries because it makes the association in the assembly much more obvious. I.e. going back to my original example:

^1 = typeid: {guid: 12345, identifier: name_of_type, …
^2 = gv: {... {function: {.... typeTests: {^1, …

If we didn't include the GUID in the typeid entry, but rather just the identifier, and put the GUID in the typeTest list in the GV's entry, it wouldn't be obvious at all from the assembly listing which typeid goes with which typeTest. It's also less compact to include the GUID in each typeTests list.

I get that, but my point was that in a per-module summary the TypeIdMap is empty, so there will be no names, only GUIDs.

For "making the association more obvious" we might just want to have the assembly writer emit the GUID of a name as a comment.
 
Or perhaps we are saying the same thing - I can't tell from your above example if the GUID is also emitted in the "typeid:" entries.

No, it wouldn't be.

I'm not sure there is a need for the:
^typeids = {^1, ^2, ^3}
We can just build the typeids list on the fly as " = typeid: " entries are read in.

That's true. Given that nothing actually needs to refer to them, we can just represent the typeids as something like 
typeid: {identifier: typeid1, ...} ; guid = 123
typeid: {identifier: typeid2, ...} ; guid = 456
typeid: {identifier: typeid3, ...} ; guid = 789
without an associated number.

Peter

 


Teresa


Peter

Thanks,
Teresa
 

Peter


Issues when Parsing of Summaries from Assembly
--------------------------------------------------------------------

When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.

Things to consider are the behavior when:
  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:
    1. recompute summary and throw away summary in the assembly file
    2. ignore -module-summary and build the summary from the LLVM assembly
    3. give an error
    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.
My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).

  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:
    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.
    2. Auto-upgrade, by silently creating conservative values for the new summary entries.
I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.

  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter


-- 
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



-- 
-- 
Peter



-- 
Teresa Johnson | Software Engineer | [hidden email] | <a href="tel:408-460-2413" target="_blank">408-460-2413
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev




--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev
Hi Teresa,

I have re-read your proposal, and I'm not getting how you plan to represent combined summaries with this. Unless I'm missing something, there doesn't seem to be a way to write out summaries that is independent of the global values that they relate to. Is that something that you plan to address later?

Peter

On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <[hidden email]> wrote:
Hi everyone,

I started working on a long-standing request to have the summary dumped in a readable format to text, and specifically to emit to LLVM assembly. Proposal below, please let me know your thoughts.

Thanks,
Teresa

RFC: LLVM Assembly format for ThinLTO Summary

========================================


Background

-----------------


ThinLTO operates on small summaries computed during the compile step (i.e. with “-c -flto=thin”), which are then analyzed and updated during the Thin Link stage, and utilized to perform IR updates during the post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode, however, not currently in the LLVM assembly.


There are two ways to generate a bitcode file containing summary records for a module:

  1. Compile with “clang -c -flto=thin”

  2. Build from LLVM assembly using “opt -module-summary”

Either of these will result in the ModuleSummaryIndex analysis pass (which builds the summary index in memory for a module) to be added to the pipeline just before bitcode emission.


Additionally, a combined index is created by merging all the per-module indexes during the Thin Link, which is optionally emitted as a bitcode file.


Currently, the only way to view these records is via “llvm-bcanalyzer -dump”, then manually decoding the raw bitcode dumps.


Relatedly, there is YAML reader/writer support for CFI related summary fields (-wholeprogramdevirt-read-summary and -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles Saternos implemented support to dump the summary in YAML from llvm-lto2 (D34080), including the rest of the summary fields (D34063), however, there was pushback on the related RFC for dumping via YAML or another format rather than emitting as LLVM assembly.


Goals:


  1. Define LLVM assembly format for summary index

  2. Define interaction between parsing of summary from LLVM assembly and synthesis of new summary index from IR.

  3. Implement printing and parsing of summary index LLVM assembly


Proposed LLVM Assembly Format

----------------------------------------------


There are several top level data structures within the ModuleSummaryIndex:

  1. ModulePathStringTable: Holds the paths to the modules summarized in the index (only one entry for per-module indexes and multiple in the combined index), along with their hashes (for incremental builds and global promotion).

  2. GlobalValueMap: A map from global value GUIDs to the corresponding function/variable/alias summary (or summaries for the combined index and weak linkage).

  3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and CfiFunctionDecls)


I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I was using to play with the format. It currently prints 1 and 2 above. I’ve left the CFI related summary data structures as a TODO for now, until the format is at least conceptually agreed, but from looking at those I don’t see an issue with using the same format (with a note/question for Peter on CFI type test representation below).


I modeled the proposed format on metadata, with a few key differences noted below. Like metadata, I propose enumerating the entries with the SlotTracker, and prefixing them with a special character. Avoiding characters already used in some fashion (i.e. “!” for metadata and “#” for attributes), I initially have chosen “^”. Open to suggestions though.


Consider the following example:


extern void foo();
int X;
int bar() {
 foo();
 return X;
}
void barAlias() __attribute__ ((alias ("bar")));
int main() {
 barAlias();
 return bar();
}

The proposed format has one entry per ModulePathStringTable entry and one per GlobalValueMap/GUID, and looks like:


^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}

Like metadata, the fields are tagged (currently using lower camel case, maybe upper camel case would be preferable).


The proposed format has a structure that reflects the data structures in the summary index. For example, consider the entry “^4”. This corresponds to the function “bar”. The entry for that GUID in the GlobalValueMap contains a list of summaries. For per-module summaries such as this, there will be at most one summary (with no summary list for an external function like “foo”). In the combined summary there may be multiple, e.g. in the case of linkonce_odr functions which have definitions in multiple modules. The summary list for bar (“^4”) contains a FunctionSummary, so the summary is tagged “function:”. The FunctionSummary contains both a flags structure (inherited from the base GlobalValueSummary class), and a funcFlags structure (specific to FunctionSummary). It therefore contains a brace-enclosed list of flag tags/values for each.


Where a global value summary references another global value summary (e.g. via a call list, reference list, or aliasee), the entry is referenced by its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as “^4”.


Note that in comparison metadata assembly entries tend to be much more decomposed since many metadata fields are themselves metadata (so then entries tend to be shorter with references to other metadata nodes).

Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


define  dso_local i32 @main() #0 !dbg !17 ^3 {


For CFI data structures, the format would be similar. It appears that TypeIds are referred to by string name in the top level TypeIdMap (std::map indexed by std::string type identifier), whereas they are referenced by GUID within the FunctionSummary class (i.e. the TypeTests vector and the VFuncId structure). For the LLVM assembly I think there should be a top level entry for each TypeIdMap, which lists both the type identifier string and its GUID (followed by its associated information stored in the map), and the TypeTests/VFuncId references on the FunctionSummary entries can reference it by summary slot number. I.e. something like:


^1 = typeid: {guid: 12345, identifier: name_of_type, …

^2 = gv: {... {function: {.... typeTests: {^1, …


Peter - is that correct and does that sound ok?


Issues when Parsing of Summaries from Assembly

--------------------------------------------------------------------


When reading an LLVM assembly file containing module summary entries, a ModuleSummaryIndex will be created from the entries.


Things to consider are the behavior when:

  • Invoked with “opt -module-summary” (which currently builds a new summary index from the IR). Options:

    1. recompute summary and throw away summary in the assembly file

    2. ignore -module-summary and build the summary from the LLVM assembly

    3. give an error

    4. compare the two summaries (one created from the assembly and the new one created by the analysis phase from the IR), and error if they are different.

My opinion is to do a),  so that the behavior using -module-summary doesn’t change. We also need a way to force building of a fresh module summary for cases where the user has modified the LLVM assembly of the IR (see below).


  • How to handle older LLVM assembly files that don’t contain new summary fields. Options:

    1. Force the LLVM assembly file to be recreated with a new summary. I.e. “opt -module-summary -o - | llvm-dis”.

    2. Auto-upgrade, by silently creating conservative values for the new summary entries.

I lean towards b) (when possible) for user-friendliness and to reduce required churn on test inputs.


  • How to handle partial or incorrect LLVM assembly summary entries. How to handle partial summaries depends in part on how we answer the prior question about auto-upgrading. I think the best option like there is to handle it automatically when possible. However, I do think we should error on glaring errors like obviously missing information. For example, when there is summary data in the LLVM assembly, but summary entries are missing for some global values. E.g. if the user modified the assembly to add a function but forgot to add a corresponding summary entry. We could still have subtle issues (e.g. user adds a new call but forgets to update the caller’s summary call list), but it will be harder to detect those.



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev
Ugh, my earlier response is awaiting moderator approval. Apparently with the response and all the other earlier emails quoted it was too long. Resending with a bunch of old quoted emails trimmed.
Teresa

On Thu, May 3, 2018 at 2:58 PM, Teresa Johnson <[hidden email]> wrote:


On Thu, May 3, 2018 at 2:16 PM, Steven Wu <[hidden email]> wrote:


On May 1, 2018, at 4:50 PM, Teresa Johnson via llvm-dev <[hidden email]> wrote:

Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

I totally agree with Mehdi’s concern. It is probably much easier to treat module summary as analysis result, rather than part of the module. Making it part of IR might be overkill just to fix the debugging and readability of the summary. 


 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

I would +1 for making this an analysis pass. How about a ModuleSummaryAnalysis that knows how to serialize its result into bitcode and YAML format? We can also add a module flag to indicate if the module should contain module summary or not.

* For thinLTO compilation, bitcode writer runs the analysis pass if the module flag is set, and emit module summary into bitcode.
* When reading bitcode with module summary, the summary is parsed into ModuleSummaryAnalysis result. 

Note the reason we don't do this now is that we don't ever need to use a the summary at the same time as when we use the IR. We only read the summary for the thin link (no IR involved), or for distributed ThinLTO build backends (but only the combined index is read, the summary attached to the module is not read as it is not needed at all). Which is why I suggested the simplification in my proposal above of only reading the summary from assembly where we currently read the bitcode summary (which is not when we normally read the IR).

Also, note as I mentioned above that the default path may split the IR into regular LTO and ThinLTO partitions, which is why we stopped invoking it as analysis pass on the default path, and instead invoke the index builder directly for each partition. Figuring out how to split a single module summary analysis result into two parts is not something I think we should block this on.


* If there are any transform invalidates the analysis, module summary will be recomputed automatically when writing bitcode, otherwise, it will serialize out the same result as input (probably with some auto upgrade).
* For debugging, ModuleSummaryAnalysis can be serialized out to YAML
* For testing, ModuleSummaryAnalysis result can be created from YAML

I suggested YAML but it can be any other format, including LLVM assembly. For readability, I would prefer YAML.

That is similar to the original proposal from last year, but there was not agreement from other upstream maintainers to dump as something other than assembly that could be serialized in.

That's why I'm proposing the version above - serialize in as assembly exactly where we consume the bitcode version of the module summary index today. The advantage is that it doesn't require analysis pass work and figuring out how to make the analysis pass work with split LTO partitions. At the same time, it doesn't preclude doing that in the future either. I think my proposal enables testing of the ThinLTO pipeline via textual assembly, which is one of the main goals of being able to serialize it back in.

Teresa
 
Thanks

Steven



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Joel E. Denny via llvm-dev
In reply to this post by Joel E. Denny via llvm-dev


On Thu, May 3, 2018 at 2:44 PM, Mehdi AMINI <[hidden email]> wrote:


Le mar. 1 mai 2018 à 16:50, Teresa Johnson <[hidden email]> a écrit :
Hi Mehdi, thanks for the comments, responses and a tweaked proposal below. Teresa

On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <[hidden email]> wrote:
Hi,

My main concern is this one:

> Currently, I am emitting the summary entries at the end, after the metadata nodes. Note that the ModuleSummaryIndex is not currently referenced from the Module, and isn’t currently created when parsing the Module IR bitcode (there is a separate derived class for reading the ModuleSummaryIndex from bitcode). This is because they are not currently used at the same time. However, in the future there is no reason why we couldn’t tag the global values in the Module’s LLVM assembly with the corresponding summary entry if the ModuleSummaryIndex is available when printing the Module in the assembly writer. I.e. we could do the following for “main” from the above example when printing the IR definition (note the “^3” at the end):


I believe the reason that the ModuleSummaryIndex is not attached to the Module is that it is fundamentally not a piece of IR, but it is conceptually really an Analysis result.
Usually other analyses don't serialize their result, we happen to serialize this one for an optimization purpose (reloading it and making the thin-link faster).

True. My understanding is that the push for having it serialized via assembly is due to the fact that it is emitted into the bitcode. I know there is disagreement on this reasoning, I am hoping to have a proposal that is acceptable to everyone. =)

 
The fundamental problem is that an analysis result has to be able to be invalidated with IR changes, attaching this directly to the module wouldn't achieve this. The risk is that when the IR and the summary get out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o my_optimized module_with_summaries.ll`) the summaries would be badly wrong.

Have you looked into what it'd take to make it a "real" analysis in the pass manager?

Thanks for raising this issue specifically, I hadn't addressed it in my proposal and it is a big one. I am not proposing that we attempt to maintain the summary through optimization passes, and definitely don't think we should do that. IMO deserializing it should be for testing the thin link and the combined summaries in the backends only. To that end, I have an idea (below some background first).

Note that in some cases the module summary analysis is an analysis pass. I.e. when invoked by "opt -module-summary=". However, some time ago when Peter added the support for splitting the bitcode (for CFI purposes) and therefore needed to generate a summary in each partition (Thin and Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module summary builder directly (twice). This writer is what gets invoked now when building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is not invoked/maintained as an analysis pass/result. It would be tricky to figure out how to even split rather than recompute the module summary index in that case. Even in the case where we are still invoking as an analysis pass (opt -module-summary), we would need to figure out how to read in the module summary to use as the analysis result when available (so that it could be invalidated and recomputed when stale).

Rather than add this plumbing, and just have it discarded if opt does any optimization, I think we should focus at least for the time being on supporting reading the summary from assembly exactly where we currently read in the summary from bitcode:
1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which currently have to be preceded by "opt -module-summary/-thinlto-bc" to generate an index, but could just build it from assembly instead).
2) For the LTO backends (e.g. tools such as llvm-lto which can consume a combined index and invoke the backends, or "clang -fthinlto-index=" for distributed ThinLTO backend testing), where we could build the combined summary index from assembly instead.

This greatly simplifies the reading side, as there are no optimizations performed on the IR after the index is read in these cases that would require invalidation. It also simplifies adding the parsing support, since it gets invoked exactly where we expect to build an index currently (i.e. since we don't currently build or store the ModuleSummaryIndex when parsing the Module from bitcode). It doesn't preclude someone from figuring out how to compute the module summary analysis result from the assembly, and invalidating it after optimization, when reading the Module IR via 'opt' in the future.

Does this seem like a reasonable proposal to everyone?

Sounds good to me.

That would make .ll files quite convenient during debugging I think? We could disassemble, manually change summaries, and re-assemble a bitcode file before running the (thin-)link again.

Great, yep, that's exactly the idea. Although the tools that invoke the thin link could just consume the manually changed assembly summary files directly (or have a mechanism to re-assemble to a summary-only bitcode file).

Teresa


Thanks,

-- 
Mehdi



--
Teresa Johnson | Software Engineer | [hidden email] | 408-460-2413

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: