[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
Please send any comments. As mentioned at the end I will follow up with some patches as soon as they are cleaned up and I create some test cases.

RFC: Safe Whole Program Devirtualization Enablement
===================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization, but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know whether the link mode is safe for hidden LTO visibility until link time, and also to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination.

The summary of changes required are (these are described in more detail later):

1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition vcall_visibility metadata (or summary) when specified by a new link option (-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option. However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and therefore is restricted to those classes with hidden LTO visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with full view of the class hierarchy. Specifically, this is true when a binary is being LTO linked with either all sources being bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units being linked as native code are known to not derive any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary may not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints. Additionally, we may want to build additional targets that do not meet the criteria for safe application of -fvisibility=hidden during the same build invocation (specifically, because subsets of the code will be linked into shared libraries instead of linking all code directly into the binary). Even if possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this causes duplication in both time and space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and multiple targets with different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the application 
of -fvisibility=hidden until LTO link time, and allow bitcode objects to be shared between links of targets with differing link modes and therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven off of type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter is currently not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. This sequence is an assertion that the given pointer is associated with the given type identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). It is currently inserted only for classes with hidden LTO visibility as the implication of this sequence is that we have full visibility of that type’s class hierarchy, and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.

In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata), will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped.

By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE, at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.


--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not


I can be missing something, but why can't we use type metadata instead of !vcall_visibility to identify vtable pointers? ​We can skip emission of !type for vtables having [[clang::lto_visibility_public]] attribute and postpone decision on other vtables in the way you suggested.




От: Teresa Johnson <[hidden email]>
Отправлено: 11 декабря 2019 г. 17:21
Кому: llvm-dev
Копия: Peter Collingbourne; Steven Wu; Evgeny Leviant; David Li
Тема: RFC: Safe Whole Program Devirtualization Enablement
 

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.  If you suspect potential phishing or spam email, report it to [hidden email]

Please send any comments. As mentioned at the end I will follow up with some patches as soon as they are cleaned up and I create some test cases.

RFC: Safe Whole Program Devirtualization Enablement
===================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization, but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know whether the link mode is safe for hidden LTO visibility until link time, and also to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination.

The summary of changes required are (these are described in more detail later):

1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition vcall_visibility metadata (or summary) when specified by a new link option (-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option. However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and therefore is restricted to those classes with hidden LTO visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with full view of the class hierarchy. Specifically, this is true when a binary is being LTO linked with either all sources being bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units being linked as native code are known to not derive any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary may not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints. Additionally, we may want to build additional targets that do not meet the criteria for safe application of -fvisibility=hidden during the same build invocation (specifically, because subsets of the code will be linked into shared libraries instead of linking all code directly into the binary). Even if possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this causes duplication in both time and space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and multiple targets with different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the application 
of -fvisibility=hidden until LTO link time, and allow bitcode objects to be shared between links of targets with differing link modes and therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven off of type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter is currently not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. This sequence is an assertion that the given pointer is associated with the given type identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). It is currently inserted only for classes with hidden LTO visibility as the implication of this sequence is that we have full visibility of that type’s class hierarchy, and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.

In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata), will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped.

By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE, at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.


--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev


On Fri, Dec 13, 2019 at 8:56 AM Evgeny Leviant <[hidden email]> wrote:

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not


I can be missing something, but why can't we use type metadata instead of !vcall_visibility to identify vtable pointers? We can skip emission of !type for vtables having [[clang::lto_visibility_public]] attribute and postpone decision on other vtables in the way you suggested.

I'm not sure if you mean the vtables that have received this attribute manually, or just the ones that by default would get public LTO visibility (the latter is the vast bulk of the interesting case). Regardless, it is the same reason. At LTO link time we want to optionally treat these as hidden (i.e. delay the effects of what would have been done at compile time under -fvisibility=hidden). If we don't emit the !type metadata, then we cannot do this as we lose the class hierarchy info necessary for WPD. The vcall_visibility attribute just tells us which vtables we must treat conservatively as public without the LTO link time assertion provided by the proposed new link option that we can safely treat public classes as hidden due to the link mode.

Teresa 




От: Teresa Johnson <[hidden email]>
Отправлено: 11 декабря 2019 г. 17:21
Кому: llvm-dev
Копия: Peter Collingbourne; Steven Wu; Evgeny Leviant; David Li
Тема: RFC: Safe Whole Program Devirtualization Enablement
 

CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.  If you suspect potential phishing or spam email, report it to [hidden email]

Please send any comments. As mentioned at the end I will follow up with some patches as soon as they are cleaned up and I create some test cases.

RFC: Safe Whole Program Devirtualization Enablement
===================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization, but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know whether the link mode is safe for hidden LTO visibility until link time, and also to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination.

The summary of changes required are (these are described in more detail later):

1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition vcall_visibility metadata (or summary) when specified by a new link option (-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option. However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and therefore is restricted to those classes with hidden LTO visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with full view of the class hierarchy. Specifically, this is true when a binary is being LTO linked with either all sources being bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units being linked as native code are known to not derive any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary may not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints. Additionally, we may want to build additional targets that do not meet the criteria for safe application of -fvisibility=hidden during the same build invocation (specifically, because subsets of the code will be linked into shared libraries instead of linking all code directly into the binary). Even if possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this causes duplication in both time and space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and multiple targets with different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the application 
of -fvisibility=hidden until LTO link time, and allow bitcode objects to be shared between links of targets with differing link modes and therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven off of type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter is currently not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. This sequence is an assertion that the given pointer is associated with the given type identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). It is currently inserted only for classes with hidden LTO visibility as the implication of this sequence is that we have full visibility of that type’s class hierarchy, and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.

In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata), will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped.

By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE, at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.


--
Teresa Johnson | Software Engineer | [hidden email] |


--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
In reply to this post by Hiroshi Yamauchi via llvm-dev
(cc list this time)

Hi Teresa,

Apologies if this has been discussed before but ...

> The LTO visibility of a class is derived at compile time from the class’s symbol visibility.
> Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility.
> Compiling with -fvisibility=hidden tells the compiler that, unless
> otherwise marked, symbols are assumed to have hidden visibility, which
> also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute).
> This results in much more aggressive devirtualization.

Note that by default, unlike GCC, LLVM is liberal on visibility-constrained optimizations. In particular it freely performs inlining, IPA and cloning on them (see https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html which also suggested adding -fsemantic-interposition to actually respect visibility in optimizations). It's unclear why devirtualization should behave differently than other optimizations (at least by default).

-I

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev


On Tue, Dec 17, 2019 at 7:36 AM Iurii Gribov <[hidden email]> wrote:
(cc list this time)

Hi Teresa,

Apologies if this has been discussed before but ...

> The LTO visibility of a class is derived at compile time from the class’s symbol visibility.
> Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility.
> Compiling with -fvisibility=hidden tells the compiler that, unless
> otherwise marked, symbols are assumed to have hidden visibility, which
> also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute).
> This results in much more aggressive devirtualization.

Note that by default, unlike GCC, LLVM is liberal on visibility-constrained optimizations. In particular it freely performs inlining, IPA and cloning on them (see https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html which also suggested adding -fsemantic-interposition to actually respect visibility in optimizations). It's unclear why devirtualization should behave differently than other optimizations (at least by default).

Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)? I believe that will be too aggressive for class LTO visibility.  It is common to override a virtual functions across shared library boundaries (e.g. a test may override a virtual function from a shared library with a mock class). But with what I am proposing we will assume it is safe under the proposed LTO link option, which should be applied when linking statically other than e.g. system libraries.

Thanks,
Teresa


-I



--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
> Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)?
> I believe that will be too aggressive for class LTO visibility.
> It is common to override a virtual functions across shared library boundaries

I'm myself a fan of GCC's conservative approach so I'd agree with you on that. But regardless of my personal opinions, LLVM already optimizes normal functions in shared libraries (thus mercilessly breaking overrides via LD_PRELOAD, etc.) and I don't see how virtual functions are any different. I think default LLVM behavior needs to be consistent for all inter-procedural optimizations (be it inlining, devirtualization or cloning).

Maybe it's time to resurrect Hal's -fsemantic-interposition flag and use it consistently throughout compiler? Users who need GCC-like semantics will be able to employ this flag to prevent unsafe optimizations.

-I
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
(Readding Hal)

> Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)?
> I believe that will be too aggressive for class LTO visibility.
> It is common to override a virtual functions across shared library boundaries

I'm myself a fan of GCC's conservative approach so I'd agree with you on that. But regardless of my personal opinions, LLVM already optimizes normal functions in shared libraries (thus mercilessly breaking overrides via LD_PRELOAD, etc.) and I don't see how virtual functions are any different. I think default LLVM behavior needs to be consistent for all inter-procedural optimizations (be it inlining, devirtualization or cloning).

Maybe it's time to resurrect Hal's -fsemantic-interposition flag and use it consistently throughout compiler? Users who need GCC-like semantics will be able to employ this flag to prevent unsafe optimizations.

-I
       
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
On Wed, Dec 18, 2019 at 3:38 AM Iurii Gribov via llvm-dev <[hidden email]> wrote:
(Readding Hal)

> Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)?
> I believe that will be too aggressive for class LTO visibility.
> It is common to override a virtual functions across shared library boundaries

I'm myself a fan of GCC's conservative approach so I'd agree with you on that. But regardless of my personal opinions, LLVM already optimizes normal functions in shared libraries (thus mercilessly breaking overrides via LD_PRELOAD, etc.) and I don't see how virtual functions are any different. I think default LLVM behavior needs to be consistent for all inter-procedural optimizations (be it inlining, devirtualization or cloning).

Maybe it's time to resurrect Hal's -fsemantic-interposition flag and use it consistently throughout compiler? Users who need GCC-like semantics will be able to employ this flag to prevent unsafe optimizations.

-fsemantic-interposition controls whether the compiler may assume that symbols are not interposed, and it has nothing to do with the optimization proposed here. The concern here is whether the user may derive from a class defined in another shared object, and that does not involve symbol interposition.

Peter


-I

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
> -fsemantic-interposition controls whether the compiler may assume that symbols are not interposed,
> and it has nothing to do with the optimization proposed here.

Thanks Peter, you are probably right. I've overestimated the information that's available to LTO optimizer at link time.

Leaving shared libraries aside, one might argue that when LTOptimizing main executable file compiler/linker is aware of all participating libraries and so can decide whether class is not derived from and apply devirtualization based on that information (assuming that run-time library implementations or dlopen calls do not introduce new inherited classes at runtime, that's where -fno-semantic-interposition assumption comes into play). But the "decide whether class is not derived from" part here is problematic - derived classes in libraries may have hidden visibility and will go undetected.

-I
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev


On Thu, Dec 19, 2019 at 1:27 AM Iurii Gribov <[hidden email]> wrote:
> -fsemantic-interposition controls whether the compiler may assume that symbols are not interposed,
> and it has nothing to do with the optimization proposed here.

Thanks Peter, you are probably right. I've overestimated the information that's available to LTO optimizer at link time.

Leaving shared libraries aside, one might argue that when LTOptimizing main executable file compiler/linker is aware of all participating libraries and so can decide whether class is not derived from and apply devirtualization based on that information (assuming that run-time library implementations or dlopen calls do not introduce new inherited classes at runtime, that's where -fno-semantic-interposition assumption comes into play). But the "decide whether class is not derived from" part here is problematic - derived classes in libraries may have hidden visibility and will go undetected.

Right, this is why we need some guarantee from the user that the LTO link will see all derived classes. Currently at head the only way to do so is via -fvisibility=hidden when compiling to bitcode. This proposal adds another mechanism, pre-enabling the bitcode for WPD and delaying the guarantee until LTO link time with an option there.

Thanks,
Teresa


-I


--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] RFC: Safe Whole Program Devirtualization Enablement

Hiroshi Yamauchi via llvm-dev
In reply to this post by Hiroshi Yamauchi via llvm-dev
FYI I mailed 3 patches this morning that together implement the RFC. PTAL:

D71907: [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables
D71911: [ThinLTO] Summarize vcall_visibility metadata
D71913: [LTO/WPD] Enable aggressive WPD under LTO option

Teresa

On Wed, Dec 11, 2019 at 6:21 AM Teresa Johnson <[hidden email]> wrote:
Please send any comments. As mentioned at the end I will follow up with some patches as soon as they are cleaned up and I create some test cases.

RFC: Safe Whole Program Devirtualization Enablement
===================================================

High Level Summary
------------------

The goal of the changes described in this RFC is to support aggressive Whole Program Devirtualization without requiring -fvisibility=hidden at compile time, by pre-enabling bitcode for whole program devirtualization, but delaying the decision on whether to apply devirtualization until LTO link time. This is needed both because we may not know whether the link mode is safe for hidden LTO visibility until link time, and also to allow bitcode objects to be shared between links of targets with differing valid LTO visibility. This utilizes the !vcall_visibility metadata added for Dead Virtual Function Elimination.

The summary of changes required are (these are described in more detail later):

1) When -fwhole-program-vtables is specified, always insert type test assumes for virtual calls, and additionally add !vcall_visibility metadata to vtable definitions (which will be summarized in the ThinLTO index).

2) At LTO link time, apply hidden LTO visibility to vtable definition vcall_visibility metadata (or summary) when specified by a new link option (-lto-whole-program-visibility).

3) During the LTO link time Whole Program Devirtualization analysis, only allow devirtualization when the associated vtable definitions have hidden LTO visibility, as derived from the !vcall_visibility metadata (summarized in the index for index-only WPD).

4) Modify the Virtual Function Elimination application in GlobalDCE to ignore vtables with !vcall_visibility when they are associated with type tests (and not just type checked loads).

Background
----------

Whole Program Devirtualization is supported for LTO (both regular and Thin) via the -fwhole-program-vtables option. However, it can only be safely applied to classes for which LTO can analyze the entire class hierarchy, and therefore is restricted to those classes with hidden LTO visibility. See https://clang.llvm.org/docs/LTOVisibility.html for more information.

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

However, compiling with -fvisibility=hidden is only safe when we know we are LTO linking with full view of the class hierarchy. Specifically, this is true when a binary is being LTO linked with either all sources being bitcode (so that the LTO unit is the same as the linkage unit), or when the only translation units being linked as native code are known to not derive any classes defined in the LTO unit (e.g. system libraries). Additionally, the binary may not dlopen any libraries at runtime that contain classes derived from those defined in the main binary.

Assuming we are building and linking a binary that satisfies the above constraints (we are LTO linking all translation units as bitcode, except certain (e.g. system) libraries or other native objects known to be safe by the user or build system, and the binary will not dlopen any libraries deriving from the binary’s classes), then it should be safe to compile with -fvisibility=hidden, along with -fwhole-program-vtables.

However, there are cases where it is unknown until link time whether we are building a target that meets the above constraints. Additionally, we may want to build additional targets that do not meet the criteria for safe application of -fvisibility=hidden during the same build invocation (specifically, because subsets of the code will be linked into shared libraries instead of linking all code directly into the binary). Even if possible to build two sets of bitcode object files (one with default visibility for the unsafely linked targets and one with hidden visibility for the safely linked targets), this causes duplication in both time and space, which is prohibitive in an environment where it is common to build targets with tens of thousands of sources, and multiple targets with different link modes simultaneously.

The goals of the changes described in this RFC are to essentially delay the application 
of -fvisibility=hidden until LTO link time, and allow bitcode objects to be shared between links of targets with differing link modes and therefore differing valid LTO visibility.

Type Information for Devirtualization
-------------------------------------

LTO whole program devirtualization is driven off of type information in the IR. This includes type metadata (on vtable definitions), as well as type test intrinsics before virtual calls. The former is safe to emit into the IR in all cases, but the latter is currently not. The virtual call sites are decorated with an llvm.assume(llvm.type.test(ptr, typeid)) sequence, which drives the LTO analysis of virtual calls. This sequence is an assertion that the given pointer is associated with the given type identifier (https://llvm.org/docs/LangRef.html#llvm-type-test-intrinsic). It is currently inserted only for classes with hidden LTO visibility as the implication of this sequence is that we have full visibility of that type’s class hierarchy, and may devirtualize the call based on that knowledge. This assumption is not valid if the class does not have hidden LTO visibility.

In order to drive later devirtualization, we still need the type compatibility information provided by the llvm.type.test, but want to delay a decision on whether it is valid to assume that we have full class hierarchy visibility, and thus whether devirtualization of that target can be safely applied.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not, and use that to guide the application of devirtualization to the type tested virtual call sites. By default, only those with statically guaranteed hidden LTO visibility should be marked as such. And as described later, at LTO link time we can optionally decide to convert vtables to hidden LTO visibility for more aggressive devirtualization when appropriate.

There is already a mechanism in the compiler to describe the vtable visibility, which was recently added for Dead Virtual Function Elimination (D63932): !vcall_visibility metadata, documented at https://llvm.org/docs/TypeMetadata.html#vcall-visibility-metadata. This metadata is attached to vtable definitions, currently only when VFE is enabled. As described in the documentation, because this is currently only used for VFE, it also requires that the corresponding function pointer loads use the llvm.type.checked.load intrinsic. This would not be required for devirtualization (although the VFE support in GlobalDCE will need modification to ignore the metadata when type checked loads not used, more on that later).

This RFC proposes adding the !vcall_visibility metadata to vtable definitions when -fwhole-program-vtables is specified. Unlike for VFE, the function pointer loads can still use normal loads with corresponding type test assume sequences (better for optimization).

Additional changes to the LTO compilation steps are detailed below.

Pre-Link LTO Compile
--------------------

First, type test assume sequences will be inserted when -fwhole-program-vtables is specified, and not just for classes with hidden LTO visibility.

Second, as mentioned earlier, the !vcall_visibility metadata will be inserted under -fwhole-program-vtables. For the purposes of index-only WPD, a single-bit flag indicating whether or not the vtable def has hidden LTO visibility is added to the GVarFlags on the GlobalVarSummary. Note that we can collapse the 3 enum values of the metadata down to a single bit, because for the purposes of devirtualization, both VCallVisibilityLinkageUnit and VCallVisibilityTranslationUnit can be treated the same (we only need to have at least VCallVisibilityLinkageUnit to devirtualize). The ModuleSummaryIndex builder will set this new flag from the !vcall_visibility metadata on vtable definitions.

Finally, the VFE support in GlobalDCE (which is enabled by default and currently triggers automatically in the presence of this metadata), will need to be modified to ignore !vcall_visibility metadata inserted for devirtualization only, i.e. when there are any type test assume sequences for that Type ID. This should be straightforward, as we can scan the type tests and remove any vtables decorated with compatible type ids from VFESafeVTables. Note that this change will affect the invocation of GlobalDCE both here in the pre-link LTO compile as well as later in the LTO Backend (where it is applied to a broader set of vtables).

LTO Link Handling
-----------------

During Whole Program Devirtualization analysis, when looking at the vtables corresponding to the summarized virtual calls during tryFindVirtualCallTargets, we must consult the vcall_visibility information. For hybrid (regular+thin) LTO, the vtable definitions are in the regular LTO partition and so the IR can be consulted directly. For index-only WPD, we instead consult the flag on the vtable’s GlobalVarSummary.

If any of the vtable definitions compatible with a given virtual call have public LTO visibility, the devirtualization must be skipped.

By default, only classes that have statically determined hidden LTO visibility would be allowed to devirtualize. However, as noted earlier, we want to enable more aggressive devirtualization at LTO link time when we know that the linking mode guarantees full LTO visibility of any code that may derive classes from the bitcode being linked. To do so, we will add a new linker option:

For lld, the proposed option is: -lto-whole-program-visibility.
For gold, the corresponding plugin option would be “whole-program-visibility”.

When this option is set, LTO will convert all vtable definitions to have hidden LTO visibility before invoking Whole Program Devirtualization. In the hybrid LTO case this would mean changing the metadata on the IR. In the index-only case this would be done in the summaries.

LTO Backend Handling
--------------------

No changes are required in the LTO backend’s invocation of Whole Program Devirtualization, since any visibility constraints are enforced at LTO link time, and the loosening of visibility under the new link option only needs to affect the LTO WPD invocation.

As mentioned earlier when describing the pre-link LTO compile changes, GlobalDCE will be changed to ignore vtables with !vcall_visibility metadata corresponding to type tests (and not just type checked loads).

Status
------

These changes have been prototyped and tested with index-only WPD (with the exception of the proposed changes to GlobalDCE, at the moment I have been testing with -enable-vfe=false). I will be cleaning up the changes and sending patches for review in the coming days.


--
Teresa Johnson | Software Engineer | [hidden email] |


--
Teresa Johnson | Software Engineer | [hidden email] |

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev