[llvm-dev] Proposal for address-significance tables for --icf=safe

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default.

I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

Thanks,
--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:

> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
On Wed, May 23, 2018 at 12:06 AM, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:

> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.

Very nice!

I was going to ask how this plays with object files compiled with
something other than LLVM, but now I see you assume all symbols are
address-significant if there's no address-significance table in the
file. It all seems very sensible to me :-)
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:
Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default.

I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

This sounds like a nice idea, but it'd be great to put in some effort to see if we can get this done in a cross-toolchain collaborative manner, instead of llvm-specific. The need is clearly generic, after all.

I'm a bit worried of the scheme of emitting symbol indexes into a section. AFAIK there is nothing else in ELF which puts symbol indexes in data at the moment (only in relocations and section headers). In particular, if anyone were to use a tool which rewrites the symbol table, it'll break things, unless that tool knows about this special section.

I wonder if it's possible to put the data in the symbol table. Unfortunately, there's not a whole lot of available space there...

The "st_other" field has some space available -- only the bottom 2 bits are currently used in general for visibility, so one could imagine adding a flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this field are widely used by a variety of architecture-specific things, which makes that rather more complicated. MIPS uses almost all the remaining bits in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for additional function-type and linkage annotations. m68k uses bits 7-8 for identifying "far" functions and interrupt handlers.

So...that might be viable -- just barely --  but even after suggesting that, I'd not really want to argue for it. =)


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev


On Wed, May 23, 2018 at 8:15 AM, James Y Knight <[hidden email]> wrote:
On Tue, May 22, 2018 at 6:06 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:
Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default.

I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

This sounds like a nice idea, but it'd be great to put in some effort to see if we can get this done in a cross-toolchain collaborative manner, instead of llvm-specific. The need is clearly generic, after all.


Peter Smith has suggested making a proposal to generic-abi and that certainly seems reasonable, but there seems to be a practical problem there: the generic-abi is unmaintained and there doesn't seem to be an authority responsible for assigning section numbers (see the recent SHT_RELR thread for example). Since this proposal requires a new section number I would suggest that we make the proposal to generic-abi, incorporate any design feedback from there and proceed with a section number in the LLVM namespace until the generic-abi gets a maintainer, at which point we can change lld to accept both section numbers or maybe just the generic one (since reading the section is optional).

I'm a bit worried of the scheme of emitting symbol indexes into a section. AFAIK there is nothing else in ELF which puts symbol indexes in data at the moment (only in relocations and section headers). In particular, if anyone were to use a tool which rewrites the symbol table, it'll break things, unless that tool knows about this special section.

The design accounts for this :)

To begin with, in practice I don't think we can get this right for every conceivable tool, because the tool could put something in the object file that would invalidate the metadata by making a symbol address-significant. For example, I can use ld -r to combine a metadata-containing object file defining a function foo with a non-metadata-containing object file defining a function bar that returns the address of foo, which would invalidate the metadata for foo. So I think the best that we can hope for is to arrange for most tools to "naturally" invalidate the metadata.

There turns out to be a way to do this: most tools will reset the sh_link field of an unrecognized section to zero if that sh_link field points to the .symtab section. GNU objcopy, ld.bfd -r, ld.gold -r and ld.lld -r all do this. (It looks like llvm-objcopy will preserve the sh_link, but we can fix that.) So what we can do is make the sh_link in our section point to .symtab and use sh_link=0 as a signal that a tool has operated on the object file, and therefore ignore the section. This resetting of sh_link for unrecognized sections doesn't appear to be required by the generic-abi, so this is probably something that we'd want to bring up there in addition to the section itself. Also, this should hopefully become a non-problem once this proposal makes its way into either the generic ABI or GNU-gABI and tools learn about the section.


I wonder if it's possible to put the data in the symbol table. Unfortunately, there's not a whole lot of available space there...

The "st_other" field has some space available -- only the bottom 2 bits are currently used in general for visibility, so one could imagine adding a flag at 0x04, perhaps, for this indicator. Unfortunately, the bits of this field are widely used by a variety of architecture-specific things, which makes that rather more complicated. MIPS uses almost all the remaining bits in that field, but seemingly not bit 3. PowerPC uses bits 5-8 for the local-call optimization, leaving bits 3-4. Alpha uses bits 4 and 8 for what I think may be a similar optimization. IA64 OpenVMS uses bits 5-8 for additional function-type and linkage annotations. m68k uses bits 7-8 for identifying "far" functions and interrupt handlers.

So...that might be viable -- just barely --  but even after suggesting that, I'd not really want to argue for it. =)


I thought about using st_other for this, but I came to the same conclusion that it would probably be too hard to stake out a bit with the architecture-specific things going on there. From the looks of things it looks like MIPS is (somewhat gratuitously in the case of STO_MIPS_MIPS16) using all of the "unused" bits: http://llvm-cs.pcc.me.uk/include/llvm/BinaryFormat/ELF.h#554
So if we did something with st_other we'd probably need to exclude MIPS and maybe other architectures and get tools to do the right thing (which seems harder than with the sh_link trick).

Thanks,
--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.


Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
On 23 May 2018 at 23:07, Peter Collingbourne <[hidden email]> wrote:

>
>
> On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]>
> wrote:
>>
>> On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]>
>> wrote:
>>>
>>> Hello,
>>>
>>> I think that the approach of using a section to record address
>>> significance is a good one. I'm guessing it will have its own section
>>> type and format? If it does would it make sense to try and submit this
>>> to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
>>> could be potentially useful for other linkers, for example gold?
>>
>>
>> Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence
>> of ULEB128-encoded symbol table indexes that are address-significant).
>>
>> I think it makes sense for this to eventually be part of the generic ABI,
>> and I will send a proposal to generic-abi. As I mentioned in my reply to
>> James Knight, I don't think we should block on getting a section number
>> assignment, but we can at least incorporate any design feedback from that
>> proposal.
>
>
> I've sent the proposal:
> https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4
>
> Peter

Thanks. I agree we shouldn't require it to be in the GABI before
proceeding. I think making the proposal more visible to the wider
community will already have been worthwhile.

In the absence of a GABI defined way to tell if an ELF processing tool
may have broken the symbol table order, one possible option might be
to reserve the first entry as some hash of the symbol names that would
be different if the indexes into the symbol table were changed. This
does make it more complicated for an object processing tool to handle
the section though.

Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
Hi Peter,

Thanks for doing this!

On Wed, May 23, 2018 at 3:07 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.



The proposal looks decent. I'll probably comment more once I (like the others) get a chance to read more deeply. I've also taken a look at the basic patches - one request is that since this is newly standardizing bits that you make sure to comment or even put a detailed standards-esque description of the tables into the code?

-eric


 
Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev


On Thu, May 24, 2018 at 2:34 AM, Peter Smith <[hidden email]> wrote:
On 23 May 2018 at 23:07, Peter Collingbourne <[hidden email]> wrote:
>
>
> On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]>
> wrote:
>>
>> On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]>
>> wrote:
>>>
>>> Hello,
>>>
>>> I think that the approach of using a section to record address
>>> significance is a good one. I'm guessing it will have its own section
>>> type and format? If it does would it make sense to try and submit this
>>> to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
>>> could be potentially useful for other linkers, for example gold?
>>
>>
>> Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence
>> of ULEB128-encoded symbol table indexes that are address-significant).
>>
>> I think it makes sense for this to eventually be part of the generic ABI,
>> and I will send a proposal to generic-abi. As I mentioned in my reply to
>> James Knight, I don't think we should block on getting a section number
>> assignment, but we can at least incorporate any design feedback from that
>> proposal.
>
>
> I've sent the proposal:
> https://groups.google.com/forum/#!topic/generic-abi/MPr8TVtnVn4
>
> Peter

Thanks. I agree we shouldn't require it to be in the GABI before
proceeding. I think making the proposal more visible to the wider
community will already have been worthwhile.

In the absence of a GABI defined way to tell if an ELF processing tool
may have broken the symbol table order, one possible option might be
to reserve the first entry as some hash of the symbol names that would
be different if the indexes into the symbol table were changed. This
does make it more complicated for an object processing tool to handle
the section though.

That does seem reasonable if we can't get agreement on the gABI side. It can probably just be a hash of the contents of .symtab plus the contents of the linked .strtab, and we can store it in the addrsig section's sh_info to avoid taking up space in the section.

Thanks,
--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev


On Thu, May 24, 2018 at 10:24 AM, Eric Christopher <[hidden email]> wrote:
Hi Peter,

Thanks for doing this!

On Wed, May 23, 2018 at 3:07 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.



The proposal looks decent. I'll probably comment more once I (like the others) get a chance to read more deeply. I've also taken a look at the basic patches - one request is that since this is newly standardizing bits that you make sure to comment or even put a detailed standards-esque description of the tables into the code?

Sure, I'll document the assembly and object file extensions here: http://llvm.org/docs/Extensions.html
and reference it from the code.

Peter



-eric


 
Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
Great, thanks!

On Thu, May 24, 2018, 11:02 AM Peter Collingbourne <[hidden email]> wrote:


On Thu, May 24, 2018 at 10:24 AM, Eric Christopher <[hidden email]> wrote:
Hi Peter,

Thanks for doing this!

On Wed, May 23, 2018 at 3:07 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.



The proposal looks decent. I'll probably comment more once I (like the others) get a chance to read more deeply. I've also taken a look at the basic patches - one request is that since this is newly standardizing bits that you make sure to comment or even put a detailed standards-esque description of the tables into the code?

Sure, I'll document the assembly and object file extensions here: http://llvm.org/docs/Extensions.html
and reference it from the code.

Peter



-eric


 
Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
Hi Peter, This is a great proposal, thanks!. 

If you were worried about making the abi change have you
thought about just going for an array of symbol names
or hashes of symbol names where any matching symbol is
considered address significant? This would sidestep the
problem of keeping the symbol table indices in sync.

It would be pessimistic for local symbols if the input 
SHT_ADDRSIG sections were combined by e.g. "ld -r" but
in my experience this should not have too much of an
impact on --icf. 

Might be worth considering in the short term whilst you
work on getting gabi acceptance.

On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev <[hidden email]> wrote:
Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default.

I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

Thanks,
--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
The hash approach was suggested by others as well, but I think for now we can use the sh_link field until the tools are updated -- that was the recommended approach on the generic-abi thread as well. Keep in mind that updating the gABI is really orthogonal to the compatibility issue: even with an updated gABI we'd still have the practical problem of needing to deal with old tools somehow.

Agreed that we'd be fine pessimising ld -r -- the main thing that I'm concerned about is avoiding miscompiles caused by shuffling symbols.

Peter

On Thu, May 31, 2018 at 1:52 PM, bd1976 llvm <[hidden email]> wrote:
Hi Peter, This is a great proposal, thanks!. 

If you were worried about making the abi change have you
thought about just going for an array of symbol names
or hashes of symbol names where any matching symbol is
considered address significant? This would sidestep the
problem of keeping the symbol table indices in sync.

It would be pessimistic for local symbols if the input 
SHT_ADDRSIG sections were combined by e.g. "ld -r" but
in my experience this should not have too much of an
impact on --icf. 

Might be worth considering in the short term whilst you
work on getting gabi acceptance.

On Tue, May 22, 2018 at 11:06 PM, Peter Collingbourne via llvm-dev <[hidden email]> wrote:
Hi all,

Context: ld.gold has an --icf=safe flag which is intended to apply ICF only to sections which can be safely merged according to the guarantees provided by the language. It works using a set of heuristics (symbol name matching and relocation scanning). That's not only imprecise but it only works with certain languages and is slow due to the need to demangle symbols and scan relocations. It's also redundant with the (local_)unnamed_addr analysis already performed by LLVM.

I implemented an alternative to this approach in clang and lld. It works by adding a section to each object file containing the indexes of the symbols which are address-significant (i.e. not (local_)unnamed_addr in IR).

I used this implementation to link clang with release+asserts with each of --icf={none,safe,all}. The binary sizes were:

none: 109407184
safe: 108534736 (-0.8%)
all: 107281360 (-2%)

I measured the object file overhead of these sections in my clang build at 0.08%. That's almost nothing, and I think it's small enough that we can turn it on by default.

I've uploaded a patch series for this feature here: https://github.com/pcc/llvm-project/tree/llvm-addrsig
I intend to start sending it for review soon.

Thanks,
--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev





--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev
In reply to this post by U.Mutlu via llvm-dev
On Wed, May 23, 2018 at 3:07 PM, Peter Collingbourne <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.


Based on feedback from the generic-abi thread I looked at the object file size and link time impact of a few other representations for the address-significance table. My results are here: https://groups.google.com/d/msg/generic-abi/MPr8TVtnVn4/30Z0_KHMAQAJ

One of the proposals (a compressed array of symbol attributes) is slightly more expensive than what I originally proposed (+0.03% object file size, +1% link time) but it would allow for future expansion and therefore seems more appropriate for the gABI. That's not really a concern for us though since the initial implementation will use a platform-specific section number, and we can always switch to the gABI representation at the same time as we adopt the gABI section numbers. There's also the possibility that we will end up doing something completely different in the gABI.

Does anyone have a strong opinion that we should do something more aligned with the gABI? If not, I'll start upstreaming my original proposal.

Peter

Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev


On Thu, May 31, 2018 at 5:06 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:07 PM, Peter Collingbourne <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.


Based on feedback from the generic-abi thread I looked at the object file size and link time impact of a few other representations for the address-significance table. My results are here: https://groups.google.com/d/msg/generic-abi/MPr8TVtnVn4/30Z0_KHMAQAJ

One of the proposals (a compressed array of symbol attributes) is slightly more expensive than what I originally proposed (+0.03% object file size, +1% link time) but it would allow for future expansion and therefore seems more appropriate for the gABI. That's not really a concern for us though since the initial implementation will use a platform-specific section number, and we can always switch to the gABI representation at the same time as we adopt the gABI section numbers. There's also the possibility that we will end up doing something completely different in the gABI.

Does anyone have a strong opinion that we should do something more aligned with the gABI? If not, I'll start upstreaming my original proposal.


I suppose it depends on whether or not you're going to be the one to reconcile your current implementation and that one in the future or if it'll wait for someone else?

-eric
 
Peter

Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev


On Thu, May 31, 2018 at 5:13 PM, Eric Christopher <[hidden email]> wrote:


On Thu, May 31, 2018 at 5:06 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:07 PM, Peter Collingbourne <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.


Based on feedback from the generic-abi thread I looked at the object file size and link time impact of a few other representations for the address-significance table. My results are here: https://groups.google.com/d/msg/generic-abi/MPr8TVtnVn4/30Z0_KHMAQAJ

One of the proposals (a compressed array of symbol attributes) is slightly more expensive than what I originally proposed (+0.03% object file size, +1% link time) but it would allow for future expansion and therefore seems more appropriate for the gABI. That's not really a concern for us though since the initial implementation will use a platform-specific section number, and we can always switch to the gABI representation at the same time as we adopt the gABI section numbers. There's also the possibility that we will end up doing something completely different in the gABI.

Does anyone have a strong opinion that we should do something more aligned with the gABI? If not, I'll start upstreaming my original proposal.


I suppose it depends on whether or not you're going to be the one to reconcile your current implementation and that one in the future or if it'll wait for someone else?

I'm happy to volunteer to work on that if the proposal ever gets accepted into the gABI and I'm still working on LLVM at that time.

Peter


-eric
 
Peter

Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] Proposal for address-significance tables for --icf=safe

U.Mutlu via llvm-dev


On Thu, May 31, 2018 at 5:29 PM Peter Collingbourne <[hidden email]> wrote:
On Thu, May 31, 2018 at 5:13 PM, Eric Christopher <[hidden email]> wrote:


On Thu, May 31, 2018 at 5:06 PM Peter Collingbourne via llvm-dev <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:07 PM, Peter Collingbourne <[hidden email]> wrote:


On Wed, May 23, 2018 at 12:16 PM, Peter Collingbourne <[hidden email]> wrote:
On Wed, May 23, 2018 at 3:25 AM, Peter Smith <[hidden email]> wrote:
Hello,

I think that the approach of using a section to record address
significance is a good one. I'm guessing it will have its own section
type and format? If it does would it make sense to try and submit this
to the GABI https://groups.google.com/forum/#!forum/generic-abi as it
could be potentially useful for other linkers, for example gold?

Yes, there is a new section type (SHT_LLVM_ADDRSIG) and format (a sequence of ULEB128-encoded symbol table indexes that are address-significant).

I think it makes sense for this to eventually be part of the generic ABI, and I will send a proposal to generic-abi. As I mentioned in my reply to James Knight, I don't think we should block on getting a section number assignment, but we can at least incorporate any design feedback from that proposal.


Based on feedback from the generic-abi thread I looked at the object file size and link time impact of a few other representations for the address-significance table. My results are here: https://groups.google.com/d/msg/generic-abi/MPr8TVtnVn4/30Z0_KHMAQAJ

One of the proposals (a compressed array of symbol attributes) is slightly more expensive than what I originally proposed (+0.03% object file size, +1% link time) but it would allow for future expansion and therefore seems more appropriate for the gABI. That's not really a concern for us though since the initial implementation will use a platform-specific section number, and we can always switch to the gABI representation at the same time as we adopt the gABI section numbers. There's also the possibility that we will end up doing something completely different in the gABI.

Does anyone have a strong opinion that we should do something more aligned with the gABI? If not, I'll start upstreaming my original proposal.


I suppose it depends on whether or not you're going to be the one to reconcile your current implementation and that one in the future or if it'll wait for someone else?

I'm happy to volunteer to work on that if the proposal ever gets accepted into the gABI and I'm still working on LLVM at that time.


Then I think whatever is easier for you. I'd have a vague preference of something closer to the gABI proposal in case you're hit by a bus, but understand that you've already got the current code written as well.

-eric
 
Peter


-eric
 
Peter

Peter

Peter



Happy to help out with reviews.

Peter


On 22 May 2018 at 23:06, Peter Collingbourne via llvm-dev
<[hidden email]> wrote:
> Hi all,
>
> Context: ld.gold has an --icf=safe flag which is intended to apply ICF only
> to sections which can be safely merged according to the guarantees provided
> by the language. It works using a set of heuristics (symbol name matching
> and relocation scanning). That's not only imprecise but it only works with
> certain languages and is slow due to the need to demangle symbols and scan
> relocations. It's also redundant with the (local_)unnamed_addr analysis
> already performed by LLVM.
>
> I implemented an alternative to this approach in clang and lld. It works by
> adding a section to each object file containing the indexes of the symbols
> which are address-significant (i.e. not (local_)unnamed_addr in IR).
>
> I used this implementation to link clang with release+asserts with each of
> --icf={none,safe,all}. The binary sizes were:
>
> none: 109407184
> safe: 108534736 (-0.8%)
> all: 107281360 (-2%)
>
> I measured the object file overhead of these sections in my clang build at
> 0.08%. That's almost nothing, and I think it's small enough that we can turn
> it on by default.
>
> I've uploaded a patch series for this feature here:
> https://github.com/pcc/llvm-project/tree/llvm-addrsig
> I intend to start sending it for review soon.
>
> Thanks,
> --
> --
> Peter
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>



--
-- 
Peter



--
-- 
Peter



--
-- 
Peter
_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev



--
-- 
Peter

_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev