[llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

David Blaikie via llvm-dev
tl;dr: in DWARFv5, using DW_AT_ranges even when the range is contiguous reduces linked, uncompressed debug_addr size for optimized builds by 93% and reduces total .o file size (with compression and split) by 15%. It does grow .dwo file size a bit - DWARFv5, no compression, not split shows the net effect if all bytes are equal: -O3 clang binary grows by 0.4%, -O0 clang binary shrinks by 0.1%
Should we enable this strategy by default for DWARFv5, for DWARFv5+Split DWARF, or not by default at all/only under a flag?



So, I've brought this up a few times before: DWARFv5 does a pretty good job of reducing relocations (& reducing .o file size with Split DWARF) by allowing many uses of addresses to be expressed as some kind of address+offset. debug_rnglists and debug_loclists allow a "base_address" entry followed by offset_pairs, which is an improvement over the similar functionality in DWARFv4 because the offset pairs can be uleb encoded, so they can be quite compact.
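To make the "quite compact" point concrete, here is a minimal Python sketch of unsigned LEB128 encoding and the resulting size of one range entry. The 8-byte address size and the example offsets are assumptions picked for illustration; the helper names are hypothetical and not taken from LLVM.

```python
ADDRESS_SIZE = 8  # assumption: 64-bit target


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def v4_range_entry_size() -> int:
    # DWARFv4 .debug_ranges: each entry is a pair of address-sized values.
    return 2 * ADDRESS_SIZE


def v5_offset_pair_size(start_offset: int, end_offset: int) -> int:
    # DWARFv5 offset_pair entry: one byte for the entry kind,
    # then two ULEB128 offsets relative to the current base address.
    return 1 + len(uleb128(start_offset)) + len(uleb128(end_offset))


# Hypothetical function occupying [base+0x40, base+0x1f0).
print(v4_range_entry_size())             # 16
print(v5_offset_pair_size(0x40, 0x1F0))  # 4
```

The saving depends on how far the offsets are from the base address: small offsets fit in one or two bytes each, while the v4 entry is always two full addresses.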

But one place where DWARFv5 misses a chance to reduce relocations further is direct addresses from debug_info, such as DW_AT_low_pc.

For a while I've wondered if we could use an extension form for addr+offset, and I prototyped this without an extension attribute, but instead using exprloc. This has slightly higher overhead to express the... expression. (it's 9 bytes in total, could be as few as 5 with a custom form)
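The post does not say which operations the prototype expression uses. As a rough sketch only, the Python below builds one plausible layout (DW_OP_addrx, DW_OP_const4u, DW_OP_plus, wrapped in an exprloc length prefix) that happens to add up to the same 9 bytes, and a hypothetical custom form (pool index plus a fixed 4-byte offset) that comes to 5. The opcode values are the ones from the DWARFv5 specification; the fixed 4-byte offset, the small pool index and the function names are assumptions.

```python
# Opcode values from the DWARFv5 specification.
DW_OP_const4u = 0x0C
DW_OP_plus = 0x22
DW_OP_addrx = 0xA1


def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def exprloc_addr_plus_offset(pool_index: int, offset: int) -> bytes:
    """Assumed exprloc layout: length, addrx <index>, const4u <offset>, plus."""
    expr = (
        bytes([DW_OP_addrx]) + uleb128(pool_index)
        + bytes([DW_OP_const4u]) + offset.to_bytes(4, "little")
        + bytes([DW_OP_plus])
    )
    return uleb128(len(expr)) + expr  # exprloc is length-prefixed


def custom_form_addr_plus_offset(pool_index: int, offset: int) -> bytes:
    """Hypothetical custom form: ULEB128 pool index + fixed 4-byte offset."""
    return uleb128(pool_index) + offset.to_bytes(4, "little")


print(len(exprloc_addr_plus_offset(3, 0x1F0)))      # 9
print(len(custom_form_addr_plus_offset(3, 0x1F0)))  # 5
```

Both sizes assume the pool index fits in a single ULEB128 byte (index below 128); larger indexes add a byte to each encoding.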

But I had another idea that's more instantly deployable: Why not use DW_AT_ranges even when the range is contiguous? That way the low_pc that previously couldn't use an existing address pool entry + offset, could use the rnglist support for base address. 
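A toy model may help show where the saving comes from. The Python sketch below counts address pool entries for a handful of made-up functions under the two strategies: one pool entry per DW_AT_low_pc, versus one base address per section with each function described as an offset pair. The function list, section names and helper names are all hypothetical, and choosing the lowest function start in each section as the base is an assumption of the sketch.

```python
from collections import namedtuple

# name, containing section, start offset within the section, size in bytes
Function = namedtuple("Function", "name section start size")


def pool_with_low_pc(functions):
    """Each distinct function start needs its own address pool entry."""
    return sorted({(f.section, f.start) for f in functions})


def pool_with_ranges_always(functions):
    """One pool entry per section; functions become (start, end) offsets."""
    base = {}
    for f in functions:
        base[f.section] = min(base.get(f.section, f.start), f.start)
    pool = sorted(base.items())
    rnglists = {
        f.name: (f.section,
                 f.start - base[f.section],
                 f.start + f.size - base[f.section])
        for f in functions
    }
    return pool, rnglists


functions = [
    Function("a", ".text", 0x00, 0x20),
    Function("b", ".text", 0x20, 0x30),
    Function("c", ".text", 0x50, 0x10),
    Function("d", ".text.cold", 0x00, 0x08),
]

low_pc_pool = pool_with_low_pc(functions)
ranges_pool, rnglists = pool_with_ranges_always(functions)
print(len(low_pc_pool), len(ranges_pool))  # 4 2
print(rnglists["b"])                       # ('.text', 32, 80)
```

Each pool entry is what carries a relocation, so fewer entries means fewer relocations in the .o file, at the cost of the extra rnglist bytes.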

The only unnecessary address pool entries I've found that remain are the DW_AT_low_pc attributes of DW_TAG_labels, but there's only a handful of those in most code. So the "ranges everywhere" strategy gets the address count for optimized clang down from 4758 to 342 (the v4 address pool used 9923 addresses), with about 4 "extra" addresses for DW_TAG_labels.
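As a quick sanity check, the address counts above line up with the 93% figure in the tl;dr, assuming debug_addr size scales roughly with the number of entries (the section header is ignored here). The helper name is hypothetical.

```python
def reduction_percent(before: int, after: int) -> float:
    """Percentage reduction going from `before` entries to `after`."""
    return 100.0 * (before - after) / before


V4_POOL = 9923          # DWARFv4 address pool
V5_POOL = 4758          # DWARFv5, current behaviour
V5_RANGES_ALWAYS = 342  # DWARFv5 with "ranges everywhere"

print(f"{reduction_percent(V5_POOL, V5_RANGES_ALWAYS):.1f}%")  # 92.8%
print(f"{reduction_percent(V4_POOL, V5_RANGES_ALWAYS):.1f}%")  # 96.6%
```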

This could also be a bit less costly if DWARFv5 rnglists didn't use a separate offset table (instead encoding the offsets directly in debug_info, rather than using indexes)

I have patches for both the addr+offset exprloc and for the ranges-always, both with -mllvm flags - do people think they're both worth committing for experimentation? Neither? Default on in some cases (like Split DWARF)?

Thanks,
- Dave

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Eric Christopher via llvm-dev
For the record, I think committing the options so we can get a better idea of how this looks would be fine. We should definitely figure out what seems to work best and leave it there, but in the meantime I think your plan sounds good.

-eric

On Mon, Dec 30, 2019 at 12:08 PM David Blaikie <[hidden email]> wrote:
tl;dr: in DWARFv5, using DW_AT_ranges even when the range is contiguous reduces linked, uncompressed debug_addr size for optimized builds by 93% and reduces total .o file size (with compression and split) by 15%. It does grow .dwo file size a bit - DWARFv5, no compression, not split shows the net effect if all bytes are equal: -O3 clang binary grows by 0.4%, -O0 clang binary shrinks by 0.1%
Should we enable this strategy by default for DWARFv5, for DWARFv5+Split DWARF, or not by default at all/only under a flag?

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Adrian Prantl via llvm-dev
I think this sounds like a good plan for Linux. I would like to see the numbers for Darwin (= non-split DWARF) to decide whether we should just make that the default. Eric's suggestion of having this committed as an option first seems like a good step in that direction. If it is an advantage across the board we can remove the option and just make this the default behavior.

thanks,
adrian

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Paul Robinson via llvm-dev
On some previous occasion that introduced additional indirection
(don't remember the details) my debugger people groused about the
additional performance cost of chasing down data in a different
object-file section.  So we (Sony) might be happier with low_pc as
expressions, than with a ranges-always solution.

But hard to say without data, and getting both modes in at least
as a temporary thing sounds like a good plan.
--paulr


> -----Original Message-----
> From: [hidden email] <[hidden email]>
> Sent: Wednesday, January 8, 2020 1:49 PM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Jonas Devlieghere
> <[hidden email]>; Robinson, Paul <[hidden email]>; Eric
> Christopher <[hidden email]>; Frederic Riss <[hidden email]>
> Subject: Re: Increasing address pool reuse/reducing .o file size in
> DWARFv5
>
> I think this sounds like a good plan for Linux. I would like to see the
> numbers for Darwin (= non-split DWARF) to decide whether we should just
> make that the default. Eric's suggestion of having this committed as an
> option first seems like a good step in that direction. If it is an
> advantage across the board we can remove the option and just make this the
> default behavior.
>
> thanks,
> adrian

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

David Blaikie via llvm-dev
Sounds good all round - I'll commit these two modes, and maybe even a third: a custom form to eke out the last few bytes from the more direct addr+offset encoding (given Sony's interest, and their possible interest in changing their consumer to handle it).

I'll follow up here with flag names and revision numbers once they're in.

On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul <[hidden email]> wrote:
On some previous occasion that introduced additional indirection
(don't remember the details) my debugger people groused about the
additional performance cost of chasing down data in a different
object-file section.  So we (Sony) might be happier with low_pc as
expressions, than with a ranges-always solution.

But hard to say without data, and getting both modes in at least
as a temporary thing sounds like a good plan.
--paulr

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Vedant Kumar via llvm-dev
I don't totally follow the proposed encoding change & would appreciate a small example.

Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)', s.t. the cost of a relocation for the address is paid down the more it's used? How do you figure the offset out?

thanks,
vedant

On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev <[hidden email]> wrote:

Sounds good all round - I'll commit these two modes, and maybe even the third (given Sony's interest & possible interest in changing their consumer to handle it) of a custom form to eke out the last few bytes from the more direct addr+offset encoding.

I'll follow up here with flag names and revision numbers once they're in.

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

David Blaikie via llvm-dev


On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <[hidden email]> wrote:
> I don't totally follow the proposed encoding change & would appreciate a small example.
>
> Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',

With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are indirected already. So it's:

Replace "AT_low_pc (<indirection into a pool of addresses>)" with "AT_low_pc (<indirection into a pool of addresses> + offset)".

> s.t. the cost of a relocation for the address is paid down the more it's used?

Right - specifically to reduce the pool of addresses down to, ideally, one address per section/indivisible chunk of machine code (per subsection in MachO, for instance), whereas currently there are many addresses per section.

> How do you figure the offset out?

Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc is currently expressed relative to the low_pc address; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc.
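Since a small example was asked for, here is a hedged Python sketch of the label-difference idea. Each low_pc is stored as (pool index, offset from the first function in its section), and only the pool entries depend on where the linker places each section. The symbol names, section names, load addresses and helper functions are all made up for illustration; this is a model of the scheme, not LLVM's implementation.

```python
def encode_low_pcs(symbols):
    """symbols: name -> (section, offset within section).

    Returns (pool, encoded): pool lists one base symbol per section, and
    encoded maps each symbol to (pool index, label difference from base).
    """
    base_of = {}
    for name, (section, offset) in symbols.items():
        current = base_of.get(section)
        if current is None or offset < symbols[current][1]:
            base_of[section] = name  # lowest symbol in the section is the base
    pool = sorted(base_of.values())
    index = {name: i for i, name in enumerate(pool)}
    encoded = {}
    for name, (section, offset) in symbols.items():
        base = base_of[section]
        # Label difference: known at assembly time, needs no relocation.
        encoded[name] = (index[base], offset - symbols[base][1])
    return pool, encoded


def resolve(pool, encoded, symbols, section_load_address):
    """Model of linking: only pool entries need the final section address."""
    pool_addresses = [
        section_load_address[symbols[base][0]] + symbols[base][1]
        for base in pool
    ]
    return {name: pool_addresses[i] + diff
            for name, (i, diff) in encoded.items()}


symbols = {"f": (".text", 0x10), "g": (".text", 0x40), "h": (".text.h", 0x0)}
pool, encoded = encode_low_pcs(symbols)
print(pool)          # ['f', 'h']
print(encoded["g"])  # (0, 48)

final = resolve(pool, encoded, symbols, {".text": 0x1000, ".text.h": 0x2000})
print(hex(final["g"]))  # 0x1040
```

The sketch only models sections being placed as whole units; it says nothing about tools that reorder code within a section.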
 

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Vedant Kumar via llvm-dev
I think I get it now, thanks for explaining!

On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <[hidden email]> wrote:
> Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc is currently expressed relative to the low_pc address; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc.

If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the DWARF unless it updates the offset? Ditto, I suppose, for an intra-function offset when something like Propeller is used to reorder basic blocks (I’m thinking of DW_AT_call_return_pc now).

Apologies if this has been answered elsewhere; I suppose there must already be a solution for this, for DW_AT_high_pc to work.

vedant
> direct addresses from debug_info, such as DW_AT_low_pc.
> >
> > For a while I've wondered if we could use an extension form for
> addr+offset, and I prototyped this without an extension attribute, but
> instead using exprloc. This has slightly higher overhead to express the...
> expression. (it's 9 bytes in total, could be as few as 5 with a custom
> form)
> >
> > But I had another idea that's more instantly deployable: Why not use
> DW_AT_ranges even when the range is contiguous? That way the low_pc that
> previously couldn't use an existing address pool entry + offset, could use
> the rnglist support for base address.
> >
> > The only unnecessary address pool entries that remain that I've found
> are DW_AT_low_pc for DW_TAG_labels - but there's only a handful of those
> in most code. So the "ranges everywhere" strategy gets the addresses for
> optimized clang down from 4758 (v4 address pool used 9923 addresses... )
> to 342, with about ~4 "extra" addresses for DW_TAG_labels.
> >
> > This could also be a bit less costly if DWARFv5 rnglists didn't use a
> separate offset table (instead encoding the offsets directly in
> debug_info, rather than using indexes)
> >
> > I have patches for both the addr+offset exprloc and for the ranges-
> always, both with -mllvm flags - do people think they're both worth
> committing for experimentation? Neither? Default on in some cases (like
> Split DWARF)?
> >
> > Thanks,
> > - Dave

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Adrian Prantl via llvm-dev


On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <[hidden email]> wrote:
I think I get it now, thanks for explaining!

On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <[hidden email]> wrote:




On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <[hidden email]> wrote:
I don't totally follow the proposed encoding change & would appreciate a small example.

Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',

With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are indirected already. So it's:

Replace "AT_low_pc (<indirection into a pool of addresses>)" with an "AT_low_pc (<indirection into a pool of addresses> + offset)".
 
s.t. the cost of a relocation for the address is paid down the more it's used?

Right - specifically to reduce the pool of addresses down to, ideally, one address per section/indivisible chunk of machine code (per subsection in MachO, for instance) (whereas currently there are many addresses per section)
 
How do you figure the offset out?

Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc is currently encoded relative to the low_pc address; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc...

If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the dwarf unless it updates the offset?

Nah - terminologically, ELF sections are indivisible - more akin to MachO subsections. ELF files can have multiple sections with the same name (as is used for comdat sections for inline functions, and for -ffunction-sections, which is roughly equivalent to MachO's "subsections via symbols", as I understand it), or can use ".text.suffix" naming to give each separate .text section its own name - but the linker strips the suffixes and concatenates all these together into the final linked .text section.
 
Ditto, I suppose, for an intra-function offset when something like propeller is used to reorder basic blocks (I’m thinking of At_call_return_pc now).

Yeah - currently the "base address" for each section is determined by the first function with debug info being emitted in that section ( https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787 ) - with PROPELLER we'd need to add similar code when function fragments are emitted. I'm planning to check the PROPELLER work-in-progress tree soon and do another sanity pass over the debug info emitted to check this is working as intended. That's in part because this base address selection, coupled with DWARFv5 and maybe with the changes I'm suggesting in this thread, might make PROPELLER less expensive in terms of debug info size, or more expensive relative to the significant improvements this provides. (I'll commit those changes under flags "soon" - it might take me a week or two judging by my review/bug/investigation load right now... *fingers crossed*)

MachO debug info distribution works differently and, if I understand correctly, doesn't need relocations in many cases due to DWARF-aware parsing/linking (and if it does use relocations, I've no knowledge of when/how they're used or how big they are compared to the ELF relocations I've been measuring), so it's quite possible MachO would have different tradeoffs in this space.
 
Apologies if this has been answered elsewhere, I suppose there must be a solution for this for At_high_pc to work.

vedant


 

thanks,
vedant

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Adrian Prantl via llvm-dev


On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <[hidden email]> wrote:



On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <[hidden email]> wrote:
I think I get it now, thanks for explaining!

On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <[hidden email]> wrote:




On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <[hidden email]> wrote:
I don't totally follow the proposed encoding change & would appreciate a small example.

Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',

With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are indirected already. So it's:

Replace "AT_low_pc (<indirection into a pool of addresses>)" with an "AT_low_pc (<indirection into a pool of addresses> + offset)".
 
s.t. the cost of a relocation for the address is paid down the more it's used?

Right - specifically to reduce the pool of addresses down to, ideally, one address per section/indivisible chunk of machine code (per subsection in MachO, for instance) (whereas currently there are many addresses per section)
 
How do you figure the offset out?

Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc is currently encoded relative to the low_pc address; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc...

If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the dwarf unless it updates the offset? 

Nah - terminologically, ELF sections are indivisible - more akin to MachO subsections. ELF files can have multiple sections with the same name (as is used for comdat sections for inline functions, and for -ffunction-sections, which is roughly equivalent to MachO's "subsections via symbols", as I understand it), or can use ".text.suffix" naming to give each separate .text section its own name - but the linker strips the suffixes and concatenates all these together into the final linked .text section.

I see, so an ELF linker may reorder sections relative to each other, but not the contents of a section. (That matches up with what I've read elsewhere - you'd use -ffunction-sections to reorder function symbols, IIRC.)

And in this proposal to increase address pool reuse, label differences in a MachO would be relative to the subsection. In Propeller, is basic block reordering done after a .o is emitted? If so, I suppose I don't yet see how the proposed scheme is resilient to this reordering. OTOH if block reordering is done just before the label difference is evaluated, then there shouldn't be any issue.
 
Ditto, I suppose, for an intra-function offset when something like propeller is used to reorder basic blocks (I’m thinking of At_call_return_pc now).

Yeah - currently the "base address" for each section is determined by the first function with debug info being emitted in that section ( https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787 ) - with PROPELLER we'd need to add similar code when function fragments are emitted. (I'm planning to check the PROPELLER work in progress tree soon and do another sanity pass over the debug info emitted to check this is working as intended - in part because this base address selection, coupled with DWARFv5 and maybe with the changes I'm suggesting in this thread (& will commit under flags "soon" (might take me a week or two judging by my review/bug/investigation load right now... *fingers crossed*)) might make PROPELLER less expensive in terms of debug info size, or more expensive relative to the significant improvements this provides)

Thanks for investigating!

Owing to the way MachO debug info distribution works differently & if I understand correctly doesn't need relocations in many cases due to DWARF-aware parsing/linking (& if it does use relocations, I've no knowledge of when/how and how big they are compared to the ELF relocations I've been measuring) it's quite possible MachO would have different tradeoffs in this space.

A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain relocations for AT_low_pc or AT_call_return_pc in the simple examples I tried out. We do emit relocations for those attributes in MachO object files (there isn't something analogous to a .dwo on MachO, the debug info just goes into a different set of sections in the .o). My understanding (based on the definition of `macho_relocation_info` in the ld64 sources) is that MachO relocations are 8 bytes in size. It looks like ELF rel/rela relocations are 16/24 bytes in size, but I'm not sure why (perhaps they're more extensible / encode more information).

Would a vanilla DWARFv4 .dwp (without your patches applied) contain a relocation for each 'AT_low_pc (<direct address>)'?

vedant

 
Apologies if this has been answered elsewhere, I suppose there must be a solution for this for At_high_pc to work.

vedant


 

thanks,
vedant

Re: [llvm-dev] Increasing address pool reuse/reducing .o file size in DWARFv5

Adrian Prantl via llvm-dev


On Mon, Jan 13, 2020 at 1:39 PM Vedant Kumar <[hidden email]> wrote:


On Jan 13, 2020, at 9:20 AM, David Blaikie via llvm-dev <[hidden email]> wrote:



On Mon, Jan 13, 2020 at 9:03 AM Vedant Kumar <[hidden email]> wrote:
I think I get it now, thanks for explaining!

On Jan 12, 2020, at 11:44 AM, David Blaikie via llvm-dev <[hidden email]> wrote:




On Fri, Jan 10, 2020 at 12:57 PM Vedant Kumar <[hidden email]> wrote:
I don't totally follow the proposed encoding change & would appreciate a small example.

Is the idea to replace e.g. an 'AT_low_pc (<direct address>) + relocation for <direct address>' with an 'AT_low_pc (<indirection into a pool of addresses> + offset)',

With Split DWARF or with DWARFv5 in LLVM at the moment, all addresses are indirected already. So it's:

Replace "AT_low_pc (<indirection into a pool of addresses>)" with an "AT_low_pc (<indirection into a pool of addresses> + offset)".
 
s.t. the cost of a relocation for the address is paid down the more it's used?

Right - specifically to reduce the pool of addresses down to, ideally, one address per section/indivisible chunk of machine code (per subsection in MachO, for instance) (whereas currently there are many addresses per section)
 
How do you figure the offset out?

Label difference - same as is done for DW_AT_high_pc today in DWARFv4 and DWARFv5 in LLVM. high_pc is currently encoded relative to the low_pc address; in this proposed situation, we'd use a symbol that's in the first bit of debug info in the section (or subsection in MachO). So the low_pc of the subprogram/function, for instance, or if there are two functions in the same section with debug info for both, the low_pc of the first of those functions, etc...

If the label difference in a low_pc attribute is relative to the start of a section, could a linker orderfile pass break the dwarf unless it updates the offset? 

Nah - terminologically, ELF sections are indivisible - more akin to MachO subsections. ELF files can have multiple sections with the same name (as is used for comdat sections for inline functions, and for -ffunction-sections, which is roughly equivalent to MachO's "subsections via symbols", as I understand it), or can use ".text.suffix" naming to give each separate .text section its own name - but the linker strips the suffixes and concatenates all these together into the final linked .text section.

I see, so an ELF linker may reorder sections relative to each other, but not the contents of a section. (That matches up with what I've read elsewhere - you'd use -ffunction-sections to reorder function symbols, IIRC.)

Right.
 
And in this proposal to increase address pool reuse, label differences in a MachO would be relative to the subsection.

Even before my proposal, there are already many cases where rnglists and loclists in DWARFv5 (& location lists in DWARFv4) will use selectively chosen base addresses and symbol differences as often as possible (insofar as I could do that when working/experimenting with ELF).

So, without function sections, for instance, that already covers rnglists for sub-function ranges (ignoring PROPELLER for now/in this part of the discussion).

Perhaps an example would be helpful. Here's LLVM's current behavior with DWARFv5 and ELF, without function sections:

int f1();
void f2() {
  if (int i = f1()) {
    f1();
  }
}
void f3() {
  if (f1()) {
    int i = f1();
  }
}
__attribute__((section(".other"))) void f4() {
}


In this code there are only two ELF sections (".text" contains the definitions of f2 and f3, ".other" contains the definition of f4) and so we /should/ be able to only have 2 relocations in the debug info.
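To make the "two relocations" target concrete, here is a minimal sketch (not LLVM's implementation) of a deduplicating address pool under two strategies: one pool entry per low_pc, versus one pool entry per section plus a label-difference offset. The names `CodeAddr`, `AddressPool`, `PoolRef` and the offsets used in the usage note are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a code address is "offset within a named section".
struct CodeAddr {
  std::string section;
  unsigned long long offset;
};

// Deduplicating pool: every distinct entry costs one address + one relocation.
class AddressPool {
  std::map<std::pair<std::string, unsigned long long>, std::size_t> indexOf;

public:
  std::size_t getIndex(const CodeAddr &a) {
    auto key = std::make_pair(a.section, a.offset);
    auto it = indexOf.find(key);
    if (it != indexOf.end())
      return it->second;
    std::size_t idx = indexOf.size();
    indexOf.emplace(key, idx);
    return idx;
  }
  std::size_t size() const { return indexOf.size(); }
};

// Strategy 1: every low_pc gets its own pool entry (addrx form).
std::size_t poolSizeDirect(const std::vector<CodeAddr> &lowPcs) {
  AddressPool pool;
  for (const CodeAddr &a : lowPcs)
    pool.getIndex(a);
  return pool.size();
}

// A reference expressed as "pool entry + constant offset".
struct PoolRef {
  std::size_t index;
  unsigned long long offset;
};

// Strategy 2: the first address seen in each section becomes that section's
// base; later addresses are encoded as base + label difference.
std::vector<PoolRef> encodeBasePlusOffset(const std::vector<CodeAddr> &lowPcs,
                                          AddressPool &pool) {
  std::map<std::string, unsigned long long> baseOf;
  std::vector<PoolRef> refs;
  for (const CodeAddr &a : lowPcs) {
    // emplace keeps the existing (first-seen) base if the section is known.
    auto it = baseOf.emplace(a.section, a.offset).first;
    assert(a.offset >= it->second && "expected ascending addresses per section");
    refs.push_back({pool.getIndex(CodeAddr{a.section, it->second}),
                    a.offset - it->second});
  }
  return refs;
}
```

Feeding this four low_pcs (f2, f3 and f3's 'if' block in ".text", f4 in ".other") gives a pool of four entries under the first strategy and two under the second, which is the per-section minimum described above.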

(I'm exploiting something of a bug/quirk in Clang/LLVM's debug info that causes, even at -O0, the lexical_block for the 'if' to have a hole in it, where the call to f1 is, so it has ranges rather than low/high pc)

In DWARFv4 this example would've used 10 relocations. (on the CU ranges, there would be begin/end for the ".text" range covering f2 and f3, and begin/end for the ".other" range covering f4, then the range list for the "if" lexical_block would contain another 2 pairs (4 addresses/relocations), one relocation for f2's low_pc, one for f3's 'if' lexical_block).

In DWARFv5, we see the following:

0x00000014: [DW_RLE_base_addressx]:  0x0000000000000000
0x00000016: [DW_RLE_offset_pair  ]:  0x0000000000000008, 0x0000000000000014
0x00000019: [DW_RLE_offset_pair  ]:  0x000000000000001a, 0x000000000000001f
0x0000001c: [DW_RLE_end_of_list  ]
0x0000001d: [DW_RLE_startx_length]:  0x0000000000000000, 0x0000000000000036
0x00000020: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000006
0x00000023: [DW_RLE_end_of_list  ]


The first range list is for the 'if' scope, the second is for the CU. Both are able to efficiently select encodings and base addresses.
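As a rough illustration of how a consumer would read the two lists above, here is a sketch that resolves already-decoded entries against an address pool. The types `RleKind`, `RleEntry` and the function `resolveRanges` are hypothetical simplifications, not LLVM or libdwarf APIs, and only the three entry kinds that appear in the dump are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a decoded DWARFv5 range list entry.
enum class RleKind { BaseAddressX, OffsetPair, StartXLength, EndOfList };

struct RleEntry {
  RleKind kind;
  std::uint64_t a; // address-pool index, or start offset
  std::uint64_t b; // end offset, or length
};

using Range = std::pair<std::uint64_t, std::uint64_t>; // [low, high)

// Resolve one range list against the address pool (the contents of
// debug_addr after the linker has applied relocations).
std::vector<Range> resolveRanges(const std::vector<RleEntry> &list,
                                 const std::vector<std::uint64_t> &addrPool) {
  std::vector<Range> out;
  std::uint64_t base = 0;
  for (const RleEntry &e : list) {
    switch (e.kind) {
    case RleKind::BaseAddressX:
      // Sets the base for the offset pairs that follow.
      assert(e.a < addrPool.size() && "address index out of range");
      base = addrPool[e.a];
      break;
    case RleKind::OffsetPair:
      // Both operands are offsets from the current base: no relocation.
      out.emplace_back(base + e.a, base + e.b);
      break;
    case RleKind::StartXLength: {
      // Start comes from the pool, the end is start + length.
      assert(e.a < addrPool.size() && "address index out of range");
      std::uint64_t start = addrPool[e.a];
      out.emplace_back(start, start + e.b);
      break;
    }
    case RleKind::EndOfList:
      return out;
    }
  }
  return out;
}
```

With a made-up pool where index 0 resolves to 0x1000, the 'if' scope list above yields [0x1008, 0x1014) and [0x101a, 0x101f), using a single pool entry for both sub-ranges.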

But the debug_addr has 4 addresses in it. The rnglists shown above use only index 0 and index 2; the remaining two entries are index 1, for the low_pc of f3's subprogram, and index 3, for the low_pc of f3's if block/scope.

That's the address/relocation that would be... addressed by the change I'm proposing. One way to avoid that relocation would be to encode f3's address range using a rnglist - this is fully backwards compatible, and would produce a rnglist like this:

[DW_RLE_base_addressx]:  0x0000000000000000
[DW_RLE_offset_pair  ]:  0x0000000000000030, 0x0000000000000036
[DW_RLE_end_of_list  ]

Similarly, f3's if block could use a rangelist like:

[DW_RLE_base_addressx]:  0x0000000000000000
[DW_RLE_offset_pair  ]:  0x0000000000000046, 0x0000000000000054
[DW_RLE_end_of_list  ]

As you can imagine, there are quite a few ranges (especially once you get inlining) that use low/high_pc, and could benefit from the reduction in relocations by using this strategy. Though it isn't optimal (the range list encoding isn't intended to be good for this use case) in terms of size cost - hence the possibility of using DWARF expressions for address class attributes, or a custom form that would more directly encode the <indirect address> + <offset>.
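The size cost mentioned here can be estimated with a small sketch. It counts only the bytes of the range list itself (one kind byte per entry plus ULEB128 operands) and assumes the three-entry shape shown above; it does not count the DW_AT_ranges value in debug_info or the rnglists offset table entry. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes needed to encode v as ULEB128 (7 payload bits per byte).
std::size_t ulebSize(std::uint64_t v) {
  std::size_t n = 0;
  do {
    v >>= 7;
    ++n;
  } while (v != 0);
  return n;
}

// Bytes for a range list describing one contiguous range:
//   DW_RLE_base_addressx <index>
//   DW_RLE_offset_pair   <start> <end>
//   DW_RLE_end_of_list
std::size_t contiguousRnglistSize(std::uint64_t baseIndex,
                                  std::uint64_t startOff,
                                  std::uint64_t endOff) {
  assert(startOff <= endOff && "range must not be reversed");
  return (1 + ulebSize(baseIndex)) +
         (1 + ulebSize(startOff) + ulebSize(endOff)) + 1;
}
```

For the f3 list above (index 0, offsets 0x30 and 0x36) this comes to 6 bytes. That is consistent with the dump earlier in this message, where each offset_pair entry with small operands occupies 3 bytes. Offsets of 0x80 or more need a second ULEB128 byte each.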

In Propeller, is basic block reordering done after a .o is emitted?

Yes.
 
If so, I suppose I don't yet see how the proposed scheme is resilient to this reordering.

With PROPELLER any function that is fragmented into reorderable sections must necessarily use ranges to describe the function's address range - but, again, choosing base addresses strategically & using relative references whenever possible, would help reduce the cost of PROPELLER's debug info.
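A sketch of what "choosing base addresses strategically" could look like for a fragmented function: one base_addressx per section, followed by offset pairs for every fragment in that section. This is an illustration under assumptions, not PROPELLER's or LLVM's code; `Fragment`, `Entry`, `buildRangeList` and the section names in the usage note are hypothetical, and fragment offsets are taken to be relative to each section's base symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// One piece of a function: [begin, end) as offsets from its section's base.
struct Fragment {
  std::string section;
  std::uint64_t begin;
  std::uint64_t end;
};

struct Entry {
  enum Kind { BaseAddressX, OffsetPair, EndOfList } kind;
  std::uint64_t a; // pool index, or start offset
  std::uint64_t b; // end offset
};

// Emit a range list for a function made of several fragments. A pool entry
// (and hence a relocation) is only created the first time a section is seen.
std::vector<Entry>
buildRangeList(const std::vector<Fragment> &frags,
               std::map<std::string, std::uint64_t> &sectionBaseIndex) {
  std::vector<Entry> out;
  const std::string *current = nullptr;
  for (const Fragment &f : frags) {
    assert(f.begin <= f.end && "fragment must not be reversed");
    if (!current || *current != f.section) {
      // Reuse the section's existing pool index, or allocate the next one.
      auto it =
          sectionBaseIndex.emplace(f.section, sectionBaseIndex.size()).first;
      out.push_back({Entry::BaseAddressX, it->second, 0});
      current = &f.section;
    }
    out.push_back({Entry::OffsetPair, f.begin, f.end});
  }
  out.push_back({Entry::EndOfList, 0, 0});
  return out;
}
```

A function with two fragments in a ".text.hot" section and one in ".text.cold" produces two base entries and three offset pairs, so the pool grows by at most two addresses regardless of how many fragments there are.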
 
OTOH if block reordering is done just before the label difference is evaluated, then there shouldn't be any issue.
 
Ditto, I suppose, for an intra-function offset when something like propeller is used to reorder basic blocks (I’m thinking of At_call_return_pc now).

Yeah - currently the "base address" for each section is determined by the first function with debug info being emitted in that section ( https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/AsmPrinter/DwarfDebug.cpp#L1787 ) - with PROPELLER we'd need to add similar code when function fragments are emitted. (I'm planning to check the PROPELLER work in progress tree soon and do another sanity pass over the debug info emitted to check this is working as intended - in part because this base address selection, coupled with DWARFv5 and maybe with the changes I'm suggesting in this thread (& will commit under flags "soon" (might take me a week or two judging by my review/bug/investigation load right now... *fingers crossed*)) might make PROPELLER less expensive in terms of debug info size, or more expensive relative to the significant improvements this provides)

Thanks for investigating!

Owing to the way MachO debug info distribution works differently & if I understand correctly doesn't need relocations in many cases due to DWARF-aware parsing/linking (& if it does use relocations, I've no knowledge of when/how and how big they are compared to the ELF relocations I've been measuring) it's quite possible MachO would have different tradeoffs in this space.

A linked .dSYM (analogous to an ELF .dwp, IIUC) doesn't contain relocations for AT_low_pc or AT_call_return_pc in the simple examples I tried out. We do emit relocations for those attributes in MachO object files (there isn't something analogous to a .dwo on MachO, the debug info just goes into a different set of sections in the .o). My understanding (based on the definition of `macho_relocation_info` in the ld64 sources) is that MachO relocations are 8 bytes in size. It looks like ELF rel/rela relocations are 16/24 bytes in size, but I'm not sure why (perhaps they're more extensible / encode more information).

OK *nod* with the smaller encoding it may be less of a pressing issue for you & the tradeoff may be different.
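For reference, the sizes quoted above follow directly from the record layouts. The sketch below redeclares them locally (so it needs no platform headers) and checks the sizes at compile time; the struct names are mine, the field names follow the ELF and Mach-O definitions as I understand them, and the bit-field packing assumes a common ABI. `addrPoolCost` is a hypothetical helper that assumes one relocation per address pool entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ELF64 relocation without an explicit addend.
struct Elf64Rel {
  std::uint64_t r_offset; // where to apply the relocation
  std::uint64_t r_info;   // symbol index and relocation type
};

// ELF64 relocation with an explicit addend.
struct Elf64Rela {
  std::uint64_t r_offset;
  std::uint64_t r_info;
  std::int64_t r_addend;
};

// Mach-O style relocation: a 32-bit address plus 32 bits of packed fields.
struct MachORelocationInfo {
  std::int32_t r_address;
  std::uint32_t r_symbolnum : 24, r_pcrel : 1, r_length : 2, r_extern : 1,
      r_type : 4;
};

static_assert(sizeof(Elf64Rel) == 16, "two 8-byte fields");
static_assert(sizeof(Elf64Rela) == 24, "three 8-byte fields");
static_assert(sizeof(MachORelocationInfo) == 8, "4 bytes + 32 bits of fields");

// Bytes spent in the object file on an address pool: the address itself plus
// one relocation record per entry.
std::uint64_t addrPoolCost(std::uint64_t numAddresses, std::size_t addrSize,
                           std::size_t relocSize) {
  assert((addrSize == 4 || addrSize == 8) && "unexpected address size");
  return numAddresses * (addrSize + relocSize);
}
```

The ELF records are larger mainly because they carry 64-bit offsets and, for rela, a 64-bit addend. Plugging in the address counts from the start of the thread (4758 down to 342), with 8-byte addresses and 24-byte rela records, the difference is 4416 × 32 = 141,312 bytes for whatever scope those counts describe.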
 
Would a vanilla DWARFv4 .dwp (without your patches applied) contain a relocation for each 'AT_low_pc (<direct address>)'?

DWP files contain no direct addresses - they are all indirect through the address pool. But, yes, for a DWARFv4 Split DWARF build, low_pcs don't have an opportunity to reuse a strategically chosen base address - they have to use an addrx form & the debug_addr section would have that specific address with a relocation for it.
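
As a rough back-of-the-envelope check of what each extra address pool entry costs, the sketch below uses the address counts from the original post in this thread (4758 entries for DWARFv5 today, 342 with ranges everywhere). The per-entry cost is an assumption for illustration: an 8-byte address plus one 24-byte ELF64 rela relocation per debug_addr entry in the .o file.

```python
ADDR_SIZE = 8   # assumption: 64-bit target, 8 bytes per debug_addr entry
RELA_SIZE = 24  # assumption: one ELF64 rela relocation per entry


def debug_addr_cost(entries):
    """Approximate .o bytes spent on address pool entries and their relocations."""
    return entries * (ADDR_SIZE + RELA_SIZE)


before, after = 4758, 342  # address counts quoted in the original post
saved_entries = before - after
reduction_pct = 100.0 * saved_entries / before
saved_bytes = debug_addr_cost(before) - debug_addr_cost(after)
```

Under these assumptions the entry count drops by about 92.8%, in line with the roughly 93% debug_addr reduction quoted in the tl;dr.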
 

vedant

 
Apologies if this has been answered elsewhere; I suppose there must be a solution for this for AT_high_pc to work.

vedant


 

thanks,
vedant

On Jan 8, 2020, at 1:33 PM, David Blaikie via llvm-dev <[hidden email]> wrote:

Sounds good all round - I'll commit these two modes, and maybe even the third (given Sony's interest & possible interest in changing their consumer to handle it) of a custom form to eke out the last few bytes from the more direct addr+offset encoding.

I'll follow up here with flag names and revision numbers once they're in.

On Wed, Jan 8, 2020 at 1:26 PM Robinson, Paul <[hidden email]> wrote:
On some previous occasion that introduced additional indirection
(don't remember the details) my debugger people groused about the
additional performance cost of chasing down data in a different 
object-file section.  So we (Sony) might be happier with low_pc as
expressions, than with a ranges-always solution.

But hard to say without data, and getting both modes in at least
as a temporary thing sounds like a good plan.
--paulr


> -----Original Message-----
> From: [hidden email] <[hidden email]>
> Sent: Wednesday, January 8, 2020 1:49 PM
> To: David Blaikie <[hidden email]>
> Cc: llvm-dev <[hidden email]>; Jonas Devlieghere
> <[hidden email]>; Robinson, Paul <[hidden email]>; Eric
> Christopher <[hidden email]>; Frederic Riss <[hidden email]>
> Subject: Re: Increasing address pool reuse/reducing .o file size in
> DWARFv5
> 
> I think this sounds like a good plan for Linux. I would like to see the
> numbers for Darwin (= non-split DWARF) to decide whether we should just
> make that the default. Eric's suggestion of having this committed as an
> option first seems like a good step in that direction. If it is an
> advantage across the board we can remove the option and just make this the
> default behavior.
> 
> thanks,
> adrian
> 
> > On Dec 30, 2019, at 12:08 PM, David Blaikie <[hidden email]> wrote:
> >
> > tl;dr: in DWARFv5, using DW_AT_ranges even when the range is contiguous reduces linked, uncompressed debug_addr size for optimized builds by 93% and reduces total .o file size (with compression and split) by 15%. It does grow .dwo file size a bit - DWARFv5, no compression, not split shows the net effect if all bytes are equal: -O3 clang binary grows by 0.4%, -O0 clang binary shrinks by 0.1%
> > Should we enable this strategy by default for DWARFv5, for DWARFv5+Split DWARF, or not by default at all/only under a flag?
> >
> >
> >
> > So, I've brought this up a few times before - that DWARFv5 does a pretty good job of reducing relocations (& reducing .o file size with Split DWARF) by allowing many uses of addresses to include some kind of address+offset (debug_rnglists and loclists allowing "base_address" then offset_pairs (an improvement over similar functionality in DWARFv4 because the offset pairs can be uleb encoded - so they can be quite compact))
> >
> > But one place that DWARFv5 misses to reduce relocations further is direct addresses from debug_info, such as DW_AT_low_pc.
> >
> > For a while I've wondered if we could use an extension form for addr+offset, and I prototyped this without an extension attribute, but instead using exprloc. This has slightly higher overhead to express the... expression. (it's 9 bytes in total, could be as few as 5 with a custom form)
> >
> > But I had another idea that's more instantly deployable: Why not use DW_AT_ranges even when the range is contiguous? That way the low_pc that previously couldn't use an existing address pool entry + offset, could use the rnglist support for base address.
> >
> > The only unnecessary address pool entries that remain that I've found are DW_AT_low_pc for DW_TAG_labels - but there's only a handful of those in most code. So the "ranges everywhere" strategy gets the addresses for optimized clang down from 4758 (v4 address pool used 9923 addresses... ) to 342, with about ~4 "extra" addresses for DW_TAG_labels.
> >
> > This could also be a bit less costly if DWARFv5 rnglists didn't use a separate offset table (instead encoding the offsets directly in debug_info, rather than using indexes)
> >
> > I have patches for both the addr+offset exprloc and for the ranges-always, both with -mllvm flags - do people think they're both worth committing for experimentation? Neither? Default on in some cases (like Split DWARF)?
> >
> > Thanks,
> > - Dave
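
To illustrate why the quoted message calls uleb-encoded offset pairs "quite compact", here is a small Python sketch of ULEB128 encoding and of one plausible debug_rnglists encoding for a single contiguous range. The `DW_RLE_*` values are the DWARFv5 range list entry kinds; the helper names and the choice of a base_addressx + offset_pair sequence are illustrative assumptions, not necessarily what the patches emit.

```python
DW_RLE_end_of_list = 0x00
DW_RLE_base_addressx = 0x01
DW_RLE_offset_pair = 0x04


def uleb128(value):
    """Encode a non-negative integer as unsigned LEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def contiguous_rnglist(base_index, low_offset, high_offset):
    """Range list for one contiguous range relative to an address pool entry."""
    return (
        bytes([DW_RLE_base_addressx]) + uleb128(base_index)
        + bytes([DW_RLE_offset_pair]) + uleb128(low_offset) + uleb128(high_offset)
        + bytes([DW_RLE_end_of_list])
    )


encoded = contiguous_rnglist(0, 0x20, 0x50)
```

Here the whole list is a handful of bytes and needs no relocation of its own, since the only address it refers to is an existing address pool entry. Offsets of 128 or more take additional bytes, so real lists are somewhat larger.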

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
