RFC: Reduce the memory footprint of DIEs (and DIEValues)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith-2
Pete Cooper and I have been looking at memory profiles of running llc on
verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
of building verify-uselistorder with -flto -g).  I've attached
leak-backend.patch, which we're using to make Intrustruments more
accurate (instead of effectively leaking things onto BumpPtrAllocators,
really leak them with malloc()).  (I've collected this data on top of a
few not-yet-committed patches to cheapen `MCSymbol` and
`EmitLabelDifference()` that chop around 8% of memory off the top, but
otherwise these numbers should be reproducible in ToT.)

The `DIE` class is huge.  Directly, it accounts for about 15% of backend
memory:

    Bytes Used Count Symbol Name
      77.87 MB       8.4% 318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
      46.34 MB       5.0% 189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
      25.57 MB       2.7% 104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
       8.19 MB       0.8% 33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)

A lot of this is the pair of `SmallVector<, 12>` it has for its values
(look into `DIEAbbrev` for the second one).  Here's a histogram of how
many DIEs have each value count:

    # of Values  DIEs with #  with # or fewer
              0         3128             3128
              1       109522           112650
              2       180382           293032
              3        90836           383868
              4       115552           499420
              5        90713           590133
              6         4125           594258
              7        17211           611469
              8        18144           629613
              9        22805           652418
             10          325           652743
             11          203           652946
             12          245           653191

It's crazy that we're paying for 12 up front on every DIE.  (This is
a reformatted version of num-values-with-totals.txt, which I've
attached along with a few other histograms Pete collected.)

The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
also take up a huge amount of memory (around 4%):

    Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
    0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00

We can do better.

 1. DIEValue should be a discriminated union that's passed by value
    instead of pointer.  Most types just have 1 pointer of data.  There
    are four "big" ones, which still need a side-allocation on the
    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
    Even for these, the side allocation just needs to store the data
    itself (skipping the discriminator and the vtable entry).
 2. The contents of DIE's Abbrev field should be integrated with the
    list of DIEValues.  In particular, DIEValue should contain a
    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
    will still be just two pointers (1st pointer: discriminator, Form,
    and Attribute; 2nd pointer: data).  DIE should stop storing a
    `DIEAbbrev` itself, instead constructing one on demand, renaming
    `DIE::getAbbrev()` to
    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
    histogram Pete ran shows that half of DIEs have 2 or fewer values,
    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
    right now for each DIE.  Instead, we should optimize for 2-4
    DIEValues.  Not sure whether a std::forward_list would suffice, or if
    we should do something fancy like:

        struct List {
          DIEValue Values[2];
          PointerIntPair<List *, 1> NextAndSize;
        };

    Either way we should move the allocations to a BumpPtrAllocator
    (trivial if it's a list instead of vector).
 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
    they're only ever used as the former.  This is just a convenience
    for building up and emitting their DIEValues.  Now that we've trimmed
    down and simplified that functionality in `DIE`, we can extract it
    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
    not "be-a" DIE.
 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
    requires side allocations.  If we use an intrusively linked list,
    it'll be easy to avoid side allocations without hitting the
    pointer-validity problem highlighted in the header file.
 6. Now that DIE has no side allocations, we can move all the DIEs to a
    BumpPtrAllocator and remove the malloc traffic.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

leak-backend.patch (5K) Download Attachment
num-children-by-tag.txt (18K) Download Attachment
num-values-by-tag.txt (2K) Download Attachment
num-values-with-totals.txt (328 bytes) Download Attachment
num-values.txt (289 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Frédéric Riss
This is awesome.

> On May 20, 2015, at 11:28 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> Pete Cooper and I have been looking at memory profiles of running llc on
> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
> of building verify-uselistorder with -flto -g).  I've attached
> leak-backend.patch, which we're using to make Intrustruments more
> accurate (instead of effectively leaking things onto BumpPtrAllocators,
> really leak them with malloc()).  (I've collected this data on top of a
> few not-yet-committed patches to cheapen `MCSymbol` and
> `EmitLabelDifference()` that chop around 8% of memory off the top, but
> otherwise these numbers should be reproducible in ToT.)
>
> The `DIE` class is huge.  Directly, it accounts for about 15% of backend
> memory:
>
>    Bytes Used Count Symbol Name
>      77.87 MB       8.4% 318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>      46.34 MB       5.0% 189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>      25.57 MB       2.7% 104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>       8.19 MB       0.8% 33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>
> A lot of this is the pair of `SmallVector<, 12>` it has for its values
> (look into `DIEAbbrev` for the second one).  Here's a histogram of how
> many DIEs have each value count:
>
>    # of Values  DIEs with #  with # or fewer
>              0         3128             3128
>              1       109522           112650
>              2       180382           293032
>              3        90836           383868
>              4       115552           499420
>              5        90713           590133
>              6         4125           594258
>              7        17211           611469
>              8        18144           629613
>              9        22805           652418
>             10          325           652743
>             11          203           652946
>             12          245           653191
>
> It's crazy that we're paying for 12 up front on every DIE.  (This is
> a reformatted version of num-values-with-totals.txt, which I've
> attached along with a few other histograms Pete collected.)
>
> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
> also take up a huge amount of memory (around 4%):
>
>    Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
>    0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>
> We can do better.
>
> 1. DIEValue should be a discriminated union that's passed by value
>    instead of pointer.  Most types just have 1 pointer of data.  There
>    are four "big" ones, which still need a side-allocation on the
>    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>    Even for these, the side allocation just needs to store the data
>    itself (skipping the discriminator and the vtable entry).
> 2. The contents of DIE's Abbrev field should be integrated with the
>    list of DIEValues.  In particular, DIEValue should contain a
>    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
>    will still be just two pointers (1st pointer: discriminator, Form,
>    and Attribute; 2nd pointer: data).  DIE should stop storing a
>    `DIEAbbrev` itself, instead constructing one on demand, renaming
>    `DIE::getAbbrev()` to
>    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>    histogram Pete ran shows that half of DIEs have 2 or fewer values,
>    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>    right now for each DIE.  Instead, we should optimize for 2-4
>    DIEValues.  Not sure whether a std::forward_list would suffice, or if
>    we should do something fancy like:

DIEValues are already allocated from a BumpPtrAllocator, thus using a
forward_list wouldn’t be practical. You’d need to use a ilist also or your
fancy alternative.

>        struct List {
>          DIEValue Values[2];
>          PointerIntPair<List *, 1> NextAndSize;
>        };
>
>    Either way we should move the allocations to a BumpPtrAllocator
>    (trivial if it's a list instead of vector).
> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>    they're only ever used as the former.  This is just a convenience
>    for building up and emitting their DIEValues.  Now that we've trimmed
>    down and simplified that functionality in `DIE`, we can extract it
>    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>    not "be-a" DIE.

Much needed cleanup!

> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>    requires side allocations.  If we use an intrusively linked list,
>    it'll be easy to avoid side allocations without hitting the
>    pointer-validity problem highlighted in the header file.
> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>    BumpPtrAllocator and remove the malloc traffic.

Thanks!
Fred

> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith

> On 2015 May 20, at 11:47, Frédéric Riss <[hidden email]> wrote:
>
> This is awesome.
>
>> On May 20, 2015, at 11:28 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
>>
>> Pete Cooper and I have been looking at memory profiles of running llc on
>> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
>> of building verify-uselistorder with -flto -g).  I've attached
>> leak-backend.patch, which we're using to make Intrustruments more
>> accurate (instead of effectively leaking things onto BumpPtrAllocators,
>> really leak them with malloc()).  (I've collected this data on top of a
>> few not-yet-committed patches to cheapen `MCSymbol` and
>> `EmitLabelDifference()` that chop around 8% of memory off the top, but
>> otherwise these numbers should be reproducible in ToT.)
>>
>> The `DIE` class is huge.  Directly, it accounts for about 15% of backend
>> memory:
>>
>>   Bytes Used Count Symbol Name
>>     77.87 MB       8.4% 318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>>     46.34 MB       5.0% 189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>>     25.57 MB       2.7% 104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>>      8.19 MB       0.8% 33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>>
>> A lot of this is the pair of `SmallVector<, 12>` it has for its values
>> (look into `DIEAbbrev` for the second one).  Here's a histogram of how
>> many DIEs have each value count:
>>
>>   # of Values  DIEs with #  with # or fewer
>>             0         3128             3128
>>             1       109522           112650
>>             2       180382           293032
>>             3        90836           383868
>>             4       115552           499420
>>             5        90713           590133
>>             6         4125           594258
>>             7        17211           611469
>>             8        18144           629613
>>             9        22805           652418
>>            10          325           652743
>>            11          203           652946
>>            12          245           653191
>>
>> It's crazy that we're paying for 12 up front on every DIE.  (This is
>> a reformatted version of num-values-with-totals.txt, which I've
>> attached along with a few other histograms Pete collected.)
>>
>> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
>> also take up a huge amount of memory (around 4%):
>>
>>   Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
>>   0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>>   0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>
>> We can do better.
>>
>> 1. DIEValue should be a discriminated union that's passed by value
>>   instead of pointer.  Most types just have 1 pointer of data.  There
>>   are four "big" ones, which still need a side-allocation on the
>>   BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>>   Even for these, the side allocation just needs to store the data
>>   itself (skipping the discriminator and the vtable entry).
>> 2. The contents of DIE's Abbrev field should be integrated with the
>>   list of DIEValues.  In particular, DIEValue should contain a
>>   `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
>>   will still be just two pointers (1st pointer: discriminator, Form,
>>   and Attribute; 2nd pointer: data).  DIE should stop storing a
>>   `DIEAbbrev` itself, instead constructing one on demand, renaming
>>   `DIE::getAbbrev()` to
>>   `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
>> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>>   histogram Pete ran shows that half of DIEs have 2 or fewer values,
>>   and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>>   right now for each DIE.  Instead, we should optimize for 2-4
>>   DIEValues.  Not sure whether a std::forward_list would suffice, or if
>>   we should do something fancy like:
>
> DIEValues are already allocated from a BumpPtrAllocator, thus using a
> forward_list wouldn’t be practical. You’d need to use a ilist also or your
> fancy alternative.

Well, we could pass a BumpPtrAllocator to forward_list somehow, but
maybe an ilist would be better.  I'll look more closely once I get
there :).

>
>>       struct List {
>>         DIEValue Values[2];
>>         PointerIntPair<List *, 1> NextAndSize;
>>       };
>>
>>   Either way we should move the allocations to a BumpPtrAllocator
>>   (trivial if it's a list instead of vector).
>> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>>   they're only ever used as the former.  This is just a convenience
>>   for building up and emitting their DIEValues.  Now that we've trimmed
>>   down and simplified that functionality in `DIE`, we can extract it
>>   out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>>   not "be-a" DIE.
>
> Much needed cleanup!
>
>> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>>   requires side allocations.  If we use an intrusively linked list,
>>   it'll be easy to avoid side allocations without hitting the
>>   pointer-validity problem highlighted in the header file.
>> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>>   BumpPtrAllocator and remove the malloc traffic.
>
> Thanks!
> Fred
>
>> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith
In reply to this post by Duncan P. N. Exon Smith-2
To make this a little more concrete, I just hacked up a couple of
patches that achieve step #1.  (0004 is the key patch, and probably
should be split up somehow before commit.)  I'll collect some
results and report back.





> On 2015 May 20, at 11:28, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> Pete Cooper and I have been looking at memory profiles of running llc on
> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
> of building verify-uselistorder with -flto -g).  I've attached
> leak-backend.patch, which we're using to make Intrustruments more
> accurate (instead of effectively leaking things onto BumpPtrAllocators,
> really leak them with malloc()).  (I've collected this data on top of a
> few not-yet-committed patches to cheapen `MCSymbol` and
> `EmitLabelDifference()` that chop around 8% of memory off the top, but
> otherwise these numbers should be reproducible in ToT.)
>
> The `DIE` class is huge.  Directly, it accounts for about 15% of backend
> memory:
>
>    Bytes Used Count Symbol Name
>      77.87 MB       8.4% 318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>      46.34 MB       5.0% 189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>      25.57 MB       2.7% 104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>       8.19 MB       0.8% 33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>
> A lot of this is the pair of `SmallVector<, 12>` it has for its values
> (look into `DIEAbbrev` for the second one).  Here's a histogram of how
> many DIEs have each value count:
>
>    # of Values  DIEs with #  with # or fewer
>              0         3128             3128
>              1       109522           112650
>              2       180382           293032
>              3        90836           383868
>              4       115552           499420
>              5        90713           590133
>              6         4125           594258
>              7        17211           611469
>              8        18144           629613
>              9        22805           652418
>             10          325           652743
>             11          203           652946
>             12          245           653191
>
> It's crazy that we're paying for 12 up front on every DIE.  (This is
> a reformatted version of num-values-with-totals.txt, which I've
> attached along with a few other histograms Pete collected.)
>
> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
> also take up a huge amount of memory (around 4%):
>
>    Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
>    0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>    0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>
> We can do better.
>
> 1. DIEValue should be a discriminated union that's passed by value
>    instead of pointer.  Most types just have 1 pointer of data.  There
>    are four "big" ones, which still need a side-allocation on the
>    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>    Even for these, the side allocation just needs to store the data
>    itself (skipping the discriminator and the vtable entry).
> 2. The contents of DIE's Abbrev field should be integrated with the
>    list of DIEValues.  In particular, DIEValue should contain a
>    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
>    will still be just two pointers (1st pointer: discriminator, Form,
>    and Attribute; 2nd pointer: data).  DIE should stop storing a
>    `DIEAbbrev` itself, instead constructing one on demand, renaming
>    `DIE::getAbbrev()` to
>    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>    histogram Pete ran shows that half of DIEs have 2 or fewer values,
>    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>    right now for each DIE.  Instead, we should optimize for 2-4
>    DIEValues.  Not sure whether a std::forward_list would suffice, or if
>    we should do something fancy like:
>
>        struct List {
>          DIEValue Values[2];
>          PointerIntPair<List *, 1> NextAndSize;
>        };
>
>    Either way we should move the allocations to a BumpPtrAllocator
>    (trivial if it's a list instead of vector).
> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>    they're only ever used as the former.  This is just a convenience
>    for building up and emitting their DIEValues.  Now that we've trimmed
>    down and simplified that functionality in `DIE`, we can extract it
>    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>    not "be-a" DIE.
> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>    requires side allocations.  If we use an intrusively linked list,
>    it'll be easy to avoid side allocations without hitting the
>    pointer-validity problem highlighted in the header file.
> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>    BumpPtrAllocator and remove the malloc traffic.
>
> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

all.patch (73K) Download Attachment
0001-CodeGen-Remove-redundant-DIETypeSignature-dump.patch (1K) Download Attachment
0002-CodeGen-Remove-the-vtable-entry-from-DIEValue.patch (33K) Download Attachment
0003-CodeGen-Make-DIEValue-Ty-private-NFC.patch (1009 bytes) Download Attachment
0004-WIP-Change-DIEValue-to-be-stored-by-value.patch (115K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith
With just those four patches, memory usage went *up* slightly.  Add in
the 5th patch (which does #2 below), and we get an overall memory drop
of 4%.

The intermediate result of a memory increase makes sense.  While the
first four patches reduce the number of (and size of) `DIEValue`
allocations, they increase the cost of the `SmallVector` overhead.
0005 (attached) squeezes the abbreviation data into `DIEValue` for
free, next to the discriminator for the union.  The 5 patches together
are strictly an improvement to memory usage.

It's nice to see the 4% memory drop, but this is all prep work for #3,
where I expect the biggest memory usage improvements.



> On 2015 May 20, at 15:56, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> To make this a little more concrete, I just hacked up a couple of
> patches that achieve step #1.  (0004 is the key patch, and probably
> should be split up somehow before commit.)  I'll collect some
> results and report back.
>
>
> <all.patch><0001-CodeGen-Remove-redundant-DIETypeSignature-dump.patch><0002-CodeGen-Remove-the-vtable-entry-from-DIEValue.patch><0003-CodeGen-Make-DIEValue-Ty-private-NFC.patch><0004-WIP-Change-DIEValue-to-be-stored-by-value.patch>
>
>> On 2015 May 20, at 11:28, Duncan P. N. Exon Smith <[hidden email]> wrote:
>>
>> Pete Cooper and I have been looking at memory profiles of running llc on
>> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
>> of building verify-uselistorder with -flto -g).  I've attached
>> leak-backend.patch, which we're using to make Intrustruments more
>> accurate (instead of effectively leaking things onto BumpPtrAllocators,
>> really leak them with malloc()).  (I've collected this data on top of a
>> few not-yet-committed patches to cheapen `MCSymbol` and
>> `EmitLabelDifference()` that chop around 8% of memory off the top, but
>> otherwise these numbers should be reproducible in ToT.)
>>
>> The `DIE` class is huge.  Directly, it accounts for about 15% of backend
>> memory:
>>
>>   Bytes Used Count Symbol Name
>>     77.87 MB       8.4% 318960   llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>>     46.34 MB       5.0% 189810   llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>>     25.57 MB       2.7% 104752   llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>>      8.19 MB       0.8% 33547   llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>>
>> A lot of this is the pair of `SmallVector<, 12>` it has for its values
>> (look into `DIEAbbrev` for the second one).  Here's a histogram of how
>> many DIEs have each value count:
>>
>>   # of Values  DIEs with #  with # or fewer
>>             0         3128             3128
>>             1       109522           112650
>>             2       180382           293032
>>             3        90836           383868
>>             4       115552           499420
>>             5        90713           590133
>>             6         4125           594258
>>             7        17211           611469
>>             8        18144           629613
>>             9        22805           652418
>>            10          325           652743
>>            11          203           652946
>>            12          245           653191
>>
>> It's crazy that we're paying for 12 up front on every DIE.  (This is
>> a reformatted version of num-values-with-totals.txt, which I've
>> attached along with a few other histograms Pete collected.)
>>
>> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
>> also take up a huge amount of memory (around 4%):
>>
>>   Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
>>   0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>   0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
>>
>> We can do better.
>>
>> 1. DIEValue should be a discriminated union that's passed by value
>>   instead of pointer.  Most types just have 1 pointer of data.  There
>>   are four "big" ones, which still need a side-allocation on the
>>   BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>>   Even for these, the side allocation just needs to store the data
>>   itself (skipping the discriminator and the vtable entry).
>> 2. The contents of DIE's Abbrev field should be integrated with the
>>   list of DIEValues.  In particular, DIEValue should contain a
>>   `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
>>   will still be just two pointers (1st pointer: discriminator, Form,
>>   and Attribute; 2nd pointer: data).  DIE should stop storing a
>>   `DIEAbbrev` itself, instead constructing one on demand, renaming
>>   `DIE::getAbbrev()` to
>>   `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
>> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>>   histogram Pete ran shows that half of DIEs have 2 or fewer values,
>>   and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
>>   right now for each DIE.  Instead, we should optimize for 2-4
>>   DIEValues.  Not sure whether a std::forward_list would suffice, or if
>>   we should do something fancy like:
>>
>>       struct List {
>>         DIEValue Values[2];
>>         PointerIntPair<List *, 1> NextAndSize;
>>       };
>>
>>   Either way we should move the allocations to a BumpPtrAllocator
>>   (trivial if it's a list instead of vector).
>> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>>   they're only ever used as the former.  This is just a convenience
>>   for building up and emitting their DIEValues.  Now that we've trimmed
>>   down and simplified that functionality in `DIE`, we can extract it
>>   out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>>   not "be-a" DIE.
>> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>>   requires side allocations.  If we use an intrusively linked list,
>>   it'll be easy to avoid side allocations without hitting the
>>   pointer-validity problem highlighted in the header file.
>> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>>   BumpPtrAllocator and remove the malloc traffic.
>>
>> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>
>

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

0005-WIP-Store-abbreviation-data-directly-in-DIEValue.patch (36K) Download Attachment
all-2.patch (84K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith

> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> With just those four patches, memory usage went *up* slightly.  Add in
> the 5th patch (which does #2 below), and we get an overall memory drop
> of 4%.

Forgot to post numbers for this.  Peak memory was at 920 MB before
the five patches, and 884 MB after.  (These exact numbers won't quite
reproduce in ToT since I still haven't finished cleaning up and
committing the MCSymbol and emitLabelDiff() work I hacked on top of,
but the 36 MB drop should hold.)
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Sean Silva-2
In reply to this post by Duncan P. N. Exon Smith-2
Just wanted to say awesome data!

-- Sean Silva

On Wed, May 20, 2015 at 11:28 AM, Duncan P. N. Exon Smith <[hidden email]> wrote:
Pete Cooper and I have been looking at memory profiles of running llc on
verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
of building verify-uselistorder with -flto -g).  I've attached
leak-backend.patch, which we're using to make Intrustruments more
accurate (instead of effectively leaking things onto BumpPtrAllocators,
really leak them with malloc()).  (I've collected this data on top of a
few not-yet-committed patches to cheapen `MCSymbol` and
`EmitLabelDifference()` that chop around 8% of memory off the top, but
otherwise these numbers should be reproducible in ToT.)

The `DIE` class is huge.  Directly, it accounts for about 15% of backend
memory:

    Bytes Used  Count           Symbol Name
      77.87 MB       8.4%       318960             llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
      46.34 MB       5.0%       189810             llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
      25.57 MB       2.7%       104752             llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
       8.19 MB       0.8%       33547              llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)

A lot of this is the pair of `SmallVector<, 12>` it has for its values
(look into `DIEAbbrev` for the second one).  Here's a histogram of how
many DIEs have each value count:

    # of Values  DIEs with #  with # or fewer
              0         3128             3128
              1       109522           112650
              2       180382           293032
              3        90836           383868
              4       115552           499420
              5        90713           590133
              6         4125           594258
              7        17211           611469
              8        18144           629613
              9        22805           652418
             10          325           652743
             11          203           652946
             12          245           653191

It's crazy that we're paying for 12 up front on every DIE.  (This is
a reformatted version of num-values-with-totals.txt, which I've
attached along with a few other histograms Pete collected.)

The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
also take up a huge amount of memory (around 4%):

    Graph        Category       Persistent Bytes        # Persistent    # Transient     Total Bytes     # Total Transient/Total Bytes
    0   llvm::DIEInteger        19.91 MB        652389  0       19.91 MB        652389  <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIEString 13.83 MB        302181  0       13.83 MB        302181  <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIEEntry  10.91 MB        357506  0       10.91 MB        357506  <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIEDelta  10.03 MB        328542  0       10.03 MB        328542  <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIELabel  5.14 MB 168551  0       5.14 MB 168551  <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIELoc    3.41 MB 13154   0       3.41 MB 13154   <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIELocList        1.86 MB 61055   0       1.86 MB 61055   <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIEBlock  11.69 KB        44      0       11.69 KB        44      <XRRatioObject: 0x608025658ea0>  %0.00, %0.00
    0   llvm::DIEExpr   32 Bytes        1       0       32 Bytes        1       <XRRatioObject: 0x608025658ea0>  %0.00, %0.00

We can do better.

 1. DIEValue should be a discriminated union that's passed by value
    instead of pointer.  Most types just have 1 pointer of data.  There
    are four "big" ones, which still need a side-allocation on the
    BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
    Even for these, the side allocation just needs to store the data
    itself (skipping the discriminator and the vtable entry).
 2. The contents of DIE's Abbrev field should be integrated with the
    list of DIEValues.  In particular, DIEValue should contain a
    `dwarf::Form` and `dwarf::Attribute`.  In total, `sizeof(DIEValue)`
    will still be just two pointers (1st pointer: discriminator, Form,
    and Attribute; 2nd pointer: data).  DIE should stop storing a
    `DIEAbbrev` itself, instead constructing one on demand, renaming
    `DIE::getAbbrev()` to
    `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
    histogram Pete ran shows that half of DIEs have 2 or fewer values,
    and 85% have 4 or fewer values.  We're paying for 12 (!) upfront
    right now for each DIE.  Instead, we should optimize for 2-4
    DIEValues.  Not sure whether a std::forward_list would suffice, or if
    we should do something fancy like:

        struct List {
          DIEValue Values[2];
          PointerIntPair<List *, 1> NextAndSize;
        };

    Either way we should move the allocations to a BumpPtrAllocator
    (trivial if it's a list instead of vector).
 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
    they're only ever used as the former.  This is just a convenience
    for building up and emitting their DIEValues.  Now that we've trimmed
    down and simplified that functionality in `DIE`, we can extract it
    out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
    not "be-a" DIE.
 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
    requires side allocations.  If we use an intrusively linked list,
    it'll be easy to avoid side allocations without hitting the
    pointer-validity problem highlighted in the header file.
 6. Now that DIE has no side allocations, we can move all the DIEs to a
    BumpPtrAllocator and remove the malloc traffic.


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith
In reply to this post by Duncan P. N. Exon Smith

> On 2015 May 20, at 17:51, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
>
>> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <[hidden email]> wrote:
>>
>> With just those four patches, memory usage went *up* slightly.  Add in
>> the 5th patch (which does #2 below), and we get an overall memory drop
>> of 4%.
>
> Forgot to post numbers for this.  Peak memory was at 920 MB before
> the five patches, and 884 MB after.  (These exact numbers won't quite
> reproduce in ToT since I still haven't finished cleaning up and
> committing the MCSymbol and emitLabelDiff() work I hacked on top of,
> but the 36 MB drop should hold.)
I've cleaned all this up and committed the most obvious parts, as well
as a few unrelated memory improvements.  I'm attaching my (almost?)
ready-to-go patches, which have the following effects on peak memory:

  - 0000: 845 MB (baseline)
  - 0001: 845 MB - refactor
  - 0002: 879 MB - pass DIEValue by value (momentary setback)
  - 0003: 829 MB - merge DIEAbbrevData into DIEValue
  - 0004: 829 MB - refactor
  - 0005: 829 MB - refactor
  - 0006: 829 MB - refactor
  - 0007: 764 MB - change DIE::Values to a linked list
  - 0008: 756 MB - change DIE::Children to a linked list

(Still measuring memory on `llc` for `-flto -g`; details in r236629.)

@Eric, you mentioned offline you'd like to have a look at this proposal
before I proceed -- obviously I've been impatient ;).  Let me know if
I'm okay to move forward and start committing (modulo a couple of these
that I'll want a review from David and Fred on).


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

0001-AsmPrinter-Reorganize-DIE.h-NFC.patch (12K) Download Attachment
0002-AsmPrinter-Change-DIEValue-to-be-stored-by-value.patch (145K) Download Attachment
0003-AsmPrinter-Store-abbreviation-data-directly-in-DIE-a.patch (34K) Download Attachment
0004-AsmPrinter-Stop-exposing-underlying-DIEValue-list-NF.patch (18K) Download Attachment
0005-AsmPrinter-Return-added-DIE-from-DIE-addChild.patch (2K) Download Attachment
0006-AsmPrinter-Stop-exposing-underlying-DIE-children-lis.patch (5K) Download Attachment
0007-AsmPrinter-Convert-DIE-Values-to-a-linked-list.patch (86K) Download Attachment
0008-AsmPrinter-Use-an-intrusively-linked-list-for-DIE-Ch.patch (60K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Yaron Keren
>   - 0000: 845 MB (baseline)
>   - 0008: 756 MB - change DIE::Children to a linked list

That's really nice savings, more than 10% !


2015-05-24 21:55 GMT+03:00 Duncan P. N. Exon Smith <[hidden email]>:

>
>
> > On 2015 May 20, at 17:51, Duncan P. N. Exon Smith <[hidden email]> wrote:
> >
> >
> >> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <[hidden email]> wrote:
> >>
> >> With just those four patches, memory usage went *up* slightly.  Add in
> >> the 5th patch (which does #2 below), and we get an overall memory drop
> >> of 4%.
> >
> > Forgot to post numbers for this.  Peak memory was at 920 MB before
> > the five patches, and 884 MB after.  (These exact numbers won't quite
> > reproduce in ToT since I still haven't finished cleaning up and
> > committing the MCSymbol and emitLabelDiff() work I hacked on top of,
> > but the 36 MB drop should hold.)
>
> I've cleaned all this up and committed the most obvious parts, as well
> as a few unrelated memory improvements.  I'm attaching my (almost?)
> ready-to-go patches, which have the following effects on peak memory:
>
>   - 0000: 845 MB (baseline)
>   - 0001: 845 MB - refactor
>   - 0002: 879 MB - pass DIEValue by value (momentary setback)
>   - 0003: 829 MB - merge DIEAbbrevData into DIEValue
>   - 0004: 829 MB - refactor
>   - 0005: 829 MB - refactor
>   - 0006: 829 MB - refactor
>   - 0007: 764 MB - change DIE::Values to a linked list
>   - 0008: 756 MB - change DIE::Children to a linked list
>
> (Still measuring memory on `llc` for `-flto -g`; details in r236629.)
>
> @Eric, you mentioned offline you'd like to have a look at this proposal
> before I proceed -- obviously I've been impatient ;).  Let me know if
> I'm okay to move forward and start committing (modulo a couple of these
> that I'll want a review from David and Fred on).
>
>
> _______________________________________________
> LLVM Developers mailing list
> [hidden email]         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Eric Christopher
In reply to this post by Duncan P. N. Exon Smith
Hi Duncan,

Few random comments on things, in general looks pretty good.

0001 - LGTM.
0002 - LGTM.
0003 -

   struct AttrEntry {
     DIEValue Val;
-    const DIEAbbrevData *Desc;
   };

Most boring struct ever?

+  const DIEAbbrev &Abbrev = getAbbrev(Die.getAbbrevNumber());

This is a bit of an awkward construction. I understand where it's coming from, but can you go back and comment the various abbreviation bits with how it works now?

-  AssignAbbrev(Die->getAbbrev());
+  AssignAbbrev(NewAbbrev);
+  Die->setAbbrevNumber(NewAbbrev.getNumber());

To elaborate - the need for this sort of thing should be documented.

0004 - LGTM.
0005 - Better commit message saying why you're going to want this more later?
0006 - LGTM.
0007 - LGTM, might want to have Dave take a look at it as well.
0008 - reverseChildren and finalizeChildren is pretty gross. Also, I'm going to punt on the rest of this to Dave.

Thanks for all the work!

-eric

On Sun, May 24, 2015 at 11:55 AM Duncan P. N. Exon Smith <[hidden email]> wrote:

> On 2015 May 20, at 17:51, Duncan P. N. Exon Smith <[hidden email]> wrote:
>
>
>> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <[hidden email]> wrote:
>>
>> With just those four patches, memory usage went *up* slightly.  Add in
>> the 5th patch (which does #2 below), and we get an overall memory drop
>> of 4%.
>
> Forgot to post numbers for this.  Peak memory was at 920 MB before
> the five patches, and 884 MB after.  (These exact numbers won't quite
> reproduce in ToT since I still haven't finished cleaning up and
> committing the MCSymbol and emitLabelDiff() work I hacked on top of,
> but the 36 MB drop should hold.)

I've cleaned all this up and committed the most obvious parts, as well
as a few unrelated memory improvements.  I'm attaching my (almost?)
ready-to-go patches, which have the following effects on peak memory:

  - 0000: 845 MB (baseline)
  - 0001: 845 MB - refactor
  - 0002: 879 MB - pass DIEValue by value (momentary setback)
  - 0003: 829 MB - merge DIEAbbrevData into DIEValue
  - 0004: 829 MB - refactor
  - 0005: 829 MB - refactor
  - 0006: 829 MB - refactor
  - 0007: 764 MB - change DIE::Values to a linked list
  - 0008: 756 MB - change DIE::Children to a linked list

(Still measuring memory on `llc` for `-flto -g`; details in r236629.)

@Eric, you mentioned offline you'd like to have a look at this proposal
before I proceed -- obviously I've been impatient ;).  Let me know if
I'm okay to move forward and start committing (modulo a couple of these
that I'll want a review from David and Fred on).


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Reply | Threaded
Open this post in threaded view
|

Re: RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith
bcc:llvmdev, +llvm-commits

Thanks Eric!

David, let me know if you have any comments on the `DIE::Children`
stuff (patch 0008 in particular, although I think Eric wanted your
opinion on 0007 as well).

Fred, you mentioned in person you want to have a look.  I'll hold
off on committing until you have a chance too.

I've rebased the patches, removed some FIXMEs that I forgot to
delete, and added an 'all.patch' in case that's more convenient.

(I haven't addressed Eric's comments yet; I'm mainly reposting to
move this off of llvmdev.)




> On 2015-May-26, at 14:41, Eric Christopher <[hidden email]> wrote:
>
> Hi Duncan,
>
> Few random comments on things, in general looks pretty good.
>
> 0001 - LGTM.
> 0002 - LGTM.
> 0003 -
>
>    struct AttrEntry {
>      DIEValue Val;
> -    const DIEAbbrevData *Desc;
>    };
>
> Most boring struct ever?
>
> +  const DIEAbbrev &Abbrev = getAbbrev(Die.getAbbrevNumber());
>
> This is a bit of an awkward construction. I understand where it's coming from, but can you go back and comment the various abbreviation bits with how it works now?
>
> -  AssignAbbrev(Die->getAbbrev());
> +  AssignAbbrev(NewAbbrev);
> +  Die->setAbbrevNumber(NewAbbrev.getNumber());
>
> To elaborate - the need for this sort of thing should be documented.
>
> 0004 - LGTM.
> 0005 - Better commit message saying why you're going to want this more later?
> 0006 - LGTM.
> 0007 - LGTM, might want to have Dave take a look at it as well.
> 0008 - reverseChildren and finalizeChildren is pretty gross. Also, I'm going to punt on the rest of this to Dave.
>
> Thanks for all the work!
>
> -eric
>
> On Sun, May 24, 2015 at 11:55 AM Duncan P. N. Exon Smith <[hidden email]> wrote:
>
> > On 2015 May 20, at 17:51, Duncan P. N. Exon Smith <[hidden email]> wrote:
> >
> >
> >> On 2015 May 20, at 17:39, Duncan P. N. Exon Smith <[hidden email]> wrote:
> >>
> >> With just those four patches, memory usage went *up* slightly.  Add in
> >> the 5th patch (which does #2 below), and we get an overall memory drop
> >> of 4%.
> >
> > Forgot to post numbers for this.  Peak memory was at 920 MB before
> > the five patches, and 884 MB after.  (These exact numbers won't quite
> > reproduce in ToT since I still haven't finished cleaning up and
> > committing the MCSymbol and emitLabelDiff() work I hacked on top of,
> > but the 36 MB drop should hold.)
>
> I've cleaned all this up and committed the most obvious parts, as well
> as a few unrelated memory improvements.  I'm attaching my (almost?)
> ready-to-go patches, which have the following effects on peak memory:
>
>   - 0000: 845 MB (baseline)
>   - 0001: 845 MB - refactor
>   - 0002: 879 MB - pass DIEValue by value (momentary setback)
>   - 0003: 829 MB - merge DIEAbbrevData into DIEValue
>   - 0004: 829 MB - refactor
>   - 0005: 829 MB - refactor
>   - 0006: 829 MB - refactor
>   - 0007: 764 MB - change DIE::Values to a linked list
>   - 0008: 756 MB - change DIE::Children to a linked list
>
> (Still measuring memory on `llc` for `-flto -g`; details in r236629.)
>
> @Eric, you mentioned offline you'd like to have a look at this proposal
> before I proceed -- obviously I've been impatient ;).  Let me know if
> I'm okay to move forward and start committing (modulo a couple of these
> that I'll want a review from David and Fred on).
>

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

0001-AsmPrinter-Reorganize-DIE.h-NFC.patch (12K) Download Attachment
0002-AsmPrinter-Change-DIEValue-to-be-stored-by-value.patch (145K) Download Attachment
0003-AsmPrinter-Store-abbreviation-data-directly-in-DIE-a.patch (34K) Download Attachment
0004-AsmPrinter-Stop-exposing-underlying-DIEValue-list-NF.patch (18K) Download Attachment
0005-AsmPrinter-Return-added-DIE-from-DIE-addChild.patch (2K) Download Attachment
0006-AsmPrinter-Stop-exposing-underlying-DIE-children-lis.patch (5K) Download Attachment
0007-AsmPrinter-Convert-DIE-Values-to-a-linked-list.patch (86K) Download Attachment
0008-AsmPrinter-Use-an-intrusively-linked-list-for-DIE-Ch.patch (60K) Download Attachment
all.patch (153K) Download Attachment