[llvm-dev] llvm-symbolizer memory usage

[llvm-dev] llvm-symbolizer memory usage

Francis Ricci via llvm-dev
I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we'd like to use llvm-symbolizer for this (because it gives us things like inlined-function information that we can't get from cheaper symbolizers), but for large binaries the memory usage gets pretty huge.

Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone costs hundreds of MB in many cases.

I have 2 questions here:
1) Does it seem feasible to make llvm-symbolizer work *without* loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?
2) If we figured this out, and put it behind something like a "--low-memory" flag, would it be something the upstream community would accept?

Francis

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] llvm-symbolizer memory usage

David Blaikie via llvm-dev
(Adding Hyoun who's been looking at memory use of llvm-symbolizer recently too)

On Tue, Jan 14, 2020 at 11:07 AM Francis Ricci via llvm-dev <[hidden email]> wrote:
> I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we'd like to use llvm-symbolizer for this (because it gives us things like inlined-function information that we can't get from cheaper symbolizers), but for large binaries the memory usage gets pretty huge.
>
> Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone costs hundreds of MB in many cases.
>
> I have 2 questions here:
> 1) Does it seem feasible to make llvm-symbolizer work *without* loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?

Does memory mapping the file actually use real memory? Or is it just reading from the file, effectively? I don't think the mapped file was part of the memory usage Hyoun and I encountered when doing memory accounting. What we were talking about was an LRU cache of DwarfCompileUnits, or something like that - to strip out the DIEArrays and other associated data structures after they were used.
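One way to check the question above empirically on Linux is to compare the process's resident set size (the second field of `/proc/self/statm`) right after `mmap` and again after touching every page. This is a minimal sketch assuming Linux and a regular, non-empty file; `measureMappingRss` and `MappingRss` are hypothetical names. Mapping alone should add almost nothing to RSS, while reading the pages faults them in and they then count as file-backed resident memory.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <fstream>

// Linux-only: resident pages of this process (2nd field of /proc/self/statm).
long residentPages() {
  std::ifstream statm("/proc/self/statm");
  long sizePages = 0, resident = -1;
  statm >> sizePages >> resident;
  return resident;
}

// Hypothetical result type: RSS growth, in pages, after each step.
struct MappingRss {
  long afterMap = -1;    // after mmap() alone
  long afterTouch = -1;  // after reading one byte from every page
};

MappingRss measureMappingRss(const char *path) {
  MappingRss r;
  int fd = open(path, O_RDONLY);
  if (fd < 0) return r;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return r;
  }
  const std::size_t len = static_cast<std::size_t>(st.st_size);
  const long before = residentPages();
  void *p = mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // the mapping keeps its own reference to the file
  if (p == MAP_FAILED) return r;
  r.afterMap = residentPages() - before;

  // Touch every page; volatile prevents the compiler from dropping the loads.
  const volatile char *bytes = static_cast<const volatile char *>(p);
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  unsigned long checksum = 0;
  for (std::size_t off = 0; off < len; off += page)
    checksum += static_cast<unsigned char>(bytes[off]);
  (void)checksum;
  r.afterTouch = residentPages() - before;

  munmap(p, len);
  return r;
}
```

So a mapping only costs RSS for the pages that are actually read. Those file-backed pages can be dropped by the kernel under memory pressure (unlike heap allocations), but while they stay mapped and resident they are what an RSS-based limit sees.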

Are you running llvm-symbolizer on many input addresses in a single run? Only a single address? Optimized or unoptimized build of llvm-symbolizer?
 
> 2) If we figured this out, and put it behind something like a "--low-memory" flag, would it be something the upstream community would accept?

Maybe, though I'm hoping we can avoid too much of a performance tradeoff for low memory usage, so we can keep a single mode without a flag.
 

> Francis

Re: [llvm-dev] llvm-symbolizer memory usage

Hyoun (HK) via llvm-dev


On Tue, Jan 14, 2020 at 2:32 PM David Blaikie <[hidden email]> wrote:
> (Adding Hyoun who's been looking at memory use of llvm-symbolizer recently too)
>
> On Tue, Jan 14, 2020 at 11:07 AM Francis Ricci via llvm-dev <[hidden email]> wrote:
>> I work on a Linux program with restricted RSS limits (a couple hundred MB), and one of the things this program does is symbolication. Ideally, we'd like to use llvm-symbolizer for this (because it gives us things like inlined-function information that we can't get from cheaper symbolizers), but for large binaries the memory usage gets pretty huge.
>>
>> Based on some memory profiling, it looks like the majority of this memory cost comes from mmap-ing the binary to be symbolized (via `llvm::object::createBinary`). This alone costs hundreds of MB in many cases.
>>
>> I have 2 questions here:
>> 1) Does it seem feasible to make llvm-symbolizer work *without* loading the full binary into memory (perhaps just reading sections from disk as needed, at the cost of some extra CPU)?
>
> Does memory mapping the file actually use real memory? Or is it just reading from the file, effectively? I don't think the mapped file was part of the memory usage Hyoun and I encountered when doing memory accounting. What we were talking about was an LRU cache of DwarfCompileUnits, or something like that - to strip out the DIEArrays and other associated data structures after they were used.

I may be wrong, as I'm not very familiar with LLVM. When I tried to reduce the RSS of our symbolizer usage, I saw that both the input file mapping and the internal data structures (DIEArray, line tables, etc.) took significant memory.

As Dave mentioned, I've tried LRU caching for the internal data structures, and that reduced memory usage quite a bit for our use case of symbolizing many addresses in a single run. We're working on a way to upstream the caching.
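The LRU idea described here can be sketched generically: keep at most N parsed compile units, and drop the least recently used one when a new one has to be parsed. This is not LLVM's implementation (the caching was still being worked on at the time); `LruCache` is a hypothetical class, and the key is assumed to be something like a compile unit's offset in `.debug_info`.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical bounded cache for expensive per-compile-unit parse results.
template <typename Key, typename Value>
class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

  // Returns the cached value for `key`, building it with `load` on a miss.
  // When a miss pushes the cache over capacity, the least recently used
  // entry is evicted.
  std::shared_ptr<const Value> getOrLoad(
      const Key &key, const std::function<Value(const Key &)> &load) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used).
      order_.splice(order_.begin(), order_, it->second);
      return it->second->second;
    }
    auto value = std::make_shared<const Value>(load(key));
    order_.emplace_front(key, value);
    index_[key] = order_.begin();
    if (order_.size() > capacity_) {
      index_.erase(order_.back().first);  // back = least recently used
      order_.pop_back();
    }
    return value;
  }

  std::size_t size() const { return order_.size(); }

 private:
  using Entry = std::pair<Key, std::shared_ptr<const Value>>;
  std::size_t capacity_;
  std::list<Entry> order_;  // front = most recently used
  std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

Returning `std::shared_ptr` keeps an evicted entry alive for any caller still using it. The cost is that a workload cycling through more than N compile units re-parses them, trading CPU for a bounded working set.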

The input file part seems more complicated. For us, the file is memory-mapped and the kernel only brings in the pages that are needed. It was a problem for us because we need to symbolize many addresses, and the kernel didn't handle that access pattern well, ending up with the entire file resident in memory. I was able to reduce RSS by inserting madvise(MADV_DONTNEED) calls here and there, but I don't think that's likely to be upstreamed.
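A minimal sketch of the `madvise(MADV_DONTNEED)` approach, assuming Linux and a read-only `MAP_PRIVATE` mapping of the input file: once a byte range has been processed, the whole pages inside it are handed back, and any later read of those pages is faulted in again from the file. `mapFileReadOnly` and `releaseRange` are hypothetical helpers; where such calls would go inside llvm-symbolizer is not shown.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>

// Maps a whole file read-only and private; returns nullptr on failure.
const char *mapFileReadOnly(const char *path, std::size_t *len) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return nullptr;
  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return nullptr;
  }
  void *p = mmap(nullptr, static_cast<std::size_t>(st.st_size), PROT_READ,
                 MAP_PRIVATE, fd, 0);
  close(fd);
  if (p == MAP_FAILED) return nullptr;
  *len = static_cast<std::size_t>(st.st_size);
  return static_cast<const char *>(p);
}

// Releases the pages lying entirely inside [base+offset, base+offset+len).
// Partial pages at either end are kept, since neighbouring data may still be
// in use. Returns true if madvise succeeded or there was nothing to release.
bool releaseRange(const char *base, std::size_t offset, std::size_t len) {
  const std::uintptr_t page =
      static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
  const std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base) + offset;
  const std::uintptr_t end = start + len;
  const std::uintptr_t alignedStart = (start + page - 1) & ~(page - 1);  // round up
  const std::uintptr_t alignedEnd = end & ~(page - 1);                   // round down
  if (alignedStart >= alignedEnd) return true;
  return madvise(reinterpret_cast<void *>(alignedStart),
                 alignedEnd - alignedStart, MADV_DONTNEED) == 0;
}
```

The mapping is `PROT_READ` on purpose: on a private mapping that had been written to, `MADV_DONTNEED` would throw those modifications away instead of just dropping clean copies of file data.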

While following the code path for memory-mapping the input file, I vaguely recall seeing other code paths that allocate a buffer the size of the entire file and copy the file into it when memory mapping is not available. Is this the case for you?
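For comparison, a sketch of the two loading strategies mentioned here: map the file when possible, otherwise allocate a buffer the size of the whole file and `read` it in. With the heap copy, every byte becomes resident anonymous memory as soon as it is read (it can be swapped out but not simply dropped), regardless of which sections are used later. `FileBuffer` and its `allowMmap` switch are hypothetical; LLVM's `MemoryBuffer` makes its own decision about when to map a file, and those conditions are not reproduced here.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical file buffer backed either by a read-only mapping (pages are
// faulted in lazily) or by a heap copy (the whole file is resident at once).
class FileBuffer {
 public:
  static std::unique_ptr<FileBuffer> load(const char *path, bool allowMmap) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    struct stat st;
    if (fstat(fd, &st) != 0) {
      ::close(fd);
      return nullptr;
    }
    std::unique_ptr<FileBuffer> buf(new FileBuffer());
    buf->size_ = static_cast<std::size_t>(st.st_size);
    if (allowMmap && buf->size_ > 0) {
      void *p = mmap(nullptr, buf->size_, PROT_READ, MAP_PRIVATE, fd, 0);
      if (p != MAP_FAILED) {
        buf->mapped_ = static_cast<const char *>(p);
        ::close(fd);
        return buf;
      }
    }
    // Fallback: allocate a buffer as large as the file and copy it in.
    buf->heap_.resize(buf->size_);
    std::size_t done = 0;
    while (done < buf->size_) {
      ssize_t n = ::read(fd, buf->heap_.data() + done, buf->size_ - done);
      if (n <= 0) {
        ::close(fd);
        return nullptr;
      }
      done += static_cast<std::size_t>(n);
    }
    ::close(fd);
    return buf;
  }

  ~FileBuffer() {
    if (mapped_) munmap(const_cast<char *>(mapped_), size_);
  }
  FileBuffer(const FileBuffer &) = delete;
  FileBuffer &operator=(const FileBuffer &) = delete;

  const char *data() const { return mapped_ ? mapped_ : heap_.data(); }
  std::size_t size() const { return size_; }
  bool isMapped() const { return mapped_ != nullptr; }

 private:
  FileBuffer() = default;
  const char *mapped_ = nullptr;
  std::size_t size_ = 0;
  std::vector<char> heap_;
};
```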

Thanks,
HK

 
