[llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM


Gerolf Hoflehner via llvm-dev
Hi Zach (or anyone else who may have a clue),

I'm currently investigating using LLVM for PDB parsing, with a view to supporting partial PDBs as produced by /DEBUG:FASTLINK, since the VS DIA SDK hasn't been updated to handle them. I know this is probably low on your priority list, but since /DEBUG:FASTLINK is now the implied default for VS2017 I figure it's a good time to take a look at it.

Unfortunately I'm finding very little information on the internal structure used by partial PDBs. It seems https://github.com/Microsoft/microsoft-pdb doesn't offer much either, unless I'm missing something...

So, two questions: Are you planning to try and support partial PDBs? And do you have any good references for their layout?

Many thanks,
Will.


_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM

Gerolf Hoflehner via llvm-dev
I didn't believe you at first that the DIA SDK doesn't support partial PDBs, so I tried `llvm-pdbdump pretty -types foo.pdb` on a partial PDB and it caused llvm-pdbdump to crash.  When I looked further, it turned out that IDiaSymbol::findChildren() is returning E_NOTIMPL.  Wow!  I'm a bit surprised, honestly.

I've pushed a fix for this in r304982, but all that does is make llvm-pdbdump not crash.  It still doesn't display any types.

Luckily llvm-pdbdump has another mode (accessible via the `raw` subcommand) that can bypass the DIA SDK and show you the underlying structure.  Here's what I get when I try dumping types of a partial PDB.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -tpi-records cpptest.pdb
Type Info Stream (TPI) {
  TPI Version: 20040203
  Record count: 0
  Records [
    TypeIndexOffsets [
    ]
  ]
}

Umm, ok.  So there's *actually* no types in the PDB.
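
For anyone who wants to make the same check from code rather than from llvm-pdbdump output, here is a minimal sketch. It assumes the TPI stream begins with five little-endian 32-bit fields (version, header size, first type index, one-past-last type index, byte size of the type records), which is how LLVM's PDB documentation describes the header. The struct and function names are made up for illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian 32-bit value from a byte buffer.
static uint32_t readU32LE(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// Leading fields of the TPI stream header (hypothetical struct name).
struct TpiHeaderPrefix {
  uint32_t Version;         // 20040203 in the dump above
  uint32_t HeaderSize;      // size of the full header in bytes
  uint32_t TypeIndexBegin;  // first type index stored in the stream
  uint32_t TypeIndexEnd;    // one past the last type index
  uint32_t TypeRecordBytes; // total size of the type records
};

// Returns false if the buffer is too small to hold the five fields.
bool parseTpiHeaderPrefix(const uint8_t *data, size_t size,
                          TpiHeaderPrefix &out) {
  if (size < 20)
    return false;
  out.Version = readU32LE(data);
  out.HeaderSize = readU32LE(data + 4);
  out.TypeIndexBegin = readU32LE(data + 8);
  out.TypeIndexEnd = readU32LE(data + 12);
  out.TypeRecordBytes = readU32LE(data + 16);
  return true;
}

// Number of type records; 0 corresponds to "Record count: 0" above.
uint32_t typeRecordCount(const TpiHeaderPrefix &h) {
  return h.TypeIndexEnd - h.TypeIndexBegin;
}
```

A count of zero here would be one cheap hint that a PDB might be a partial one, though that is only a heuristic and not something the format guarantees.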

Let's try symbols.

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
        }
  # thousands of similar lines snipped.

So this is a little bit more interesting.  Let's see what these records look like:

D:\src\llvmbuild\ninja>bin\llvm-pdbdump raw -module-syms -sym-record-bytes cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Symbols [
        {
          UnknownSym {
            Kind: 0x1167
            Length: 52
          }
          Bytes (
            0000: 30140000 04005F5F 76635F61 74747269  |0.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 00000000  |rceAttribute....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 64
          }
          Bytes (
            0000: 29140000 04005F5F 76635F61 74747269  |).....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A6F70  |rceAttribute::op|
            0030: 74696D69 7A655F65 00000000           |timize_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 60
          }
          Bytes (
            0000: 27140000 04005F5F 76635F61 74747269  |'.....__vc_attri|
            0010: 62757465 733A3A65 76656E74 5F736F75  |butes::event_sou|
            0020: 72636541 74747269 62757465 3A3A7479  |rceAttribute::ty|
            0030: 70655F65 00000000                    |pe_e....|
          )
        }
        {
          UnknownSym {
            Kind: 0x1167
            Length: 68
          }
          Bytes (
            0000: 0C140000 04005F5F 76635F61 74747269  |......__vc_attri|
            0010: 62757465 733A3A68 656C7065 725F6174  |butes::helper_at|
            0020: 74726962 75746573 3A3A7631 5F616C74  |tributes::v1_alt|
            0030: 74797065 41747472 69627574 65000000  |typeAttribute...|
          )
        }

So, this symbol record with kind 0x1167 is pretty interesting, and clearly related to /debug:fastlink.  Its format can be deduced as something like this:

struct DebugFastLinkRecord {
  char Unknown[6];   // meaning not yet known
  // char Name[];    // null-terminated string, starts right after Unknown
  // char Padding[]; // zero bytes, pads the record to a 4-byte boundary
};

What those first 6 bytes are I can't tell you.
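
As a rough sketch of how the layout deduced above could be parsed, the following takes the payload bytes exactly as shown in the `Bytes (...)` dumps and pulls out the name. Splitting the six unknown bytes into a little-endian 32-bit value followed by a 16-bit value is purely a guess based on how the bytes look (e.g. `30140000 0400`); the struct, field, and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decoded view of one 0x1167 record payload (hypothetical struct name).
struct FastLinkRecordView {
  uint32_t UnknownA; // first 4 unknown bytes, read as little-endian (a guess)
  uint16_t UnknownB; // next 2 unknown bytes, read as little-endian (a guess)
  std::string Name;  // null-terminated name that follows the 6 bytes
};

// Parses the payload bytes of a 0x1167 record, i.e. everything after the
// record's length and kind fields. Returns false if the payload is too
// short or the name is not null-terminated.
bool parseFastLinkPayload(const std::vector<uint8_t> &bytes,
                          FastLinkRecordView &out) {
  const size_t kPrefix = 6;
  if (bytes.size() <= kPrefix)
    return false;

  // Find the terminating null of the name.
  size_t end = kPrefix;
  while (end < bytes.size() && bytes[end] != 0)
    ++end;
  if (end == bytes.size())
    return false;

  out.UnknownA = uint32_t(bytes[0]) | (uint32_t(bytes[1]) << 8) |
                 (uint32_t(bytes[2]) << 16) | (uint32_t(bytes[3]) << 24);
  out.UnknownB = uint16_t(bytes[4] | (bytes[5] << 8));
  out.Name.assign(reinterpret_cast<const char *>(bytes.data()) + kPrefix,
                  end - kPrefix);
  // Whatever follows the null is padding up to a 4-byte boundary; ignored.
  return true;
}
```

Run against the first record in the dump, this yields the name `__vc_attributes::event_sourceAttribute` with the leading bytes decoding to 0x1430 and 0x0004. The second value is 0x0004 in all four records shown.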

Let's see what else we can find.  Another source of interesting debug info comes from what I refer to as "debug subsections".  In an object file, every .debug$S section is basically just a big list of these.  In a PDB file, though, the debug subsections appear embedded inside each module's debug stream, which is similar to a .debug$S section but with some additional PDB-specific stuff.  You can find llvm-pdbdump's code for parsing this in ModuleDebugStream.cpp.
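
To make "a big list of these" concrete, here is a minimal sketch of walking such a list. It assumes each subsection starts with an 8-byte header (a little-endian 32-bit kind followed by a 32-bit payload length) and that the next header begins at the following 4-byte boundary, which is the usual CodeView layout. It also assumes the caller has already skipped anything that precedes the first subsection, such as the 4-byte signature at the start of a .debug$S section. The names are invented for illustration and are not LLVM's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value from a byte buffer.
static uint32_t readU32LE(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// One subsection: its kind plus a view of its payload (hypothetical name).
struct SubsectionView {
  uint32_t Kind;       // e.g. 0xFD for the subsection discussed in this thread
  const uint8_t *Data; // points into the caller's buffer
  uint32_t Length;     // payload size in bytes
};

// Splits a buffer of back-to-back subsections into views.
// Returns false if a header or payload runs past the end of the buffer.
bool walkSubsections(const uint8_t *data, size_t size,
                     std::vector<SubsectionView> &out) {
  size_t off = 0;
  while (off < size) {
    if (size - off < 8)
      return false; // truncated header
    const uint32_t kind = readU32LE(data + off);
    const uint32_t len = readU32LE(data + off + 4);
    off += 8;
    if (len > size - off)
      return false; // truncated payload
    out.push_back({kind, data + off, len});
    off += len;
    off = (off + 3) & ~size_t(3); // next header is 4-byte aligned
  }
  return true;
}
```

Filtering the result for kinds that the parser does not recognize is then a simple loop over the returned views.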

Anyway, the part we're interested in can be dumped using llvm-pdbdump raw -subsections=unknown.  I say unknown because we're looking for stuff that is unique to /debug:fastlink PDBs, so presumably any /debug:fastlink-specific data would be something we don't know about or have never seen before.  (Note that this command line option hasn't made it upstream yet; it's still in review.  But expect it today or tomorrow if all goes well.)

So we'll try this:

bin\llvm-pdbdump raw -subsections=unknown cpptest.pdb
DBI Stream {
  # snip
  Modules [
    {
      Name: test2.obj
      # snip
      Subsections [
        Unknown {
          Kind: 0xFD
          Data (
            0000: 00000000 00000000 00000000 00000000  |................|
            0010: 00000000 00000000 00000000 00000000  |................|
            0020: 00000000 00000000 00000000 B0240100  |.............$..|
            0030: 00000000 00000000 00000000 00000000  |................|
            0040: 00000000 B0240100 90270100 D0270100  |.....$...'...'..|
            0050: 90990100 00000000 00000000 90990100  |................|
            0060: A49C0100 00000000 00000000 A49C0100  |................|
          )
        }
      ]
    }

Neat!  What is this thing?  0xFD is 253, and looking that up in our DebugSubsectionKind enumeration shows that this is a CoffSymbolRVA subsection.

The format of that subsection can very likely be understood by reading the code in the Microsoft repo, but I haven't investigated it yet.
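
In the meantime, here is a speculative sketch for eyeballing the data. It assumes, based only on the subsection's name and on how the dump looks, that the payload is a flat array of little-endian 32-bit RVAs. That has not been checked against the Microsoft sources, so treat it as a way to view the bytes rather than as the actual format. The function name is made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interprets a 0xFD subsection payload as consecutive little-endian 32-bit
// values. ASSUMPTION: the payload really is a plain array of RVAs.
// Returns false if the size is not a multiple of 4.
bool decodeAsRvaArray(const uint8_t *data, size_t size,
                      std::vector<uint32_t> &out) {
  if (size % 4 != 0)
    return false;
  out.reserve(out.size() + size / 4);
  for (size_t off = 0; off < size; off += 4) {
    out.push_back(uint32_t(data[off]) | (uint32_t(data[off + 1]) << 8) |
                  (uint32_t(data[off + 2]) << 16) |
                  (uint32_t(data[off + 3]) << 24));
  }
  return true;
}
```

Read this way, the bytes `B0240100` in the dump above become 0x000124B0 and `90270100` becomes 0x00012790, which at least look like plausible addresses inside an image.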

Hopefully this is a good starting point.  llvm-pdbdump is a pretty useful tool for investigating these types of issues, so let me know if you try it out and have suggestions for how to improve it.

As mentioned, some of the commands I demonstrated above are not upstream yet, but I'll try to get them in this week.


Re: [llvm-dev] [MS] Partial PDB (/DEBUG:FASTLINK) parsing support in LLVM

Gerolf Hoflehner via llvm-dev
Hi Zach,

A big thanks for the detailed analysis, it's super helpful. I've already been making use of llvm-pdbdump and it's proven very useful for my needs so far, especially in the absence of any real documentation of the PDB format.

I ran into the missing partial PDB support in DIA some time ago. Some background reading: https://developercommunity.visualstudio.com/content/problem/4631/dia-sdk-still-doesnt-support-debugfastlink.html

MS does provide the mspdbcmf.exe tool (https://blogs.msdn.microsoft.com/vcblog/2016/10/05/faster-c-build-cycle-in-vs-15-with-debugfastlink/) for conversion from partial to full PDBs. It might prove useful for testing parsing support, although it does seem to have occasional issues with incremental thunks and compiland env entries when parsing the converted PDB via DIA. It's also pretty slow at converting larger PDBs.

I should have some more time in the coming week to take a closer look at fastlink related parsing. So when you can get the -subsections=unknown option committed I'll dive in.

Many thanks,
Will.

--
Home of Recode : Runtime C++ Editing for VS
