[llvm-dev] [tablegen] table readability / performance

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
Hello

I've been looking at the tables generated by
`SequenceToOffsetTable::emit`, and notice that when the generated data
are strings, the data is basically un-grep-able, and very tricky to
read, as they are emitted as an array of comma-separated char-literal:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
      /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
      /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
      /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
      /* 34 */ 'V', '6', '_', 'l', 'd', '0', 0,
      /* 41 */ 'V', '6', '_', 'z', 'l', 'd', '0', 0,
      [...]
    };

As far as I can see, this makes it more difficult than necessary to read
for at least the following cases:

    Target AsmStrs
    Target InstrNameData
    Target RegStrings
    Target RegClassStrings


I hacked together a fix for the above cases locally, and found that for
at least for clang and gcc, the compile-time for generated tables is
significantly reduced when emitting string literals, and the user can
grep the name tables without huge effort. The above table is now:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ "G_FLOG10\0"
      /* 9 */ "ENDLOOP0\0"
      /* 18 */ "V6_vdd0\0"
      /* 26 */ "PS_vdd0\0"
      /* 34 */ "V6_ld0\0"
      /* 41 */ "V6_zld0\0"
      [...]
    };

My question then is: Is there a specific technical reason that we should
avoid emitting concatenated string literals rather array of
comma-separated char literals for "string-like" data?

If not, I can probably post a patch, which I feel will make it much
easier to understand the output from tablegen, and helps compilation
speed of generated tables.

Any thoughts appreciated.

All the Best

Luke

--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
I don't think there's any technical reason for the current structure. Your change sounds like an improvement to me. I'm curious if you have any compile time measurements from this change.

On Tue, Jan 14, 2020 at 4:59 AM Luke Drummond via llvm-dev <[hidden email]> wrote:
Hello

I've been looking at the tables generated by
`SequenceToOffsetTable::emit`, and notice that when the generated data
are strings, the data is basically un-grep-able, and very tricky to
read, as they are emitted as an array of comma-separated char-literal:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
      /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
      /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
      /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
      /* 34 */ 'V', '6', '_', 'l', 'd', '0', 0,
      /* 41 */ 'V', '6', '_', 'z', 'l', 'd', '0', 0,
      [...]
    };

As far as I can see, this makes it more difficult than necessary to read
for at least the following cases:

    Target AsmStrs
    Target InstrNameData
    Target RegStrings
    Target RegClassStrings


I hacked together a fix for the above cases locally, and found that for
at least for clang and gcc, the compile-time for generated tables is
significantly reduced when emitting string literals, and the user can
grep the name tables without huge effort. The above table is now:

    extern const char HexagonInstrNameData[] = {
      /* 0 */ "G_FLOG10\0"
      /* 9 */ "ENDLOOP0\0"
      /* 18 */ "V6_vdd0\0"
      /* 26 */ "PS_vdd0\0"
      /* 34 */ "V6_ld0\0"
      /* 41 */ "V6_zld0\0"
      [...]
    };

My question then is: Is there a specific technical reason that we should
avoid emitting concatenated string literals rather array of
comma-separated char literals for "string-like" data?

If not, I can probably post a patch, which I feel will make it much
easier to understand the output from tablegen, and helps compilation
speed of generated tables.

Any thoughts appreciated.

All the Best

Luke

--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
Hi

On Tue Jan 14, 2020 at 8:27 PM, Reid Kleckner wrote:
> I don't think there's any technical reason for the current structure.
> Your
> change sounds like an improvement to me. I'm curious if you have any
> compile time measurements from this change.

I can't find a good permanent place to dump my test files (they're too
large to paste here, but the gist is):

cd $BUILD
awk '
    /InstrNameData\[\] = / {printing = 1}
    (printing) {print} /};$/
    {printing = 0}
' `grep InstrInfo.inc: build.ninja | awk '{print $2}' | tr -d :` > ~/after.cpp

for both with and without my proposed fix.

...and then:

    clang++ before.cpp -S  0.59s user 0.06s system 96% cpu 0.668 total
    clang++ before.cpp -S  0.62s user 0.02s system 99% cpu 0.643 total
    clang++ before.cpp -S  0.61s user 0.03s system 99% cpu 0.639 total
    clang++ after.cpp -S  0.03s user 0.01s system 99% cpu 0.043 total
    clang++ after.cpp -S  0.03s user 0.02s system 99% cpu 0.049 total
    clang++ after.cpp -S  0.03s user 0.02s system 99% cpu 0.046 total
    g++ before.cpp -S  0.70s user 0.05s system 99% cpu 0.747 total
    g++ before.cpp -S  0.72s user 0.04s system 99% cpu 0.762 total
    g++ before.cpp -S  0.71s user 0.04s system 99% cpu 0.745 total
    g++ after.cpp -S  0.01s user 0.01s system 99% cpu 0.022 total
    g++ after.cpp -S  0.02s user 0.01s system 99% cpu 0.025 total
    g++ after.cpp -S  0.02s user 0.00s system 99% cpu 0.026 total

clang is (Debian) version 8.0.1-4 (tags/RELEASE_801/final)
gcc is (Debian 9.2.1-23) 9.2.1 20200110
Host is x86_64 Linux i7

I'll post the patch to Phabricator as soon as I'm able.

Thanks for your input.

Luke

--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
In reply to this post by Hiroshi Yamauchi via llvm-dev
On Tue Jan 14, 2020 at 8:27 PM, Reid Kleckner wrote:
> I don't think there's any technical reason for the current structure.

Apparently
[this](https://docs.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019)
is a thing.

This results in the following delight happening on Visual Studio 2015:

> fatal error C1091: compiler limit: string exceeds 65535 bytes in length

So maybe another time...

All the Best

Luke
--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
Does the same limitation exist in VS 2017? I think that's our support floor these days:

One *could* make the tablegen behavior conditional on the compiler, and generate the character arrays for VS, and strings for other compilers. It has the potential to create MSVC only issues, but it seems harmless enough...

On Wed, Jan 15, 2020 at 10:52 AM Luke Drummond <[hidden email]> wrote:
On Tue Jan 14, 2020 at 8:27 PM, Reid Kleckner wrote:
> I don't think there's any technical reason for the current structure.

Apparently
[this](https://docs.microsoft.com/en-us/cpp/cpp/compiler-limits?view=vs-2019)
is a thing.

This results in the following delight happening on Visual Studio 2015:

> fatal error C1091: compiler limit: string exceeds 65535 bytes in length

So maybe another time...

All the Best

Luke
--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
On Wed Jan 15, 2020 at 6:58 PM, Reid Kleckner wrote:
> Does the same limitation exist in VS 2017? I think that's our support
> floor
> these days:
> https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library
>
It appears that all releases including the latest 2019 are affected.
>
> One *could* make the tablegen behavior conditional on the compiler, and
> generate the character arrays for VS, and strings for other compilers.
> It
> has the potential to create MSVC only issues, but it seems harmless
> enough...
I can't see any current `#ifdef _MSCV_VER` usage in tablegen but I'm happy to
besmirch it for this.
--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
On Wed, Jan 15, 2020 at 11:14 AM Luke Drummond <[hidden email]> wrote:
On Wed Jan 15, 2020 at 6:58 PM, Reid Kleckner wrote:
> Does the same limitation exist in VS 2017? I think that's our support
> floor
> these days:
> https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-and-standard-library
>
It appears that all releases including the latest 2019 are affected.
>
> One *could* make the tablegen behavior conditional on the compiler, and
> generate the character arrays for VS, and strings for other compilers.
> It
> has the potential to create MSVC only issues, but it seems harmless
> enough...
I can't see any current `#ifdef _MSCV_VER` usage in tablegen but I'm happy to
besmirch it for this.

I think the cleanest way to do this would be to have a cl::opt to control how string tables are emitted, and then control the default with `#if defined(_MSC_VER) && !defined(__clang__)`. The !clang check is so I get the benefits of this locally with clang-cl. :)

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
On Wed, 15 Jan 2020, Reid Kleckner via llvm-dev wrote:

> On Wed, Jan 15, 2020 at 11:14 AM Luke Drummond <[hidden email]>
> wrote:
>       On Wed Jan 15, 2020 at 6:58 PM, Reid Kleckner wrote:
>       > Does the same limitation exist in VS 2017? I think that's our
>       support
>       > floor
>       > these days:
>       >https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-an
>       d-standard-library
>       >
>       It appears that all releases including the latest 2019 are
>       affected.
>       >
>       > One *could* make the tablegen behavior conditional on the
>       compiler, and
>       > generate the character arrays for VS, and strings for other
>       compilers.
>       > It
>       > has the potential to create MSVC only issues, but it seems
>       harmless
>       > enough...
>       I can't see any current `#ifdef _MSCV_VER` usage in tablegen but
>       I'm happy to
>       besmirch it for this.
>
>
> I think the cleanest way to do this would be to have a cl::opt to control
> how string tables are emitted, and then control the default with `#if
> defined(_MSC_VER) && !defined(__clang__)`. The !clang check is so I get the
> benefits of this locally with clang-cl. :)

Would this assume that the compiler compiling tablegen would be the same
as compiling the generated code? I do occasionally cross compile from
linux with MSVC, and in those cases, the tablegen is built as a native
linux binary.

// Martin

_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
It would. In this case, we would need to add CMake logic to pass the option when running tablegen. I'm not sure how to detect this precise case, though.

On Wed, Jan 15, 2020 at 12:16 PM Martin Storsjö <[hidden email]> wrote:
On Wed, 15 Jan 2020, Reid Kleckner via llvm-dev wrote:

> On Wed, Jan 15, 2020 at 11:14 AM Luke Drummond <[hidden email]>
> wrote:
>       On Wed Jan 15, 2020 at 6:58 PM, Reid Kleckner wrote:
>       > Does the same limitation exist in VS 2017? I think that's our
>       support
>       > floor
>       > these days:
>       >https://llvm.org/docs/GettingStarted.html#host-c-toolchain-both-compiler-an
>       d-standard-library
>       >
>       It appears that all releases including the latest 2019 are
>       affected.
>       >
>       > One *could* make the tablegen behavior conditional on the
>       compiler, and
>       > generate the character arrays for VS, and strings for other
>       compilers.
>       > It
>       > has the potential to create MSVC only issues, but it seems
>       harmless
>       > enough...
>       I can't see any current `#ifdef _MSCV_VER` usage in tablegen but
>       I'm happy to
>       besmirch it for this.
>
>
> I think the cleanest way to do this would be to have a cl::opt to control
> how string tables are emitted, and then control the default with `#if
> defined(_MSC_VER) && !defined(__clang__)`. The !clang check is so I get the
> benefits of this locally with clang-cl. :)

Would this assume that the compiler compiling tablegen would be the same
as compiling the generated code? I do occasionally cross compile from
linux with MSVC, and in those cases, the tablegen is built as a native
linux binary.

// Martin


_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Reply | Threaded
Open this post in threaded view
|

Re: [llvm-dev] [tablegen] table readability / performance

Hiroshi Yamauchi via llvm-dev
Hi Reid, Martin

On Wed Jan 15, 2020 at 9:28 PM, Reid Kleckner wrote:
> It would. In this case, we would need to add CMake logic to pass the
> option
> when running tablegen. I'm not sure how to detect this precise case,
> though.
>

Okay. I think I managed a reasonable middle-ground. By default, if
cross-compiling or running cmake with Visual Studio, emit char literal
arrays, otherwise string literals. The user can override at runtime with
a switch.

I've pushed a patch to phab: https://reviews.llvm.org/D73044

Please tag any extra reviewers who may be interested.

Thanks for the input.

All the Best

Luke

--
Codeplay Software Ltd.
Company registered in England and Wales, number: 04567874
Registered office: Regent House, 316 Beulah Hill, London, SE19 3HF
_______________________________________________
LLVM Developers mailing list
[hidden email]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev