[llvm-dev] RFC: Are auto-generated assertions a good practice?


[llvm-dev] RFC: Are auto-generated assertions a good practice?

Alexandros Lamprineas via llvm-dev

Hello llvm-dev,


On a recent code review I was asked to auto-generate the assertion checks for my unit test. I wasn't aware that this was even possible. I am referring to the Python `update` scripts under the `utils` directory. My first reaction was: wow! I found them very practical and useful; they save a significant amount of time when writing a regression test. So I gave them a try. The generated checks were satisfying enough, almost exactly what I wanted. Then I became a bit sceptical about them. I am worried that auto-generating assertions from the compiler's output can be quite dangerous: such tests will always pass against the compiler they were generated from, regardless of whether it emits right or wrong code, so you have to be certain that they enforce the desired compiler behaviour. I guess the question here is how often we should be using those scripts.
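For concreteness, here is a rough sketch of the kind of test I mean once the checks have been generated (the function and RUN line are made up for illustration, and the exact header comment the scripts emit may differ):

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -instcombine -S | FileCheck %s

define i32 @add_zero(i32 %x) {
; CHECK-LABEL: @add_zero(
; CHECK-NEXT:    ret i32 [[X:%.*]]
;
  %r = add i32 %x, 0
  ret i32 %r
}

The script pins down every line of the optimised output, which is precisely what makes me ask how much of it reflects intended behaviour rather than merely current behaviour.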


Regards,

Alexandros



_______________________________________________
LLVM Developers mailing list
[hidden email]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

Alex Bradbury via llvm-dev
On 4 May 2018 at 10:25, Alexandros Lamprineas via llvm-dev
<[hidden email]> wrote:

> Hello llvm-dev,
>
>
> On a recent code review I was asked to auto-generate the assertion checks for my
> unit test. I wasn't aware that this was even possible. I am referring to the
> Python `update` scripts under the `utils` directory. My first reaction was: wow!
> I found them very practical and useful; they save a significant amount of time
> when writing a regression test. So I gave them a try. The generated checks were
> satisfying enough, almost exactly what I wanted. Then I became a bit sceptical
> about them. I am worried that auto-generating assertions from the compiler's
> output can be quite dangerous: such tests will always pass against the compiler
> they were generated from, regardless of whether it emits right or wrong code,
> so you have to be certain that they enforce the desired compiler behaviour. I
> guess the question here is how often we should be using those scripts.

Like many test-related issues, it comes down to personal judgement. It
is of course easy to create test/CodeGen/*/* tests that pass regardless
of whether the compiler emits broken code, whether or not the test
CHECK lines are generated by update_llc_test_checks.py.
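As a deliberately silly sketch of what I mean (the RUN line and target are only
illustrative):

; RUN: llc -mtriple=riscv32 < %s | FileCheck %s
define i32 @add(i32 %a, i32 %b) {
; CHECK-LABEL: add:
; CHECK: ret
  %c = add i32 %a, %b
  ret i32 %c
}

This passes whether or not the add is actually selected, and nothing about that
depends on how the CHECK line was produced.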

I find it very helpful to have auto-generated CHECK lines that pick up
any codegen change, but this can of course be problematic for very
large test cases that are likely to see churn due to scheduling or
regalloc changes. Being able to regenerate the CHECK lines and view
the diff is also incredibly helpful when rebasing or moving a patch
between different branches.

My policy for test/CodeGen/RISCV is to use update_llc_test_checks.py
wherever possible, except in cases where there are so many CHECK lines
in the output that they obscure the property being tested, indicating
that a more limited hand-crafted pattern would be superior.
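To make the tradeoff concrete with a tiny (made-up) function - the exact
auto-generated lines will vary with the script version and subtarget:

; Auto-generated style: pins the whole function body.
; CHECK-LABEL: add:
; CHECK:       # %bb.0:
; CHECK-NEXT:    add a0, a0, a1
; CHECK-NEXT:    ret

; Hand-crafted style: checks only the property of interest.
; CHECK-LABEL: add:
; CHECK: add a0, a0, a1

For a two-instruction body the difference is negligible, but once a test has
dozens of scheduling- and regalloc-sensitive lines per function, the second
style can state the intent far more clearly.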

Best,

Alex

Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

David Blaikie via llvm-dev
Yep - all about balance.

The main risks are tests that overfit (golden files being the worst case - checking that the entire output matches /exactly/ - this is what FileCheck is intended to help avoid) and maintainability. In the case of the auto-generated FileCheck lines I've seen so far, they seem to walk a fairly good line of checking exactly what's intended. Though I sometimes wonder if they're checking full combinatorial expansions that may not be required/desirable - it's always a tradeoff of just how black/white box tests are.


Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

Sanjay Patel via llvm-dev
I understand the overfit argument (but in most cases it just shows that a unit test isn't minimized)...but I don't see how the complete auto-generated assertions could be worse at detecting a miscompile than incomplete manually-generated assertions?

The whole point of auto-generating complete checks is to catch miscompiles/regressions sooner, i.e. before they get committed and result in bug reports. I don't know how to mine the bugzilla stats on that, but it feels like we have made good progress on that front with better regression tests. Some history of the motivation and scripts here:
https://bugs.llvm.org/show_bug.cgi?id=22897




Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

David Blaikie via llvm-dev


On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <[hidden email]> wrote:
> I understand the overfit argument (but in most cases it just shows that a unit test isn't minimized)...

Even minimized tests sometimes need a few other things to set up the circumstances. Many DWARF tests, for example, produce the full DWARF output, but maybe you only care about one part of it - say you care about the encoding of instruction ranges, but not the way names are rendered - and there's no option to produce only ranges without names (it wouldn't really make sense).
 
> but I don't see how the complete auto-generated assertions could be worse at detecting a miscompile than incomplete manually-generated assertions?

Generally overfitting wouldn't result in being bad at detecting failures, but excess false positives - if things in the output unrelated to the issue change (in intentional/benign ways) & cause the tests to fail often & become a burden for the project.
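(A hypothetical sketch of that failure mode: a run of checks like

; CHECK:      add a0, a0, a1
; CHECK-NEXT: ret

breaks as soon as a benign scheduling or register-allocation change puts the result somewhere else or slots another instruction in between, even though the generated code is still perfectly correct - so someone has to triage the failure and regenerate the checks.)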

Not suggesting that's the case with these particular instances - but it's a tradeoff/concern to keep in mind.

- Dave
 

Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

Sanjay Patel via llvm-dev


On Fri, May 4, 2018 at 11:30 AM, David Blaikie <[hidden email]> wrote:


> On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <[hidden email]> wrote:
>> I understand the overfit argument (but in most cases it just shows that a unit test isn't minimized)...
>
> Even minimized tests sometimes need a few other things to set up the circumstances. Many DWARF tests, for example, produce the full DWARF output, but maybe you only care about one part of it - say you care about the encoding of instruction ranges, but not the way names are rendered - and there's no option to produce only ranges without names (it wouldn't really make sense).
>
>> but I don't see how the complete auto-generated assertions could be worse at detecting a miscompile than incomplete manually-generated assertions?
>
> Generally overfitting wouldn't result in being bad at detecting failures, but excess false positives - if things in the output unrelated to the issue change (in intentional/benign ways) & cause the tests to fail often & become a burden for the project.
>
> Not suggesting that's the case with these particular instances - but it's a tradeoff/concern to keep in mind.

Yes, that's definitely something we've acknowledged for x86 codegen tests. If someone wants to make a big isel, scheduler, or RA change, it's going to result in a *lot* of regression test diffs. On the plus side, it forces the author (I've definitely been there) to consider a lot of patterns that they probably didn't think about initially.
 
Note that for IR assertions, the scripts reg-ex-ify value names:

define i1 @trueval_is_true(i1 %C, i1 %X) {
; CHECK-LABEL: @trueval_is_true(
; CHECK-NEXT:    [[R:%.*]] = or i1 [[C:%.*]], [[X:%.*]]
; CHECK-NEXT:    ret i1 [[R]]
;
  %R = select i1 %C, i1 true, i1 %X
  ret i1 %R
}

So this actually insulates the tests from cosmetic changes in value naming. That's something that's frequently overlooked in hand-written tests because it takes a lot of effort to type all that out. :)
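For contrast, a hand-written version of the same checks often ends up hard-coding the names (a hypothetical example, not taken from an actual test):

; CHECK-LABEL: @trueval_is_true(
; CHECK-NEXT:    %R = or i1 %C, %X
; CHECK-NEXT:    ret i1 %R

That still tests the right transform, but it breaks as soon as anything renames %R - exactly the kind of cosmetic churn the generated [[R:%.*]] captures are immune to.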


 

Re: [llvm-dev] RFC: Are auto-generated assertions a good practice?

David Blaikie via llvm-dev


On Fri, May 4, 2018 at 10:53 AM Sanjay Patel <[hidden email]> wrote:
> On Fri, May 4, 2018 at 11:30 AM, David Blaikie <[hidden email]> wrote:
>> On Fri, May 4, 2018 at 10:16 AM Sanjay Patel <[hidden email]> wrote:
>>> I understand the overfit argument (but in most cases it just shows that a unit test isn't minimized)...
>>
>> Even minimized tests sometimes need a few other things to set up the circumstances. Many DWARF tests, for example, produce the full DWARF output, but maybe you only care about one part of it - say you care about the encoding of instruction ranges, but not the way names are rendered - and there's no option to produce only ranges without names (it wouldn't really make sense).
>>
>>> but I don't see how the complete auto-generated assertions could be worse at detecting a miscompile than incomplete manually-generated assertions?
>>
>> Generally overfitting wouldn't result in being bad at detecting failures, but excess false positives - if things in the output unrelated to the issue change (in intentional/benign ways) & cause the tests to fail often & become a burden for the project.
>>
>> Not suggesting that's the case with these particular instances - but it's a tradeoff/concern to keep in mind.
>
> Yes, that's definitely something we've acknowledged for x86 codegen tests. If someone wants to make a big isel, scheduler, or RA change, it's going to result in a *lot* of regression test diffs. On the plus side, it forces the author (I've definitely been there) to consider a lot of patterns that they probably didn't think about initially.

*nod* Though if we had good ways of isolating tests further to avoid this churn, I think we would do so - even at the loss of that consideration.
 
> Note that for IR assertions, the scripts reg-ex-ify value names:

*nod* that's one example of the sort of good thing that avoids tests being noisier than they need to be, etc.
 

> define i1 @trueval_is_true(i1 %C, i1 %X) {
> ; CHECK-LABEL: @trueval_is_true(
> ; CHECK-NEXT:    [[R:%.*]] = or i1 [[C:%.*]], [[X:%.*]]
> ; CHECK-NEXT:    ret i1 [[R]]
> ;
>   %R = select i1 %C, i1 true, i1 %X
>   ret i1 %R
> }
>
> So this actually insulates the tests from cosmetic changes in value naming. That's something that's frequently overlooked in hand-written tests because it takes a lot of effort to type all that out. :)

That's not been my experience - though it is a bit laborious, most/all of the hand written tests I see these days tend to do a good job of using matches for value names rather than hardcoding them. But I don't write/work with optimization tests so much - maybe they're often small enough that the names are rarely perturbed & so hardcoding can be reasonable. In debug info and clang tests it's pretty necessary because there's lots of moving parts.

In any case I'm not suggesting the autogenerated checks are bad/broken/need to be fixed - just that there are tradeoffs to keep in mind when working with these things.

- Dave
 


 