New -O3 Performance tester - Use hardware to get reliable numbers

Tobias Grosser-5
Hi,

I would like to announce a new set of LNT -O3 performance testers.

In a discussion titled "Question about results reliability in LNT
infrustructure" Anton suggested that one way to get statistically
reliable test results from the LNT infrastructure is to use a larger
sample size (5-10) as well as a more robust statistical test
(Wilcoxon/Mann-Whitney). Another requirement to make the performance
results we get from our testers useful is to have a per-commit
performance run.

I would like to announce that I set up 4 identical machines* that
publicly report LNT results for 'clang -O3' at:

http://llvm.org/perf/db_default/v4/nts/machine/34

On average, each run currently covers a group of 3-5 commits. As most
commits obviously do not impact performance, this seems to be enough to
track down performance regressions/changes easily.

The results that have been reported so far seem to provide sufficient
information to catch performance changes. Specifically, when setting the
aggregation function to median, most runs are shown to not impact
performance:

e.g:
http://llvm.org/perf/db_default/v4/nts/19939?num_comparison_runs=10&test_filter=&test_min_value_filter=&aggregation_fn=median&compare_to=19934&submit=Update
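
To illustrate why the median helps here, a minimal sketch in Python. This
is not LNT code; the sample values and the 1% reporting threshold are made
up for illustration. With several samples per run, a single noisy sample
shifts the mean noticeably but leaves the median untouched:

```python
from statistics import mean, median

# Hypothetical execution times (seconds) of one benchmark; not real LNT data.
baseline = [1.00, 1.01, 0.99, 1.00, 1.02]
current = [1.00, 1.01, 0.99, 1.00, 1.35]  # one sample disturbed by noise


def relative_change(old, new, aggregate):
    """Relative change between two sample sets under an aggregation function."""
    before = aggregate(old)
    after = aggregate(new)
    return (after - before) / before


THRESHOLD = 0.01  # assumed 1% reporting threshold, chosen for illustration

mean_delta = relative_change(baseline, current, mean)
median_delta = relative_change(baseline, current, median)

print(f"mean:   {mean_delta:+.2%} flagged={abs(mean_delta) > THRESHOLD}")
print(f"median: {median_delta:+.2%} flagged={abs(median_delta) > THRESHOLD}")
```

The mean-based comparison flags this run as a change, the median-based one
does not.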

A couple of runs still report performance differences, but looking at
the performance graphs of the affected test cases makes it very clear
that these are false positives caused by test case noise.

Here comes the point of this mail. I am currently not sure when I will
find time to improve the LNT infrastructure to take advantage of the data
provided. So if someone else would like to have a look and, e.g., add
the Wilcoxon/Mann-Whitney test, this would be highly appreciated.
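
As a rough idea of what such a test could look like, here is a sketch in
plain Python. It is not LNT code: the function names, the sample data and
the choice of an exact (enumeration-based) two-sided test are all
assumptions made for illustration. Exact enumeration is feasible because
the sample sizes under discussion (5-10 per run) are small.

```python
from itertools import combinations


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_exact(a, b):
    """Two-sided exact Mann-Whitney U test; returns (U of sample a, p-value)."""
    ranks = midranks(list(a) + list(b))
    n_a, n_b = len(a), len(b)
    offset = n_a * (n_a + 1) / 2
    observed_u = sum(ranks[:n_a]) - offset
    expected_u = n_a * n_b / 2
    observed_dev = abs(observed_u - expected_u)
    # Enumerate every way of labelling n_a of the pooled samples as "a" and
    # count the labellings that deviate at least as much as the observed one.
    extreme = total = 0
    for subset in combinations(ranks, n_a):
        total += 1
        if abs(sum(subset) - offset - expected_u) >= observed_dev - 1e-9:
            extreme += 1
    return observed_u, extreme / total


# Hypothetical timings (seconds) for one test case; not real LNT data.
old = [1.00, 1.02, 0.99, 1.01, 1.00]
noise_only = [1.01, 0.99, 1.00, 1.02, 1.00]
regressed = [1.10, 1.12, 1.09, 1.11, 1.13]

u_noise, p_noise = mann_whitney_exact(old, noise_only)
u_reg, p_reg = mann_whitney_exact(old, regressed)
print(f"noise only: U={u_noise}, p={p_noise:.4f}")
print(f"regressed:  U={u_reg}, p={p_reg:.4f}")
```

With five samples per side, the smallest two-sided p-value this test can
produce is 2/252, so a significance level such as 0.05 (an assumed choice)
is reachable only when the two sample sets barely overlap. Larger sample
sizes give the test more resolution.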

I also have a couple more machines. Hence, if the LNT infrastructure
is in place, we can use them to increase the reliability of the results
even more.

Cheers,
Tobias

* They also have sufficiently close performance characteristics when
running LNT tests for the same version.
_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: New -O3 Performance tester - Use hardware to get reliable numbers

Sean Silva-2



On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <[hidden email]> wrote:
> Hi,
>
> I would like to announce a new set of LNT -O3 performance testers.
>
> In a discussion titled "Question about results reliability in LNT infrustructure" Anton suggested that one way to get statistically reliable test results from the LNT infrastructure is to use a larger sample size (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney). Another requirement to make the performance results we get from our testers useful is to have a per-commit performance run.
>
> I would like to announce that I set up 4 identical machines* that publicly report LNT results for 'clang -O3' at:
>
> http://llvm.org/perf/db_default/v4/nts/machine/34
>
> We currently catch in average groups of 3-5 commits. As most commits obviously do not impact performance this seems to be enough to track down performance regressions/changes easily.

If possible, I think it would be a good idea to filter out commits that don't affect code generation. This would allow machine resources to be better used.

Is there some way we can easily filter commits based on whether they affect code generation or not? Would it be reliable enough to check if the commit touches any of our integration tests?

As a rough estimate:

sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
706
sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
317

So it seems that, if this is reasonable, we can effectively double our performance testing coverage by filtering like this.
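
A minimal sketch of the filtering idea in Python. The commit names and
file paths are made up, and `touches_tests` is a hypothetical helper, not
part of LNT or any existing tool. Only the 706 and 317 counts come from
the git output above.

```python
def touches_tests(changed_paths, test_prefix="test/"):
    """Return True if any file changed by the commit lives under the test tree."""
    return any(path.startswith(test_prefix) for path in changed_paths)


# Hypothetical commits mapped to the files they change (not real LLVM history).
commits = {
    "r1": ["lib/CodeGen/Foo.cpp", "test/CodeGen/X86/foo.ll"],
    "r2": ["docs/ReleaseNotes.rst"],
    "r3": ["include/llvm/ADT/Bar.h", "unittests/ADT/BarTest.cpp"],
    "r4": ["lib/Transforms/Baz.cpp", "test/Transforms/baz.ll"],
}

# Keep only the commits that touch the test tree; these get benchmarked.
to_benchmark = [rev for rev, paths in commits.items() if touches_tests(paths)]
print(to_benchmark)

# Counts from the git log output: 706 commits in total, 317 touching ./test.
reduction = 706 / 317
print(f"{reduction:.2f}x fewer commits to benchmark")
```

Note that the made-up commit r3 only changes a header and a unit test, so
this heuristic would skip it. Whether that is acceptable is exactly the
open question.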

-- Sean Silva
 


Re: New -O3 Performance tester - Use hardware to get reliable numbers

Tobias Grosser-5
On 01/08/2014 02:48 AM, Sean Silva wrote:

> On Tue, Jan 7, 2014 at 11:06 AM, Tobias Grosser <[hidden email]> wrote:
>
>> Hi,
>>
>> I would like to announce a new set of LNT -O3 performance testers.
>>
>> In a discussion titled "Question about results reliability in LNT
>> infrustructure" Anton suggested that one way to get statistically reliable
>> test results from the LNT infrastructure is to use a larger sample size
>> (5-10) as well as a more robust statistical test (Wilcoxon/Mann-Whitney).
>> Another requirement to make the performance results we get from our testers
>> useful is to have a per-commit performance run.
>>
>> I would like to announce that I set up 4 identical machines* that publicly
>> report LNT results for 'clang -O3' at:
>>
>> http://llvm.org/perf/db_default/v4/nts/machine/34
>>
>> We currently catch in average groups of 3-5 commits. As most commits
>> obviously do not impact performance this seems to be enough to track down
>> performance regressions/changes easily.
>>
>
> If possible, I think it would be a good idea to filter out commits that
> don't affect code generation. This would allow machine resources to be
> better used.
>
> Is there some way we can easily filter commits based on whether they affect
> code generation or not? Would it be reliable enough to check if the commit
> touches any of our integration tests?
>
> As a rough estimate:
>
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> 706
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> 317
>
> So it seems like if this is reasonable we can effectively double our
> performance testing coverage by filtering like this.

Hi Sean,

this is a very interesting idea, though I have no idea whether checking
for 'test/' will be enough. If we keep the performance tester
running for a while, we can probably validate this assumption by
checking whether runs containing no commits that touch integration tests
showed performance changes (and what kind of changes).

As said before, I would be glad if I could get help with further
improvements on the software side.

Cheers,
Tobias

Re: New -O3 Performance tester - Use hardware to get reliable numbers

Diego Novillo-3
In reply to this post by Sean Silva-2
On Tue, Jan 7, 2014 at 8:48 PM, Sean Silva <[hidden email]> wrote:

> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> 706
> sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> 317

Wouldn't this also catch commits to code generation that added tests as well?


Diego.

Re: New -O3 Performance tester - Use hardware to get reliable numbers

Sean Silva-2



On Wed, Jan 8, 2014 at 7:58 AM, Diego Novillo <[hidden email]> wrote:
> On Tue, Jan 7, 2014 at 8:48 PM, Sean Silva <[hidden email]> wrote:
>
> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
> > 706
> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test | wc -l
> > 317
>
> Wouldn't this also catch commits to code generation that added tests as well?


I'm not sure what you mean or how it would affect what I'm saying. Any commit that affects code generation should include a test (there may be some rare exceptions, but this is the general rule).

-- Sean Silva
 


Re: New -O3 Performance tester - Use hardware to get reliable numbers

Diego Novillo-3
On Wed, Jan 8, 2014 at 4:07 PM, Sean Silva <[hidden email]> wrote:

>
>
>
> On Wed, Jan 8, 2014 at 7:58 AM, Diego Novillo <[hidden email]> wrote:
>>
>> On Tue, Jan 7, 2014 at 8:48 PM, Sean Silva <[hidden email]> wrote:
>>
>> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
>> > 706
>> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test |
>> > wc -l
>> > 317
>>
>> Wouldn't this also catch commits to code generation that added tests as
>> well?
>>
>
> I'm not sure what you mean or how it would affect what I'm saying. Any
> commit that affects code generation should include a test (there may be some
> rare exceptions, but this is the general rule).

If you pruned the commits that include directory ./test, won't that
prune the associated codegen patch as well? This would cause coverage
loss for the tester.

Or maybe I just did not understand what you were proposing :)


Diego.

Re: New -O3 Performance tester - Use hardware to get reliable numbers

David Tweed-2
Hi Diego,

I think what Sean was saying was to collect (filter) only commits that
_do_ touch the tests subtree. The git example is saying that, in a
time period when there were 706 commits, only 317 touched
tests and hence need checking (hence the comment about doubling).
("Filter" is an annoyingly ambiguous verb in English because it can mean
either that you're keeping the things stopped by the filter, or that
you're throwing away the things stopped by the filter. I could never
remember which one Haskell's filter function did and always had to
look it up.)
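
For the record, Haskell's filter keeps the elements that satisfy the
predicate, and Python's built-in filter behaves the same way. A small
Python illustration of the two readings (the commit data is made up):

```python
from itertools import filterfalse

# Hypothetical commits tagged with whether they touch the tests subtree.
commits = [("r1", True), ("r2", False), ("r3", True), ("r4", False)]


def touches_tests(commit):
    """Predicate: does this (revision, flag) pair touch the tests subtree?"""
    return commit[1]


# filter() KEEPS the elements for which the predicate is true ...
kept = [rev for rev, _ in filter(touches_tests, commits)]

# ... while itertools.filterfalse() keeps those for which it is false.
dropped = [rev for rev, _ in filterfalse(touches_tests, commits)]

print("benchmark these:", kept)
print("skip these:     ", dropped)
```

In Sean's proposal the commits in `kept` are the ones that get a
performance run.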

On Thu, Jan 9, 2014 at 4:42 PM, Diego Novillo <[hidden email]> wrote:

> On Wed, Jan 8, 2014 at 4:07 PM, Sean Silva <[hidden email]> wrote:
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:58 AM, Diego Novillo <[hidden email]> wrote:
>>>
>>> On Tue, Jan 7, 2014 at 8:48 PM, Sean Silva <[hidden email]> wrote:
>>>
>>> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' | wc -l
>>> > 706
>>> > sean:~/pg/llvm/llvm % git log --oneline --since='1 month ago' ./test |
>>> > wc -l
>>> > 317
>>>
>>> Wouldn't this also catch commits to code generation that added tests as
>>> well?
>>>
>>
>> I'm not sure what you mean or how it would affect what I'm saying. Any
>> commit that affects code generation should include a test (there may be some
>> rare exceptions, but this is the general rule).
>
> If you pruned the commits that include directory ./test, won't that
> prune the associated codegen patch as well? This would cause coverage
> loss for the tester.
>
> Or maybe I just did not understand what you were proposing :)
>
>
> Diego.



--
cheers, dave tweed__________________________
high-performance computing and machine vision expert: [hidden email]
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot

Re: New -O3 Performance tester - Use hardware to get reliable numbers

Diego Novillo-3
On Thu, Jan 9, 2014 at 11:50 AM, David Tweed <[hidden email]> wrote:
> Hi Diego,
>
> I think what Sean was saying was to collect (filter) only commits that
> _do_ touch the tests subtree.

Ah, the exact opposite of what I understood. OK, yes, that makes sense
(assuming every codegen patch has an associated test, which should be
a pretty safe assumption).


Diego.