LNT BenchmarkGame


LNT BenchmarkGame

Renato Golin-2
Hi folks,

I'm investigating the LNT failures on our bot and found that I cannot reproduce the BenchmarkGame pass.

I've compiled it with both GCC and Clang, on ARM and x86_64, with -O3 or with the arguments that the test-suite passes to it, and all I get is the result below:

Found duplicate: 420094
Found duplicate: 341335
Found duplicate: 150397
Found duplicate: 157527
Found duplicate: 269724

But not the one in the reference output:

Found duplicate: 4
Found duplicate: 485365
Found duplicate: 417267
Found duplicate: 436989
Found duplicate: 60067

If I run LNT on my machine (x86_64), that test fails; if I change the reference output to the output I actually get (the first list above), it passes.

On the ARM buildbot I'm also getting the same results, so I'm really surprised that the x86_64 LNT buildbot is passing. PowerPC is also failing, and I suspect for the same reason.

Is there any chance that the results are not being checked correctly? Any other ideas? I'm tempted to just change the reference output and see what happens with the other bots...

thanks,
--renato

_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: LNT BenchmarkGame

Tim Northover-2
> Is there any chance that the results are not being checked correctly? Any
> other ideas?

I think I vaguely convinced myself that the infrastructure didn't
actually check whether tests it classified as benchmarks passed or
failed. Not sure I had any good evidence for it other than things like
you're seeing.

> I'm tempted to just change the reference output and see what
> happens with the other bots...

Could be worth a try. But if that thing really is generating random
numbers I'm not sure replacing one genuine cast-iron random number
with another is the best solution long-term.

Tim.

Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 14:24, Tim Northover <[hidden email]> wrote:
> Could be worth a try. But if that thing really is generating random
> numbers I'm not sure replacing one genuine cast-iron random number
> with another is the best solution long-term.

The test calls srand(1), so in theory the output shouldn't differ between compilers, since Clang is using the same libraries.

Also, if the "native" result is generated by GCC, then all problems go away, since the result will be target dependent (or rather, library dependent). Is there a way to turn on the dynamic generation of the native file instead of copying it from the reference_output?

cheers,
--renato


Re: LNT BenchmarkGame

Tim Northover-2
Hi Renato,

> The test is initializing srand(1), so in theory, it shouldn't be different
> between compilers, since Clang is using the same libraries.

If Clang and GCC disagree on the same source, same machine and with
the same libraries, that certainly is odd. But it doesn't make
checking against the output of a particular libc's RNG any better an
idea in general.

Cheers.

Tim.

Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 14:36, Tim Northover <[hidden email]> wrote:
> If Clang and GCC disagree on the same source, same machine and with
> the same libraries, that certainly is odd.

They don't. That's the odd bit. GCC and Clang agree on the output on both ARM and x86_64, and neither agrees with the reference_output file.

What could be happening is that the libraries on that buildbot are old, while those on my ARM and x86_64 machines have been updated.

I'm not suggesting we keep replacing the "golden" file with each new value, but that we disable checking against the reference_output altogether and rely on a GCC vs. Clang comparison.

I agree that the comparison is no better than a reference file (since it, too, could be wrong), but comparing both outputs eliminates any library mismatch, and it's less likely that both GCC and Clang will be wrong about exactly the same thing at the same time.

Is there a way to turn off the check against the reference_output and make it check against a GCC executable output?

cheers,
--renato




Re: LNT BenchmarkGame

Marshall Clow-2
In reply to this post by Tim Northover-2
On Mar 12, 2013, at 7:36 AM, Tim Northover <[hidden email]> wrote:

> Hi Renato,
>
>> The test is initializing srand(1), so in theory, it shouldn't be different
>> between compilers, since Clang is using the same libraries.
>
> If Clang and GCC disagree on the same source, same machine and with
> the same libraries, that certainly is odd. But it doesn't make
> checking against the output of a particular libc's RNG any better an
> idea in general.

I agree; I'm pretty sure that the only guarantee is that for a given implementation of srand, if you initialize it with the same seed, you get the same sequence.

There is no "correct" sequence.

-- Marshall

Marshall Clow     Idio Software   <mailto:[hidden email]>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki



Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 14:53, Marshall Clow <[hidden email]> wrote:
> I agree; I'm pretty sure that the only guarantee is that for a given implementation of srand, if you initialize it with the same seed, you get the same sequence.
>
> There is no "correct" sequence.

I'm not suggesting a correct sequence, I'm just looking for a way to turn off the verification against the reference output and force LNT to run GCC for the "native" output.

cheers,
--renato


Re: LNT BenchmarkGame

Hal Finkel
----- Original Message -----
> From: "Renato Golin" <[hidden email]>
> To: "Marshall Clow" <[hidden email]>
> Cc: "LLVM Dev" <[hidden email]>
> Sent: Tuesday, March 12, 2013 10:22:41 AM
> Subject: Re: [LLVMdev] LNT BenchmarkGame
>
> On 12 March 2013 14:53, Marshall Clow <[hidden email]> wrote:
>> I agree; I'm pretty sure that the only guarantee is that for a given
>> implementation of srand, if you initialize it with the same seed,
>> you get the same sequence.
>>
>> There is no "correct" sequence.
>
> I'm not suggesting a correct sequence, I'm just looking for a way to
> turn off the verification against the reference output and force LNT
> to run GCC for the "native" output.

Can't we just paste in a RNG so that we'll get the same output on all systems (and can still use the reference output)?

 -Hal

Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 15:28, Hal Finkel <[hidden email]> wrote:
> Can't we just paste in a RNG so that we'll get the same output on all systems (and can still use the reference output)?

We can, though other tests suffer from the same issue. It would be good to have one solution for all of them without pasting the same code into each.

I really thought that the native output was always generated by the "native compiler" which is normally GCC. Removing the reference output doesn't work, since it just creates an empty file instead. The Makefile is too simple to mean anything, but maybe there's some environment variable that needs setting to make LNT get the result from a GCC run...

--renato


Re: LNT BenchmarkGame

Duncan Sands
In reply to this post by Renato Golin-2
Hi Renato,

On 12/03/13 15:33, Renato Golin wrote:

> On 12 March 2013 14:24, Tim Northover <[hidden email]> wrote:
>
>     Could be worth a try. But if that thing really is generating random
>     numbers I'm not sure replacing one genuine cast-iron random number
>     with another is the best solution long-term.
>
> The test is initializing srand(1), so in theory, it shouldn't be different
> between compilers, since Clang is using the same libraries.
>
> Also, if the "native" result is generated by GCC, then all problems go away,
> since the result will be target dependent (or rather, library dependent). Is
> there a way to turn on the dynamic generation of the native file instead of
> copying it from the reference_output?

IIRC the reference output is not used by default.  You have to put
   USE_REFERENCE_OUTPUT := 1
in the Makefile in order to make use of the reference output.  As
BenchmarkGame doesn't have this, are you sure the reference output
is causing the problem?

Ciao, Duncan.

Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 16:21, Duncan Sands <[hidden email]> wrote:
> IIRC the reference output is not used by default.  You have to put
>   USE_REFERENCE_OUTPUT := 1
> in the Makefile in order to make use of the reference output.  As
> BenchmarkGame doesn't have this, are you sure the reference output
> is causing the problem?

That was my initial assumption, too. But if I just run that test, the Makefile doesn't use GCC at all and only copies the reference_output to the out-nat file.

I then copied a "good" output to the reference_output, and the test passed. I'm intrigued... ;)

Attached is a test.log of a local run. The buildbots' logs are pretty similar on BenchmarkGame.

cheers,
--renato

test.log (57K) Download Attachment

Re: LNT BenchmarkGame

Daniel Dunbar
In reply to this post by Renato Golin-2
Hi Renato,

This is probably a platform specific dependency where the Linux output file differs from the Darwin one. I fixed up a lot of those in the past but the random number issue blocks some others. For reference see LLVM r111522.

On my machine I get output that matches the reference output:
--
ddunbar@ozzy-2:BenchmarkGame (master)$ clang puzzle.c && ./a.out
Found duplicate: 4
Found duplicate: 485365
Found duplicate: 417267
Found duplicate: 436989
Found duplicate: 60067
--

The best solution is the one I mention in r111522: build some extra runtime support code that each benchmark can use, and include a platform-stable RNG in it.

 - Daniel



Re: LNT BenchmarkGame

Daniel Dunbar
In reply to this post by Tim Northover-2
On Tue, Mar 12, 2013 at 7:24 AM, Tim Northover <[hidden email]> wrote:
>> Is there any chance that the results are not being checked correctly? Any
>> other ideas?
>
> I think I vaguely convinced myself that the infrastructure didn't
> actually check whether tests it classified as benchmarks passed or
> failed. Not sure I had any good evidence for it other than things like
> you're seeing.

This is false.

Every test gets compared against some kind of expected output file (which includes the exit code). The correct output is either:
 a. a reference output file
or
 b. the output from a natively run executable
depending on some of the test parameters.

 - Daniel

Re: LNT BenchmarkGame

Duncan Sands
In reply to this post by Renato Golin-2
Hi Renato,

On 12/03/13 17:33, Renato Golin wrote:

> On 12 March 2013 16:21, Duncan Sands <[hidden email]> wrote:
>
>     IIRC the reference output is not used by default.  You have to put
>        USE_REFERENCE_OUTPUT := 1
>     in the Makefile in order to make use of the reference output.  As
>     BenchmarkGame doesn't have this, are you sure the reference output
>     is causing the problem?
>
>
> That was my initial assumption, too. But if I just run that test, the Makefile
> doesn't use GCC at all and only copies the reference_output to the out-nat file.

If you look at the first line of your log

2013-03-12 15:19:41: running: "make" "-k"
"TARGET_LLVMGCC=/home/rengolin/devel/llvm/build/bin/clang" "TARGET_CXX=None"
"LLI_OPTFLAGS=-O3" "TARGET_CC=None"
"TARGET_LLVMGXX=/home/rengolin/devel/llvm/build/bin/clang++" "TEST=simple"
"CC_UNDER_TEST_IS_CLANG=1" "ENABLE_PARALLEL_REPORT=1" "TARGET_FLAGS="
"USE_REFERENCE_OUTPUT=1" "CC_UNDER_TEST_TARGET_IS_X86_64=1" "OPTFLAGS=-O3"
"LLC_OPTFLAGS=-O3" "ENABLE_OPTIMIZED=1" "ARCH=x86_64"
"ENABLE_HASHED_PROGRAM_OUTPUT=1" "DISABLE_JIT=1" "-C"
"SingleSource/Benchmarks/BenchmarkGame" "-j" "8" "report" "report.simple.csv"

then you see that it forces USE_REFERENCE_OUTPUT=1.  Maybe LNT does that?

Ciao, Duncan.

Re: LNT BenchmarkGame

Daniel Dunbar
In reply to this post by Renato Golin-2



On Tue, Mar 12, 2013 at 9:19 AM, Renato Golin <[hidden email]> wrote:
> On 12 March 2013 15:28, Hal Finkel <[hidden email]> wrote:
>> Can't we just paste in a RNG so that we'll get the same output on all systems (and can still use the reference output)?
>
> We can, though other tests suffer from the same issue. Would be good to have a solution to all of them without pasting the same code on all of them.
>
> I really thought that the native output was always generated by the "native compiler" which is normally GCC. Removing the reference output doesn't work, since it just creates an empty file instead. The Makefile is too simple to mean anything, but maybe there's some environment variable that needs setting to make LNT get the result from a GCC run...

The test suite supports multiple modes: one in which the native output is generated by an executable built by the native compiler, and another in which the output is compared to a reference output.

The former mode is historically what the test suite did, the latter mode is substantially faster (and independent of bugs in the native CC).

 - Daniel



Re: LNT BenchmarkGame

Renato Golin-2
In reply to this post by Duncan Sands
On 12 March 2013 16:48, Duncan Sands <[hidden email]> wrote:
> then you see that it forces USE_REFERENCE_OUTPUT=1.  Maybe LNT does that?

Ha! Well spotted, thanks! ;)

I think we should force it to zero for the tests that depend on random numbers...

--renato 


Re: LNT BenchmarkGame

Renato Golin-2
In reply to this post by Daniel Dunbar
On 12 March 2013 16:48, Daniel Dunbar <[hidden email]> wrote:
> The former mode is historically what the test suite did, the latter mode is substantially faster (and independent of bugs in the native CC).

Yes, I agree this is better for many cases, but not for all. Implementing an RNG that is good enough for the tests' purposes, fast enough not to steal the benchmarks' hot spots, and free of target/library-specific code is not trivial. I think that, in this particular case, having bugs in GCC is far less problematic than assuming fixed outputs.

I've tried USE_REFERENCE_OUTPUT := 0 in the Makefile, but the test.log still prints it as 1 (and fails).

cheers,
--renato


Re: LNT BenchmarkGame

Daniel Dunbar



On Tue, Mar 12, 2013 at 10:23 AM, Renato Golin <[hidden email]> wrote:
> On 12 March 2013 16:48, Daniel Dunbar <[hidden email]> wrote:
>> The former mode is historically what the test suite did, the latter mode is substantially faster (and independent of bugs in the native CC).
>
> Yes, I agree this is better for many cases, but not for all. Implementing an RNG that is good enough for the tests' purposes, fast enough not to steal the benchmarks' hot spots and does not use target/library-specific code is not trivial.

This is not true: all one needs to do is replace the existing srand() and rand() with some specific platform's version (and those are usually very simple RNGs). If the code is already using srand()/rand(), there is no reason to assume the benchmark would be any worse if it always used the FreeBSD one, say, as opposed to a platform-specific one.
 
 - Daniel


Re: LNT BenchmarkGame

Renato Golin-2
On 12 March 2013 17:30, Daniel Dunbar <[hidden email]> wrote:
> If the code is already using srand()/rand() then there is no reason to assume somehow the benchmark is worse if it always used the FreeBSD one, say, as opposed to a platform specific one.

I'm not convinced that running GCC on library-specific tests will be worse than pasting library code inside each test that has a library problem.

In theory, it should just work if we manage to disable USE_REFERENCE_OUTPUT for those particular tests.

cheers,
--renato


Re: LNT BenchmarkGame

Hal Finkel
In reply to this post by Daniel Dunbar


----- Original Message -----
> From: "Daniel Dunbar" <[hidden email]>
> To: "Renato Golin" <[hidden email]>
> Cc: "Hal Finkel" <[hidden email]>, "Marshall Clow" <[hidden email]>, "LLVM Dev" <[hidden email]>
> Sent: Tuesday, March 12, 2013 12:30:12 PM
> Subject: Re: [LLVMdev] LNT BenchmarkGame
>
> On Tue, Mar 12, 2013 at 10:23 AM, Renato Golin <[hidden email]> wrote:
>> On 12 March 2013 16:48, Daniel Dunbar <[hidden email]> wrote:
>>> The former mode is historically what the test suite did, the latter
>>> mode is substantially faster (and independent of bugs in the native
>>> CC).
>>
>> Yes, I agree this is better for many cases, but not for all.
>> Implementing an RNG that is good enough for the tests' purposes, fast
>> enough not to steal the benchmarks' hot spots and does not use
>> target/library-specific code is not trivial.
>
> This is not true, all one needs to do is replace the existing srand()
> and rand() with some specific platform's version (and those are usually
> very simple RNGs). If the code is already using srand()/rand() then
> there is no reason to assume somehow the benchmark is worse if it
> always used the FreeBSD one, say, as opposed to a platform specific
> one.

+1

There are a couple of example implementations here which are only a few lines long:
http://wiki.osdev.org/Random_Number_Generator

 -Hal