llvmlab (phased buildmaster) is in production mode!

llvmlab (phased buildmaster) is in production mode!

Michael Gottesman
Hello LLVM Dev and Clang Dev!

David Dean and I just finished bringing up a new build master, llvmlab, on lab.llvm.org. It is located at http://lab.llvm.org:8013 and is in #llvm under the username llvmlab.

llvmlab is different from the current buildbot-based continuous integration systems LLVM uses; llvmlab is a phased-builder-based system. At a high level, the phased builder system works as follows:

1. Builders are grouped into phases as sub builders. Each phase is itself a builder: it triggers its grouped ``sub builders'' and, assuming all of them complete successfully, triggers its successor phased builder if one exists. This creates a gating effect.

2. All phases are gated together linearly, so if an earlier phase does not succeed, no later phases run. The key idea is this: if we know there is a fundamental issue with the compiler, why try to build 20 compilers when one quick build is all that is needed to establish that fact? Likewise, if we cannot build a compiler successfully, why try to do LNT performance runs? This eliminates pointless work, stops excessive emails from being sent out for one bad commit, and reduces cycle time, especially when certain builds take significantly longer than others to fail.

3. Later phases do broader, longer-lasting testing than earlier phases. The four phases we currently have are:

a. Phase 1 (sanity): Phase 1 is a quick non-bootstrapped, non-LTO compiler build to check the ``basic sanity'' of the code base and build process. This generally takes 15-20 minutes to complete.
b. Phase 2 (living on): Phase 2 builds bootstrapped compilers for the different configurations that can be used for ``living on'', i.e., compilers good enough for common compilation tasks. This is meant to cycle in up to an hour.
c. Phase 3 (tree health): Phase 3 runs performance tests, i.e., LNT (not live yet), as well as other compiler builds that take longer (for instance, a full clang build with LTO enabled).
d. Phase 4 (validation): Phase 4 runs longer-running validation tests. Currently we have nothing in phase 4, but I am sure that will change = p.

4. Builders in later phases rely on outputs from earlier phases. If we are doing performance runs, why should we compile a new compiler for each run? That is duplicated work! Instead, the phased build system stores ``artifacts'', i.e., built compilers, from earlier phases and uses them as the compilers for builds in later phases. Thus we could have 30 different phase 3 LNT performance runs with different configurations, all using the same compiler artifacts built in phase 2. This removes a significant amount of duplicated work and decreases cycle time.
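The four points above can be sketched as a small Python model. This is a hypothetical illustration, not llvmlab's or buildbot's actual configuration: `Phase`, `run_phased`, and the shared artifact mapping are names invented here. It shows linear gating (a failing phase stops all of its successors) and artifact reuse (one phase 2 compiler build feeding 30 LNT-style phase 3 runs).

```python
from typing import Callable, Dict, List

# Hypothetical model: artifacts are a name -> location mapping shared across
# phases, and each sub builder receives that mapping and returns True on success.
Artifacts = Dict[str, str]
SubBuilder = Callable[[Artifacts], bool]


class Phase:
    def __init__(self, name: str, sub_builders: Dict[str, SubBuilder]) -> None:
        self.name = name
        self.sub_builders = sub_builders

    def run(self, artifacts: Artifacts) -> bool:
        """Trigger every grouped sub builder; the phase passes only if all do.

        Sub builders run one after another here for simplicity; a real build
        master would dispatch them to separate build machines in parallel.
        """
        results = [build(artifacts) for build in self.sub_builders.values()]
        return all(results)


def run_phased(phases: List[Phase]) -> List[str]:
    """Run phases in strict linear order and return the names of those that ran.

    A phase triggers its successor only if it fully succeeds, so one bad
    commit stops the pipeline at the earliest phase that can detect it.
    """
    artifacts: Artifacts = {}
    ran: List[str] = []
    for phase in phases:
        ran.append(phase.name)
        if not phase.run(artifacts):
            break  # gate closed: later phases never start
    return ran


if __name__ == "__main__":
    compile_log: List[str] = []

    def quick_build(artifacts: Artifacts) -> bool:
        return True  # stand-in for the non-bootstrapped sanity build

    def stage2_build(artifacts: Artifacts) -> bool:
        compile_log.append("stage2")  # the expensive build happens once
        artifacts["clang-stage2"] = "/hypothetical/artifacts/clang"
        return True

    def lnt_run(artifacts: Artifacts) -> bool:
        # Reuse the phase 2 compiler instead of building a new one.
        return "clang-stage2" in artifacts

    pipeline = [
        Phase("sanity", {"quick": quick_build}),
        Phase("living on", {"stage2": stage2_build}),
        Phase("tree health", {f"lnt-config-{i}": lnt_run for i in range(30)}),
    ]
    print(run_phased(pipeline), "compiler builds:", len(compile_log))
```

Because the artifact mapping lives for the whole pipeline run, every later sub builder sees what earlier phases produced, and a failure anywhere in phase 1 means the expensive phase 2 build is never attempted.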

Over time, we will be moving more and more builders to llvmlab, including LNT performance builders and a builder that runs the libcxx test suite.

Michael


_______________________________________________
LLVM Developers mailing list
[hidden email]         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Re: llvmlab (phased buildmaster) is in production mode!

Reed Kotler
Most of the selections, such as "console", do not work when I
click on them.

On 03/27/2013 03:57 PM, Michael Gottesman wrote:

> Hello LLVM Dev and Clang Dev!
> [...]



Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Sean Silva
In reply to this post by Michael Gottesman
Cool! This is great news.

I feel like this information should be in our documentation somewhere. Could you start a new file, ContinuousIntegration.rst, and use this content to seed it? This new page would also be a good place to mention some LLVM idiosyncrasies, like smooshlab being Apple-internal but still reporting via IRC; these things have not had a good place to go yet. AFAIK, our continuous integration infrastructure is currently mostly community wisdom, and besides a small mention of some of the reporting bots in index.rst, there is no documentation describing it.

-- Sean Silva


Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Chandler Carruth-2
In reply to this post by Michael Gottesman

On Wed, Mar 27, 2013 at 3:57 PM, Michael Gottesman <[hidden email]> wrote:
> 3. Later phases do broader, longer lasting testing than earlier phases. Thus the 4 phases we currently have are:
>
> a. Phase 1 (sanity): Phase 1 is a quick non-bootstrapped, non-lto compiler build, to check the ``basic sanity'' of the code base and build process. This generally takes 15-20 minutes to complete.

While most of this sounds great, this one really doesn't.

The sanity tests should be able to run *much* faster than 15-20 minutes. Can we prioritize getting an incremental rebuild bot as the sanity phase on reasonably fast hardware? I think it's important to get through the sanity phase in 1-5 minutes so that phase 2 doesn't get soooo many commits piled up on it when someone checks in code with both a miscompile and a tiny build break (for example).

-Chandler


Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Michael Gottesman
In reply to this post by Sean Silva
Smooshlab is going away so the LLVM idiosyncrasy that you mention will be going away soon = p. The whole idea behind bringing up this infrastructure is so that everything that is Apple-internal but COULD be public is public and is on the phased builders. Anything we can't bring out will (of course) stay internal. Also I agree about the documentation.

Michael

On Mar 27, 2013, at 5:58 PM, Sean Silva <[hidden email]> wrote:

> Cool! This is great news.
>
> I feel like this information should be in our documentation somewhere. Could you start a new file, ContinuousIntegration.rst, and use this content to seed it? This new page would also be a good place to mention some LLVM idiosyncrasies, like smooshlab being Apple-internal but still reporting via IRC; these things have not had a good place to go yet. AFAIK, our continuous integration infrastructure is currently mostly community wisdom, and besides a small mention of some of the reporting bots in index.rst, there is no documentation describing it.
>
> -- Sean Silva


Re: llvmlab (phased buildmaster) is in production mode!

Michael Gottesman
In reply to this post by Reed Kotler
This is a known issue. I have actually tinkered with this before; it is just a matter of getting it into zorg in the right manner.

On Mar 27, 2013, at 5:50 PM, Reed Kotler <[hidden email]> wrote:

> Most of the selections, like "console" for example, do not work when I click on them.
>
> On 03/27/2013 03:57 PM, Michael Gottesman wrote:
>> [...]


Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Michael Gottesman
In reply to this post by Chandler Carruth-2
On Mar 27, 2013, at 6:41 PM, Chandler Carruth <[hidden email]> wrote:


> On Wed, Mar 27, 2013 at 3:57 PM, Michael Gottesman <[hidden email]> wrote:
>> 3. Later phases do broader, longer lasting testing than earlier phases. Thus the 4 phases we currently have are:
>>
>> a. Phase 1 (sanity): Phase 1 is a quick non-bootstrapped, non-lto compiler build, to check the ``basic sanity'' of the code base and build process. This generally takes 15-20 minutes to complete.
>
> While most of this sounds great, this one really doesn't.
>
> The sanity tests should be able to run *much* faster than 15-20 minutes.

Agreed. I think getting the sanity time down further is possible and very important. IIRC, gribozavr has a Ninja/CMake bot that does clean builds in under 10 minutes. I think that is a first step (which brings me to your question).

> Can we prioritize getting an incremental rebuild bot as the sanity phase on reasonably fast hardware?

Incremental builds are quicker but less robust than clean builds. The nice thing about always doing clean builds is that it simplifies things by ruling out any build failures caused by a dirty build directory (but maybe I am paranoid). If we want to do incremental builds (which, note, I am not averse to btw), we should at least set up some way to clean the build directory, without sshing into the machine, when a build fails because of a dirty build directory. Perhaps if a builder fails a number of times in a row, we clean it?
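The "clean it after it fails a number of times in a row" idea can be sketched as a tiny policy object. `CleanAfterFailures` and `simulate` are hypothetical names, and the default threshold of 3 is only an assumption matching the "3 failures or something" figure mentioned later in the thread; nothing here is an existing zorg or buildbot API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CleanAfterFailures:
    """Hypothetical policy: build incrementally by default, but wipe the build
    directory once the builder has failed `threshold` times in a row."""

    threshold: int = 3
    consecutive_failures: int = 0

    def next_build_is_clean(self) -> bool:
        return self.consecutive_failures >= self.threshold

    def record(self, succeeded: bool, was_clean: bool) -> None:
        # A green build, or a clean build (whose failure cannot be blamed on a
        # dirty directory), resets the count; only incremental failures add up.
        if succeeded or was_clean:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1


def simulate(outcomes: List[bool], threshold: int = 3) -> List[str]:
    """Return the build mode chosen for each build, given each build's outcome."""
    policy = CleanAfterFailures(threshold=threshold)
    modes: List[str] = []
    for succeeded in outcomes:
        clean = policy.next_build_is_clean()
        modes.append("clean" if clean else "incremental")
        policy.record(succeeded, clean)
    return modes


if __name__ == "__main__":
    print(simulate([False, False, False, True, True]))
```

Resetting the counter after a clean build keeps a genuinely broken tree from forcing every later build to be clean; a threshold of 1 gives clean-after-every-failure behavior instead.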

Also, hardware contributions are always welcome = p.

Michael


Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Reid Kleckner-2
On Wed, Mar 27, 2013 at 10:09 PM, Michael Gottesman <[hidden email]> wrote:
> On Mar 27, 2013, at 6:41 PM, Chandler Carruth <[hidden email]> wrote:
>
>> On Wed, Mar 27, 2013 at 3:57 PM, Michael Gottesman <[hidden email]> wrote:
>>> 3. Later phases do broader, longer lasting testing than earlier phases. Thus the 4 phases we currently have are:
>>>
>>> a. Phase 1 (sanity): Phase 1 is a quick non-bootstrapped, non-lto compiler build, to check the ``basic sanity'' of the code base and build process. This generally takes 15-20 minutes to complete.
>>
>> While most of this sounds great, this one really doesn't.
>>
>> The sanity tests should be able to run *much* faster than 15-20 minutes.
>
> Agreed. I think getting the sanity time down further is possible and very important. IIRC, gribozavr has a Ninja/CMake bot that does clean builds in under 10 minutes. I think that is a first step (which brings me to your question).
>
>> Can we prioritize getting an incremental rebuild bot as the sanity phase on reasonably fast hardware?
>
> Incremental builds are quicker but less robust than clean builds. The nice thing about always doing clean builds is that it simplifies things by ruling out any build failures caused by a dirty build directory (but maybe I am paranoid). If we want to do incremental builds (which, note, I am not averse to btw), we should at least set up some way to clean the build directory, without sshing into the machine, when a build fails because of a dirty build directory. Perhaps if a builder fails a number of times in a row, we clean it?

It's common to have buttons on the builder page that force the next build to be clean.  Usually they have some kind of auth mechanism.  Maybe you can get by with something simple.
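A "force the next build to be clean" button could be backed by something as simple as a one-shot flag plus a shared-secret check. This is a hypothetical sketch: `Builder`, `press_force_clean`, and the shared-secret scheme are assumptions standing in for "some kind of auth mechanism", not a feature of buildbot or llvmlab.

```python
import hmac


class Builder:
    """Hypothetical builder state with a one-shot "force clean" flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.force_clean = False

    def start_build(self) -> str:
        mode = "clean" if self.force_clean else "incremental"
        self.force_clean = False  # the request only applies to the next build
        return mode


def press_force_clean(builder: Builder, token: str, secret: str) -> bool:
    """Handle the button press behind a simple shared-secret check.

    hmac.compare_digest compares in constant time, so the check does not
    leak how much of the submitted token matched.
    """
    if not hmac.compare_digest(token.encode(), secret.encode()):
        return False
    builder.force_clean = True
    return True


if __name__ == "__main__":
    sanity = Builder("phase1-sanity")  # hypothetical builder name
    press_force_clean(sanity, "s3cret", "s3cret")
    print(sanity.start_build(), sanity.start_build())
```

Making the flag one-shot means a single press cleans exactly one build, after which the builder returns to fast incremental builds on its own.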

> Also, hardware contributions are always welcome = p.
>
> Michael
>
> _______________________________________________
> cfe-dev mailing list
> [hidden email]
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev




Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Joerg Sonnenberger
In reply to this post by Michael Gottesman
On Wed, Mar 27, 2013 at 10:09:59PM -0700, Michael Gottesman wrote:
> Incremental builds are quicker but less robust than clean builds.

Can you add clean-after-error behavior? I.e., default to an incremental
build, but do a clean rebuild after a failing build?

Joerg

Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

Michael Gottesman
(I said the same thing in my last email = p, except a bit more conservative: do a clean build after 3 failures or something along those lines).

Michael

On Mar 28, 2013, at 6:16 AM, Joerg Sonnenberger <[hidden email]> wrote:

> On Wed, Mar 27, 2013 at 10:09:59PM -0700, Michael Gottesman wrote:
>> Incremental builds are quicker but less robust than clean builds.
>
> Can you add clean-after-error behavior? I.e., default to an incremental
> build, but do a clean rebuild after a failing build?
>
> Joerg


Re: [cfe-dev] llvmlab (phased buildmaster) is in production mode!

David Blaikie


On Mar 29, 2013 5:49 AM, "Michael Gottesman" <[hidden email]> wrote:
>
> (I said the same thing in my last email = p, except a bit more conservative: do a clean build after 3 failures or something along those lines).

Right, but the difference is significant:

Clean-on-failure would mean we would never see a buildbot failure caused by a bad incremental rebuild, rather than getting a couple of builds' worth of noise.

Unfortunately, it has the disadvantage of increasing the latency of any failure result.

Personally, I think that any incremental build failure should be a bug that we fix, which should let the bot go green without anyone ever having to force-clean it. Is there a reason this isn't possible or desirable?
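To make the distinction concrete, here is a minimal sketch of clean-on-failure in the sense described here: when an incremental build fails, the same revision is immediately rebuilt clean and only that result is reported. `build_and_report`, `Report`, and the minute figures are hypothetical. The sketch shows both sides of the trade-off: a stale-directory failure never surfaces as a red build, but a real breakage is reported only after both builds have run.

```python
from typing import Callable, NamedTuple, Tuple


class Report(NamedTuple):
    succeeded: bool
    minutes: int  # time until the result is reported
    retried_clean: bool


# Hypothetical build runner: (revision, clean) -> (succeeded, minutes taken).
BuildFn = Callable[[str, bool], Tuple[bool, int]]


def build_and_report(revision: str, build: BuildFn) -> Report:
    ok, minutes = build(revision, False)  # try an incremental build first
    if ok:
        return Report(True, minutes, False)
    # Rebuild the same revision from scratch and report only that result, so a
    # dirty build directory can never produce a reported failure on its own.
    ok_clean, clean_minutes = build(revision, True)
    return Report(ok_clean, minutes + clean_minutes, True)


if __name__ == "__main__":
    def stale_dir(rev: str, clean: bool) -> Tuple[bool, int]:
        return clean, (20 if clean else 5)  # only the clean build succeeds

    def real_break(rev: str, clean: bool) -> Tuple[bool, int]:
        return False, (20 if clean else 5)  # broken no matter how it is built

    print(build_and_report("r1", stale_dir))
    print(build_and_report("r1", real_break))
```

In the real-breakage case the failure arrives after the incremental and the clean build combined, which is exactly the added latency mentioned above.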

> Michael
>
> On Mar 28, 2013, at 6:16 AM, Joerg Sonnenberger <[hidden email]> wrote:
>
> > On Wed, Mar 27, 2013 at 10:09:59PM -0700, Michael Gottesman wrote:
> >> Incremental builds are quicker but less robust than clean builds.
> >
> > Can you add clean-after-error behavior? I.e., default to an incremental
> > build, but do a clean rebuild after a failing build?
> >
> > Joerg

