Developers:BuildBot


Generalities

BuildBot is a piece of software that automates software builds and tests. It operates a server (the master) that triggers jobs on a number of slaves. These slaves may run on the same machine as the server, but also on different ones, which makes it possible to compile and check the code on different architectures and operating systems.

We now have BuildBot running on www.tddft.org. Currently, it is configured for two cases:

  • It tracks the trunk of the Subversion repository and, on each commit, triggers several slaves to do a quick configure and compilation (without optimizations) of the code. This way we get an instant response if a commit broke something in a different environment.
  • Each night, the code is built in different settings (with a lot of debugging turned on) and the testsuite is run.

For a failing build, an e-mail is sent to octopus-notify@tddft.org, as with the old nightly builds. This e-mail contains all output of the commands that have been run by the slave. In addition, there is a so-called waterfall page in our Trac that lists all BuildBot activities and can be used to have a look at the build and test results.

Installation of BuildBot

To install BuildBot you need the BuildBot package itself and its dependencies (notably Twisted), plus Python, at least version 2.3.

Either install this software via your distribution's package system or simply by hand: after unpacking, run

$ python ./setup.py bdist
$ python ./setup.py install --prefix=DIR

in the package's source directory. This installs the files below DIR in a hierarchy like the one usually found below /usr/lib/python. Note that it might be necessary to set the PYTHONPATH environment variable to DIR/lib/pythonX.Y/site-packages (no slash at the end) so that Python can find the libraries.
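
A quick way to check that Python actually finds the freshly installed libraries is to import them by hand; if the following command returns without an ImportError, the PYTHONPATH is fine (adjust the pythonX.Y part to your Python version):

$ PYTHONPATH=DIR/lib/pythonX.Y/site-packages python -c "import buildbot"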

Setup of the BuildBot master

The BuildBot master is installed in /home/lorenzen/opt/python on www.tddft.org and also runs under my account. Its configuration and data files are in /server/www/tddft.org/programs/octopus/buildbot-master so that all developers have access to the configuration and can add new slaves (see below).

I had to hack the Trac plugin (http://www.trac-hacks.org/attachment/ticket/339/TracBB-0.1.2.tar.gz) a little bit to make it work. There are still some glitches, but for simply browsing the waterfall it works.

Setup of a BuildBot slave

There are two steps to set up a new slave:

  • install BuildBot on the machine in question, and
  • add the appropriate entries in the master configuration.

Installing a slave

First, install BuildBot on the new machine as described above. Then run

$ DIR/bin/buildbot create-slave BASEDIR MASTER NAME PASSWORD

with BASEDIR being the directory where all the builds of this slave take place, MASTER being www.tddft.org:8898 in our case, and NAME and PASSWORD being taken from the master configuration (see below).

As the last step, you have to start the slave:

$ DIR/bin/buildbot start BASEDIR

If you want to make sure that it is restarted when the machine reboots, the simplest solution is a crontab entry like

@reboot DIR/bin/buildbot start BASEDIR

but this only works if you have Vixie cron running, which is usually the case on Linux distributions but commonly not on System V systems (check out man 5 crontab).

Please also note that the environment set up by cron might be very different from your login environment. On the slaves I have set up, I had to include a line

PYTHONPATH=DIR/lib/python2.4/site-packages

in the crontab.

Registering a slave with the master

The entire configuration of the BuildBot master is done via the file master.cfg that lives in /server/www/tddft.org/programs/octopus/buildbot-master/. It is a Python file, so those of you who, unlike me, know this language will find it easy to understand what's going on.

The following steps have to be undertaken:

Add a slave entry to the c['bots'] list of (name, password) pairs. It looks like this at the moment:

c['bots'] = [("athos", "secret1"),   # AMD Opteron, Debian
             ("octopus", "secret2"), # AMD Opteron, Fedora
             ("pepita", "secret3"),  # Sparc, Solaris
             ("g22", "secret4")      # i386, Debian
             ]

After adding your new slave, it might look like

c['bots'] = [("athos", "secret1"),   # AMD Opteron, Debian
             ("octopus", "secret2"), # AMD Opteron, Fedora
             ("pepita", "secret3"),  # Sparc, Solaris
             ("g22", "secret4"),     # i386, Debian
             ("wheel", "secret5")    # Alpha, OSF1
             ]

with wheel being the slave's name and secret5 the password you give on the create-slave command line. The name of the slave should not contain dots (at least I had trouble with that), so it should not be the fully qualified domain name but something else that is unique and recognizable.

Next, you have to add a builder that knows how to compile (and test, if you want) the code on the new slave. Go to the HOW TO BUILD THE CODE section of the file and add an entry like

x86_64_gfortran_build = octopusVpathBuild(
        var={"FC" : "/home/lorenzen/opt/gcc-4.2.0/bin/gfortran",
             "FCFLAGS" : "-Wall -I/home/lorenzen/opt/gfortran-4.2.0/netcdf/include"},
        flags=["--with-blas=-lblas -L/home/lorenzen/opt/gfortran-4.2.0/blas/lib",
               "--with-lapack=-llapack -L/home/lorenzen/opt/gfortran-4.2.0/lapack/lib",
               "--with-fft-lib=-lfftw3 -L/home/lorenzen/opt/gfortran-4.2.0/fftw/lib",
               "--with-sparskit=-lskit -L/home/lorenzen/opt/gfortran-4.2.0/sparskit/lib",
               "--with-arpack=-larpack -L/home/lorenzen/opt/gfortran-4.2.0/arpack/lib",
               "--with-netcdf=-lnetcdf -L/home/lorenzen/opt/gfortran-4.2.0/netcdf/lib",
               "--with-gsl-prefix=/home/lorenzen/opt/gsl",
               "--disable-gdlib"])

This defines a builder named x86_64_gfortran_build. The call to octopusVpathBuild (actually just a Python function defined further up in the file) creates a builder that uses the VPATH capabilities of make to find more bugs in the build system. The nomenclature for builders is arch_compiler_options_type with

  • arch being the target architecture like sparc, i386, ppc,
  • compiler the Fortran 90 compiler used like gfortran, ifort, nag, pgi, g95, sunf90,
  • options special build characteristics like mpich2, and
  • type the kind of build; build and test exist for the moment: build only compiles and links the code, test additionally runs the testsuite.

The two keyword arguments var and flags are for the configure invocation:

Setting the arguments to

var={"A" : "a", "B" : "b"},
flags=["--with-bar=/usr/lib/bar",
       "--without-foo"]

results in the configure line

$ A=a B=b ./configure --with-bar=/usr/lib/bar --without-foo
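
Under the hood this amounts to a shell step that runs configure inside the _build tree. The following is only a sketch of that idea, with a hypothetical helper name, not the actual code of octopusVpathBuild:

# sketch only -- the real octopusVpathBuild in master.cfg may look different
from buildbot.steps import shell   # the module master.cfg refers to as "shell"

def addConfigureStep(factory, var, flags):
    # turn var into VAR="value" assignments and quote every configure flag
    assignments = " ".join(['%s="%s"' % (k, v) for k, v in var.items()])
    arguments   = " ".join(['"%s"' % f for f in flags])
    factory.addStep(shell.ShellCommand,
                    description="configuring",
                    descriptionDone="configure",
                    command="cd _build && %s ../configure %s" % (assignments, arguments))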

In order to run the testsuite, it might be sufficient to add the line

octopusAddTest(x86_64_gfortran_build)

This causes a make -C testsuite check to be issued after the compilation. For special builds and environments this might not be enough, e.g. MPI builds which require additional setup.
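
Conceptually, octopusAddTest just appends a standard testing step to the builder; a sketch of what it boils down to (not necessarily the literal definition in master.cfg) is:

# sketch only; shell is the same steps module used elsewhere in master.cfg
def octopusAddTest(factory):
    # run the testsuite inside the VPATH build tree after the compilation
    factory.addStep(shell.ShellCommand,
                    description="testing",
                    descriptionDone="test",
                    command="cd _build && make -C testsuite check")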

For such special cases, the testing step can be added explicitly by

x86_64_gfortran_mpich2_test.addStep(shell.ShellCommand,
                                    description="testing",
                                    descriptionDone="test",
                                    command="cd _build && COMMAND")

with COMMAND being a shell command line that does the required steps. The cd _build to get into the build tree is necessary because a VPATH build is being performed. The exit code of COMMAND determines the test result, so if you chain several commands with && or ;, make sure that the correct exit code is returned, e.g. by

command="cd _build && do_preparations && do_tests; err=$?; do_cleanup; exit $?"

After specifying the build, you have to say in the AND WHERE section which slave shall perform the build. Add an entry

bot_BUILDNAME = {'name' : "BUILDNAME",
                 'slavename' : "SLAVE",
                 'builddir' : "BUILDNAME",
                 'factory' : BUILDNAME}

If several slaves are able to do the build, a list 'slavenames' : ["SLAVE1", "SLAVE2"] can be given instead of 'slavename' : "SLAVE". This allows for some load-balancing in the scheduler.
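
With the list variant, the entry from above becomes (same placeholders as before):

bot_BUILDNAME = {'name'       : "BUILDNAME",
                 'slavenames' : ["SLAVE1", "SLAVE2"],
                 'builddir'   : "BUILDNAME",
                 'factory'    : BUILDNAME}

In either form, bot_BUILDNAME has to be added to the c['builders'] list: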

c['builders'] = [bot_x86_64_gfortran_build,
                 bot_x86_64_ifort_build,
                 bot_x86_64_gfortran_mpich2_build,
                 bot_x86_64_ifort_mpich2_build,
                 bot_sparc_sunf90_build,
                 bot_x86_64_ifort_test,
                 bot_x86_64_ifort_mpich2_test,
                 bot_i386_gfortran_test,
                 bot_BUILDNAME
                 ]

As a last step, the build has to be added to one or more schedulers. Schedulers are defined in the SCHEDULERS section and are responsible for actually triggering the builds. Currently, there is the buildonly scheduler, which triggers the rather quick compilations on each commit, and the nightly scheduler, to which all BUILDNAME_test builds go. Simply add your build to the builderNames entry of the scheduler specification.
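
For orientation, the scheduler definitions in master.cfg look roughly like the following sketch; the builder names, the treeStableTimer and the nightly time given here are only illustrative, check the SCHEDULERS section for the real values:

from buildbot.scheduler import Scheduler, Nightly   # already imported in master.cfg

c['schedulers'] = [
    # fires on every commit and runs the quick *_build builders
    Scheduler(name="buildonly", branch=None, treeStableTimer=60,
              builderNames=["x86_64_gfortran_build", "x86_64_ifort_build"]),
    # fires once per night and runs the *_test builders
    Nightly(name="nightly", builderNames=["x86_64_ifort_test", "BUILDNAME"],
            hour=3, minute=0)
    ]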

After saving the configuration file it can take a while, up to an hour because of the @hourly crontab entry I have set, until the master rereads its configuration and the new build appears on the waterfall page. If it does not appear, have a look at the twistd.log file in the master's directory; perhaps you rendered the Python file invalid. In such cases, the master simply ignores the change in configuration and continues with the old one.
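
A quick way to catch at least plain syntax errors before saving is to compile the file once by hand; this checks only the Python syntax, not whether the BuildBot configuration itself makes sense:

$ python -c "compile(open('master.cfg').read(), 'master.cfg', 'exec')"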


List of slaves

We currently have the following slaves running:

Name     Architecture  OS                Location
g22      i386          Debian GNU/Linux  FU Berlin
pepita   sparc         Sun Solaris 10    TU Berlin
octopus  x86_64        Debian GNU/Linux  FU Berlin