12 July 2005
About a month ago, I started work on a new project: Bitten, a continuous integration system built on top of Trac. It has a twist: its focus is to enable the central collection of software metrics and other data produced by builds, such as unit test results, code coverage and style checks. Unlike with similar systems, this information isn't generated once and thrown away; rather, it is collected in a central repository so that the evolution of the project can be monitored.
I'm really excited to say that this project has been accepted for Google's Summer of Code program. The mentoring organization is the PSF, and the mentors are Greg Wilson and Trent Mick. In total, the PSF selected 19 of the proposals, listed here.
Anyway, here are some snippets from my proposal. First, the motivation for the project:
Today's automated builds generate a tremendous volume of information about the state of software development projects. This starts with basic status indicators such as compilation errors and test failures, but is increasingly extended to include advanced software metrics such as dependency analysis, code coverage analysis or style checking.
Traditionally, continuous integration systems such as Gump, Tinderbox and BuildBot only record and display the data that the build system prints to the standard output and error streams. As a result, much of the information about a code base that the build generates cannot be used to its full extent.
To effectively provide value for the ongoing development and management of a project, data generated by builds needs to be collected in a central repository, and in a machine-readable format, to allow for analysis and presentation of the data even long after the actual build has been run. In addition to being able to adjust how data is analyzed and presented in retrospect, this approach is essential for historical reports that show how specific metrics are evolving over time – which is often more valuable than the absolute values of these metrics at one specific point in time.
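To illustrate why machine-readable storage matters for historical reports, here is a minimal sketch, assuming a simple relational table of scalar metric values keyed by build and revision. The schema and the helper names (`open_store`, `record`, `history`, `trend`) are hypothetical and not part of Bitten; the point is only that once values are stored rather than printed, their evolution can be queried long after the build ran.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical central metrics store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric ("
        " build INTEGER, revision INTEGER, name TEXT, value REAL)"
    )
    return conn


def record(conn, build, revision, name, value):
    """Store one metric value reported by a build."""
    conn.execute(
        "INSERT INTO metric VALUES (?, ?, ?, ?)", (build, revision, name, value)
    )
    conn.commit()


def history(conn, name):
    """Return (revision, value) pairs for one metric, oldest revision first."""
    rows = conn.execute(
        "SELECT revision, value FROM metric WHERE name = ? ORDER BY revision",
        (name,),
    )
    return rows.fetchall()


def trend(conn, name):
    """Change between the earliest and the latest recorded value of a metric."""
    points = history(conn, name)
    if len(points) < 2:
        return 0.0
    return points[-1][1] - points[0][1]
```

Recording coverage of 71.5, 74.0 and 78.5 percent at three successive revisions would let `trend(conn, "coverage")` report the overall improvement, which is the kind of "how is this evolving" question that raw console output cannot answer.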
Bitten is a distributed CI system, meaning that a central build master delegates the actual work of building a project to any number of connected build slaves. This is important for projects that need to be built on different platforms.
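The master/slave split can be sketched roughly as follows, assuming the master schedules one build per target platform for each new revision and hands each build to the first connected slave on a matching platform. The `BuildMaster` and `Slave` classes are hypothetical and only illustrate the delegation idea, not how Bitten actually schedules builds.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    platform: str


@dataclass
class BuildMaster:
    """Hypothetical master that hands pending builds to connected slaves."""
    platforms: list
    slaves: dict = field(default_factory=dict)
    pending: deque = field(default_factory=deque)

    def connect(self, slave):
        self.slaves[slave.name] = slave

    def disconnect(self, name):
        self.slaves.pop(name, None)

    def schedule(self, revision):
        # Queue one build per target platform for the new revision.
        for platform in self.platforms:
            self.pending.append((revision, platform))

    def dispatch(self):
        """Assign each pending build to the first connected slave whose
        platform matches; builds with no matching slave stay pending."""
        assigned, waiting = [], deque()
        while self.pending:
            revision, platform = self.pending.popleft()
            slave = next(
                (s for s in self.slaves.values() if s.platform == platform), None
            )
            if slave is None:
                waiting.append((revision, platform))
            else:
                assigned.append((slave.name, revision, platform))
        self.pending = waiting
        return assigned
```

A build for a platform nobody currently serves simply waits until a suitable slave connects, which is why supporting any number of slaves matters for multi-platform projects.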
The proposal also has some implementation notes:
The build slave script is intended to be lightweight and to have as few dependencies as possible. It should not depend on Trac or frameworks such as Twisted. In a nutshell, every machine that has the tool chain needed to perform the build itself should be able to run builds without requiring the installation of any additional software apart from the Bitten client. Furthermore, while the default Bitten client will be implemented in Python, master and slave should be sufficiently decoupled to allow the use of an alternate client implementation (for example one written in Java).
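As a rough illustration of the "no extra dependencies" goal, a slave could run build steps using nothing but the Python standard library. The `run_step` helper and the shape of the dictionary it returns are hypothetical; the sketch only shows that executing the project's own tools and capturing their results needs no framework.

```python
import subprocess
import sys


def run_step(name, argv, cwd=None, timeout=None):
    """Run one build step with the standard library only (hypothetical helper).

    Returns a plain dict that a slave could report back to the master.
    """
    proc = subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    return {
        "step": name,
        "status": "success" if proc.returncode == 0 else "failure",
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

For example, `run_step("compile", [sys.executable, "setup.py", "build"])` would run a distutils build with whatever interpreter the slave itself is using.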
To decouple the master and slave, an application protocol will be defined on top of the meta-protocol BEEP (Blocks Extensible Exchange Protocol, RFC 3080). BEEP was chosen because it provides peer-to-peer communication (so that either the client or the server can send requests to the other) and because of its relative simplicity compared to protocols such as XMPP.
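To give a feel for BEEP's simplicity, here is a simplified sketch of encoding and decoding a single data frame following the frame syntax of RFC 3080, section 2.2: a header line of `KIND channel msgno more seqno size`, followed by `size` octets of MIME payload and the trailer `END`. It deliberately skips ANS frames (which carry an extra answer number), SEQ flow control and channel management, and the `<ping/>` payload used below is hypothetical, not part of any Bitten protocol.

```python
def build_frame(kind, channel, msgno, seqno, payload, more=False):
    """Encode one BEEP data frame (MSG, RPY, ERR or NUL), per RFC 3080 2.2."""
    if kind not in ("MSG", "RPY", "ERR", "NUL"):
        raise ValueError("ANS frames carry an extra ansno and are not handled here")
    header = "%s %d %d %s %d %d\r\n" % (
        kind, channel, msgno, "*" if more else ".", seqno, len(payload)
    )
    return header.encode("ascii") + payload + b"END\r\n"


def parse_frame(data):
    """Decode one frame built by build_frame; return (fields, payload, rest)."""
    header, _, body = data.partition(b"\r\n")
    kind, channel, msgno, more, seqno, size = header.decode("ascii").split(" ")
    size = int(size)
    # The size field counts payload octets, so CRLFs inside the payload are safe.
    payload, trailer = body[:size], body[size:size + 5]
    if trailer != b"END\r\n":
        raise ValueError("malformed frame trailer")
    fields = {
        "kind": kind,
        "channel": int(channel),
        "msgno": int(msgno),
        "more": more == "*",
        "seqno": int(seqno),
    }
    return fields, payload, body[size + 5:]
```

Because either peer may send `MSG` frames on a channel and expect `RPY` or `ERR` in return, the master can push work to a slave just as easily as the slave can report results back.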
Another important implementation consideration is how the collected data is to be stored in the central repository. The data generated by automated builds can almost always be mapped to the physical and/or the logical view of the code base, where the physical view corresponds to files and line numbers, while the logical view is composed of units such as packages, classes and functions. Specific metrics basically annotate either view with the extracted information. The attributes of the annotation, however, depend entirely on the type of metric. As this does not easily map to relational databases, alternatives such as an XML database (for example Sleepycat's DBXML) need to be considered.
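The following sketch shows why a document model fits this data, assuming a hypothetical XML report format in which each metric annotates files and lines (the physical view) or named units (the logical view) with its own set of attributes. It uses the standard library's `xml.etree.ElementTree` as an in-memory stand-in; it is not the DBXML API, and the report layout and file names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: coverage annotates logical units, lint annotates
# physical locations, and each metric carries its own attributes.
REPORT = """<report build="7">
  <metric type="coverage">
    <unit file="src/master.py" name="master.Master" lines="120" covered="96"/>
    <unit file="src/slave.py" name="slave.Slave" lines="80" covered="40"/>
  </metric>
  <metric type="lint">
    <problem file="src/slave.py" line="12" category="convention" msg="missing docstring"/>
  </metric>
</report>"""


def coverage_by_unit(root):
    """Percentage of covered lines per logical unit."""
    result = {}
    for unit in root.iterfind("metric[@type='coverage']/unit"):
        lines = int(unit.get("lines"))
        covered = int(unit.get("covered"))
        result[unit.get("name")] = 100.0 * covered / lines if lines else 0.0
    return result


def problems_in_file(root, path):
    """(line, message) pairs for lint problems attached to one file."""
    return [
        (int(p.get("line")), p.get("msg"))
        for p in root.iterfind("metric[@type='lint']/problem")
        if p.get("file") == path
    ]
```

Adding a new kind of metric here only means adding elements with different attributes, whereas a relational schema would need a new table (or a generic key/value table) for every metric type.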
Bitten reached milestone 0.2 a couple of days ago, which means that there is a basic client/server infrastructure for running builds, and there is a simple user interface integrated with Trac that displays the status and results of builds. You can check this out on the Bitten site, where I run the very latest versions of both Trac and Bitten. While the work up to this point was mostly the design and implementation of a traditional CI system, from now on I'll concentrate on the data collection functionality.