
Incremental Building with Maven and TeamCity

Wednesday, March 14th, 2012

Since version 7.0, TeamCity offers native support for module-based incremental building, which is possible with build systems such as Maven and Gradle. Thanks to its ability to track developers’ changes in the project files, TeamCity can now build and test only those parts of a system affected by the changes.

An overview of this feature is available here; it is recommended reading for those not familiar with the concept of incremental building.

This article describes in detail the technical aspects and difficulties of implementing incremental building in the TeamCity Maven runner, and may be useful for TeamCity users who work extensively with Maven.

Maven’s own abilities

Maven has some limited support for incremental builds. In the Maven architecture all build procedures (compilation, testing, code generation, etc.) are performed by plugins. Plugins are also responsible for deciding whether the results of their previous work are up to date and can be reused in the current execution. At the time of writing I know of only one Maven plugin that actually performs such analysis: the Maven Compiler Plugin. Before compiling a .java file it looks into the target directory and checks whether the corresponding .class file exists and whether its modification time is later than the modification time of the source file. If so, the source file is considered not to require recompilation.
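As a rough illustration of the timestamp rule described above, here is a minimal Python sketch. It is not the compiler plugin's actual code; the function name `needs_recompile` and its parameters are made up for this example.

```python
from pathlib import Path


def needs_recompile(source: Path, class_file: Path) -> bool:
    """Apply the timestamp rule: recompile unless an up-to-date class file exists."""
    if not class_file.exists():
        # No previous output at all, so the source must be compiled.
        return True
    # The class file counts as up to date only if it is strictly newer
    # than the source file.
    return class_file.stat().st_mtime <= source.stat().st_mtime
```

Note that a check of this kind only ever looks from a source file to its class file, never the other way round.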

Apart from the problems with this trivial approach (it’s enough to mention that the Maven Compiler Plugin doesn’t detect deletion of source files, keeping the obsolete class files in place), incremental compilation is definitely not enough. I don’t think it would be too far from the truth to say that in most projects the longest build phase is running tests, not compilation.

The Maven Surefire plugin (which runs tests by default in the Maven ecosystem) doesn’t provide any file-level incrementality similar to that of the compiler plugin.

But what about module-based dependency analysis? In the theoretic example from the previous post we want to execute tests for modules X, C, D and E. Can Maven employ the technique described above to achieve the desired effect? The answer is yes, to some extent.

For Maven 2.x prior to 2.1.0 this can be done using the Maven reactor plugin. In later Maven 2 and Maven 3 releases this mechanism is embedded into the core. Without digging into the details, the essence of this feature is that you provide the list of modules of interest on the command line and, depending on additional switches, Maven builds them alone, with upstream modules, or with downstream modules.

A Maven 3 command line would look like this:

mvn test --projects X --also-make-dependents

or like this with short key versions:

mvn test -pl X -amd

This will execute tests in modules X, C, D and E sequentially, going down the dependency graph starting from module X. (One important note: for this build to succeed, all the dependencies of X, namely A and B, need to be installed into the local repository or deployed into a remote repository. Otherwise Maven won’t resolve these dependencies, since they won’t be included in the build reactor.)

So it looks like all TeamCity has to do is translate a build change list into a module list and run mvn -pl <modules> -amd, right? Not really. Several problems make things less simple.

The first problem is that when analyzing the dependency graph Maven doesn’t take into account the parent relationship, which is a kind of dependency. Changes in the parent pom do affect all child projects. But if you try to specify the parent pom in the --projects list, you’ll get the build error “Couldn’t find specified project in module list”, because Maven allows only modules producing artifacts there.

The second problem is that Maven doesn’t distinguish dependency scopes. Is there a difference between a “compile” dependency and a “test” dependency in terms of change impact analysis? The first one is transitive, while the second one is not.

Let’s consider another example:

In this project there are 4 modules. Subsystem1 and Subsystem2 depend on CommonUtil, which provides a common library of utility methods. CommonUtil contains tests, which use classes from MyTestFramework. This is not a production dependency. It doesn’t affect CommonUtil production code, so a change in MyTestFramework affects only the tests in CommonUtil and does not affect Subsystem1 and Subsystem2. To test this change we need to run tests only in MyTestFramework and CommonUtil.

Maven doesn’t take this into account. If you put MyTestFramework in the --projects list, it will run tests in all 4 modules.

Yet another problem is also related to limiting the dependency scope. Assume that in the last example we’ve changed only tests in CommonUtil. Obviously, to test this change we need to run tests in CommonUtil only. Unfortunately we can’t tell Maven not to go further down the dependency graph for this module. One may argue that in this case we can simply omit the --also-make-dependents switch, and Maven will build only this module. But a real change list is usually a mix of test and production modifications, often in more than one module, and we cannot apply the --also-make-dependents switch to some modules in the list and not to the others. At least, we cannot do this in a single Maven execution.
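The scope rules from the last two problems can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not TeamCity's or Maven's implementation; the edge list is taken from the four-module example, while the function name `modules_to_test` and the `tests_only` parameter are hypothetical.

```python
from collections import deque

# (dependent, dependency, scope) edges from the four-module example.
EDGES = [
    ("Subsystem1", "CommonUtil", "compile"),
    ("Subsystem2", "CommonUtil", "compile"),
    ("CommonUtil", "MyTestFramework", "test"),
]


def modules_to_test(changed, edges, tests_only=frozenset()):
    """Return the set of modules whose tests should run.

    changed    -- modules touched by the change list
    tests_only -- subset of `changed` where only test sources were modified
    """
    to_test = set(changed)
    # Only modules whose production code is affected propagate further.
    production_affected = {m for m in changed if m not in tests_only}
    queue = deque(production_affected)
    while queue:
        current = queue.popleft()
        for dependent, dependency, scope in edges:
            if dependency != current:
                continue
            # Any dependent must at least re-run its own tests.
            to_test.add(dependent)
            # A test-scoped dependency does not change the dependent's
            # production code, so the walk stops there.
            if scope != "test" and dependent not in production_affected:
                production_affected.add(dependent)
                queue.append(dependent)
    return to_test
```

With these edges, a change in MyTestFramework selects only MyTestFramework and CommonUtil, and a tests-only change in CommonUtil selects CommonUtil alone.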

The final issue isn’t related directly to Maven. It’s rather an intrinsic issue of the distributed architecture of TeamCity – it cannot guarantee that the results of the previous build are available for the current build, because the older build might have run on a different agent machine, and Maven rightly doesn’t care about it.

Let’s get back to the first example. TeamCity can schedule this build to a fresh agent on which this project has never been built. Since this build is incremental, Maven starts building modules X, C, D, E. But when building module X it immediately fails because modules A and B, which X depends on, simply haven’t been compiled.

This means that before running tests for some set of modules we should build their own dependencies and install them into a local repository. It effectively means running one goal for one module set and another goal for another module set in a single execution, which Maven with its current architecture is unable to do.

With all that said, it becomes clear that TeamCity cannot rely on Maven’s --also-make-dependents feature. So, how does it work in TeamCity?

What TeamCity adds

First of all, TeamCity implements its own change impact analysis algorithm for determining the set of affected modules, taking into account parent relationships and different dependency scopes. Second, TeamCity performs a special preliminary phase that builds the dependencies of the affected modules.

The build is split into two sequential Maven executions.

The first Maven execution, called the preparation phase, builds the dependencies of the affected modules and installs them into the local repository. This ensures our snapshot dependencies are (1) available and (2) up to date. Thus the preparation goal is always “install”. But we don’t want tests to be executed for dependencies that aren’t affected by the changes, so the command line for the preparation phase includes the -DskipTests switch. The preparation phase ensures there will be no compiler or other errors during the second execution caused by missing or inconsistent dependency classes.

The second Maven execution, called the main phase, is straightforward. It executes the main goal (“test” in our case), thus running only those tests affected by the change.

Effectively it’s transformed into the following Maven 3 commands:

mvn install --projects A,B -DskipTests

mvn test --projects X,C,D,E

For Maven 2.x the command lines look like this:

mvn install -N -r -Dmaven.reactor.includes=A,B -DskipTests

mvn test -N -r -Dmaven.reactor.includes=X,C,D,E

Finally, after the main execution TeamCity removes the installed dependency artifacts from the local repository to avoid possible interference in future builds.
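To make the two-phase split concrete, here is a minimal Python sketch that derives both Maven 3 command lines from a dependency graph. It is an illustration under stated assumptions rather than TeamCity's code: the edges for X (depends on A and B) follow the article, the remaining edges are guesses about the pictured graph, and the helper names are hypothetical.

```python
# module -> modules it depends on. Only "X depends on A and B" is stated in
# the text; the other edges are assumed for illustration.
DEPENDS_ON = {
    "A": [], "B": [],
    "X": ["A", "B"],
    "C": ["X"], "D": ["X"], "E": ["D"],
}


def transitive_dependencies(modules, depends_on):
    """Collect every module reachable upstream from `modules`."""
    seen = set()
    stack = list(modules)
    while stack:
        for dep in depends_on[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def plan_commands(affected, depends_on, goal="test"):
    """Build the preparation and main command lines as argument lists."""
    # Upstream modules that are not themselves affected only need to be
    # installed, without running their tests.
    preparation = transitive_dependencies(affected, depends_on) - set(affected)
    commands = []
    if preparation:
        commands.append(["mvn", "install", "--projects",
                         ",".join(sorted(preparation)), "-DskipTests"])
    commands.append(["mvn", goal, "--projects", ",".join(sorted(affected))])
    return commands


for command in plan_commands({"X", "C", "D", "E"}, DEPENDS_ON):
    print(" ".join(command))
```

The sketch sorts module names alphabetically, so the module order in the printed commands may differ from the order shown in the article; Maven determines the actual build order from the reactor.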

Enabling Maven incremental building

The best part of the whole story is how to enable incremental building for Maven in TeamCity, because it’s done with one click. Just turn on the “Build only projects affected by changes” check box in the Maven runner configuration UI:

Enjoy!

Incremental testing with TeamCity

Tuesday, March 13th, 2012

When you build a large project, it is often important that previously built components that are still up-to-date are not rebuilt. Otherwise, if all targets are built every time, each build will take a long time to complete.

The longer it takes to make a build, the later you get feedback on your changes, and the more difficult it is to set up a reliable and meaningful Continuous Integration process.

And the faster your builds are, the higher the rate at which your changes are integrated, the shorter the feedback time, and the more value you get from your Continuous Integration.

TeamCity 7.0 introduces a new feature: support for incremental builds for IntelliJ IDEA, Gradle and Maven projects, primarily aimed at incremental testing.

Overview

Most of the changes developers make to the source code don’t affect the whole system. The better components and their dependencies are organized within the project, the more localized the impact of a change will be.

Incremental builds are all about re-building only those parts of a project which have been changed or affected by the changes, leaving the rest intact. It’s the same good old concept of incremental compilation based on file timestamps, taken one step further to cover all other build stages, including packaging, automated testing, etc. Especially testing, because automated tests may take a really long time.

In most big projects with a large number of tests, building everything each time we change something could be a huge waste of time. Granted, from time to time we do need full builds, just as we do need a full re-compile. That’s because incremental builds are less reliable and less informative, since they show only a part of the whole picture, and they may accumulate errors. Still, in most cases full builds are not required, and incremental builds can be a real time saver at the cost of slightly lower reliability. That’s especially true for well-organized projects, where tests are properly distributed among modules.

Incremental Builds in TeamCity

TeamCity knows about a change when a file changes in version control. After all changes are collected for the build, TeamCity calculates which parts of the project are affected by the changes based on the module dependencies.

TeamCity uses modules as the level of granularity for making incremental builds. Why did we choose modules? Here are just a few of the reasons:

  1. Modules provide an easy-to-analyse project structure.
  2. Modules are first-class citizens in most modern build systems and IDEs (though not all of them call them ‘modules’).
  3. Modules have clearly defined dependencies, in contrast to those between language entities, some of which can be revealed only at runtime (like interface implementations created by reflection).
  4. As a result of 3., the module dependency graph can be easily extracted from project files and analyzed without understanding the language semantics.

Now that we have a module dependency graph, all we need is to map the changed file to a module. This is an easy task, given that the module structure is (usually) unambiguously reflected in the project file structure.
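One way to picture this file-to-module mapping is a longest-prefix match of the changed path against the module directories. The Python sketch below assumes each module lives in its own directory relative to the checkout root; the module names, directories and the function name `module_for_file` are hypothetical and are not taken from TeamCity.

```python
from pathlib import PurePosixPath


def module_for_file(changed_file, module_dirs):
    """Return the module whose directory is the deepest ancestor of the file.

    module_dirs maps a module name to its directory relative to the
    checkout root ("" stands for the root itself).
    """
    file_parts = PurePosixPath(changed_file).parts
    best, best_len = None, -1
    for module, directory in module_dirs.items():
        dir_parts = PurePosixPath(directory).parts
        # The module matches when its directory is a prefix of the file path;
        # nested modules win over their parents because they match more parts.
        if file_parts[:len(dir_parts)] == dir_parts and len(dir_parts) > best_len:
            best, best_len = module, len(dir_parts)
    return best


# Hypothetical layout with a nested module.
MODULE_DIRS = {"parent": "", "core": "core", "core-web": "core/web"}
print(module_for_file("core/web/src/main/java/App.java", MODULE_DIRS))
```

Choosing the deepest matching directory matters for nested layouts, where a file under core/web would otherwise also match the core module and the root.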

Theoretic example

Let’s consider a project consisting of eight modules with the dependencies between them shown in the picture below.

When we detect a change in the file “src/x.java”, we can easily identify that it belongs to module X. Based on this knowledge, and the knowledge of module dependencies, we can quickly discover all other modules affected by the change, which in our case are C, D and (transitively) E. We can also assume that modules A, B, F and G stay unaffected.

If we only re-build modules X, C, D, and E now, the odds are high that we’ll get the very same result as with re-building the whole project from scratch, but with much less time spent.
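The discovery of affected modules amounts to a walk over the reversed dependency graph. The following Python sketch shows the idea; it is not TeamCity's implementation. Since the picture is not reproduced here, the edges are assumptions chosen to agree with the text (X depends on A and B, C and D depend on X, E is affected transitively, F and G are unaffected).

```python
from collections import deque

# module -> modules it depends on (assumed edges, see the note above).
DEPENDS_ON = {
    "A": [], "B": [],
    "X": ["A", "B"],
    "C": ["X"], "D": ["X"], "E": ["D"],
    "F": ["A"], "G": ["B"],
}


def affected_modules(changed, depends_on):
    """Return the changed modules plus everything downstream of them."""
    # Invert the graph: dependency -> modules that depend on it.
    dependents = {module: set() for module in depends_on}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


print(sorted(affected_modules(["X"], DEPENDS_ON)))  # ['C', 'D', 'E', 'X']
```

A breadth-first walk visits each module at most once, so the cost grows with the size of the module graph rather than with the number of source files.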

TeamCity is primarily focused on making incremental testing possible. And that is for a reason: from our own experience, and in most of our clients’ installations, automated testing is usually the longest part of a build. Nevertheless, the same approach may apply to other build stages. For instance, with Maven incremental builds, you deploy to the remote repository only those SNAPSHOT artifacts affected by the change.

Real example with Spring Framework

To demonstrate how TeamCity users can benefit from incremental building we took a well-known open source project, Spring Framework (http://www.springsource.org/), which builds with Gradle, as an example.

Let’s consider a small source code modification. In our case, we changed ReflectionTestUtils.java in the spring-test module, which has two dependent modules: spring-struts and spring-aspects.

The full build of Spring Framework with our change runs 10361 tests and lasts 4 min. 53 sec.

If we enable TeamCity incremental builds, we get the following results: only 805 tests are run, and the build takes 1 min. 12 sec.

What has happened?

For the full build, we have a usual Build Configuration set up within TeamCity. Each run of this Build Configuration executes the full set of tests, no matter how big the change since the previous build was. This is generally not a bad thing, except that such builds may take a very long time.

As a result, over 10000 tests are run, almost 5 minutes are spent, and the build is RED: 16 test runs have failed.




Now let’s copy this Build Configuration and make only one modification in the runner settings – enable incremental building:



In the case of Gradle, this setting tells TeamCity to execute Gradle’s buildDependents task for each module directly affected by the change. buildDependents is a standard Gradle task, which in turn executes the build task for the given module and for all the modules that depend on it, both directly and transitively.

With this new Build Configuration, only 805 tests were run, and the build took a little over 1 minute. Note that this time the build is GREEN. That’s because only those tests that are relevant to our change were executed. This GREEN gives us a clear indication that our change was OK, without the clutter of the rest of the tests.

Was it worth the hassle?

For those who think 3.5 minutes is not a big deal, I want to show a picture from our real life:



This is an excerpt from the build history of an incremental IntelliJ IDEA build configuration in Faradi (TeamCity 7.0 code name).

The build marked with red underline is a full build (because of a change in a core component), and it lasted for 2h:48m.

The build marked with green underline is an incremental build, which only contains small local changes, and lasted for less than 20 minutes.

That is a considerable difference (almost ten times!). Although most of the builds in this picture are not that quick, as their changes are often big and distributed, they still take much less than 2.5 hours to build, providing a huge time saving.

Incremental personal builds

As was demonstrated before, incremental builds can save you a lot of time. Still, incremental builds provide the biggest benefit when combined with TeamCity personal builds.

For a personal build only your personal changes are taken into account. All required dependencies are built with the concurrently made changes included, but tests are run only for the modules affected by your own changes.

The screenshot below shows the results of my personal build with a trivial change in the maven-runner module (don’t be surprised by the duration: those tests are quite slow). Even though one more change in a completely different module (from Kirill Maximov) was included into the same build, only tests affected by my own personal changes were run.
Even if Kirill’s change breaks the full build, my personal build is still marked as GREEN. So there is no clutter from the changes of others.

When to use incremental builds

Incremental builds aren’t always good and helpful. Below is some general advice.

Incremental builds ARE NOT generally good for:

  • Release and deployment builds, where you have to make sure that the build is clean, and that you see the whole picture in the build results.
  • Less frequent tasks such as nightly builds, code inspections, heavy integration tests.
  • Projects with poor componentization.

Incremental builds ARE good for:

  • Build configurations intended for unit testing of developers’ commits.
  • Personal builds.

Individual build tools specifics

TeamCity 7.0 supports incremental testing for IntelliJ IDEA, Maven and Gradle runners. And yes, we’re planning to extend this list in future versions by including .NET runners.

There are some individual specifics of incremental building in different runners forced by underlying build system limitations.

IntelliJ IDEA Projects

At the moment the IntelliJ IDEA project runner only supports running tests incrementally, and doesn’t support incremental compilation or any other build stages. It is still great, because we feel running tests is the most time-consuming build activity in the majority of software projects.

Gradle

Gradle can incrementally run any task, which is great. The problem is that it only calculates local file system changes, which is not usable for the distributed architecture of TeamCity. So the current implementation is limited to only executing Gradle’s buildDependents task.

This is still enough for most cases, because, as I mentioned earlier, this task executes the build task for every affected module. The build task in its turn executes a full set of build activities for a module.

Additionally, you can use the standard Gradle ‘-x’ command line parameter to exclude an arbitrary set of tasks. This is particularly helpful when you want to run only tests.

Maven

Maven is able to run any build activity (goal) for selected modules, but requires splitting a build into two phases, with an intermediate installation of the artifacts required by the affected modules into the local repository. This brings in some overhead, making the incremental building feature not very useful (in terms of speed) for projects with a small number of tests. If you want to know why the Maven runner needs this, you’re welcome to read a detailed post about Maven incremental building specifics that will follow shortly.

To be continued…