LaForge: Always-Correct and Fast Incremental Builds from Simple Specifications

Charlie Curtsinger,Daniel W. Barowy
DOI: https://doi.org/10.48550/arXiv.2108.12469
2021-09-02
Abstract:Developers rely on build systems to generate software from code. At a minimum, a build system should produce build targets from a clean copy of the code. However, developers rarely work from clean checkouts. Instead, they rebuild software repeatedly, sometimes hundreds of times a day. To keep rebuilds fast, build systems run incrementally, executing commands only when built state cannot be reused. Existing tools like make present users with a tradeoff. Simple build specifications are easy to write, but limit incremental work. More complex build specifications produce faster incremental builds, but writing them is labor-intensive and error-prone. This work shows that no such tradeoff is necessary; build specifications can be both simple and fast. We introduce LaForge, a novel build tool that eliminates the need to specify dependencies or incremental build steps. LaForge builds are easy to specify; developers write a simple script that runs a full build. Even a single command like gcc src/*.c will suffice. LaForge traces the execution of the build and generates a transcript in the TraceIR language. On later builds, LaForge evaluates the TraceIR transcript to detect changes and perform an efficient incremental rebuild that automatically captures all build dependencies. We evaluate LaForge by building 14 software packages, including LLVM and memcached. Our results show that LaForge automatically generates efficient builds from simple build specifications. Full builds with LaForge have a median overhead of 16.1% compared to a project's default full build. LaForge's incremental builds consistently run fewer commands, and most take less than 3.08s longer than manually-specified incremental builds. Finally, LaForge is always correct.
Programming Languages,Software Engineering
What problem does this paper attempt to address?