Understanding and Predicting Docker Build Duration: An Empirical Study of Containerized Workflow of OSS Projects

Yiwen Wu, Yang Zhang, Kele Xu, Tao Wang, Huaimin Wang
DOI: https://doi.org/10.1145/3551349.3556940
2022-01-01
Abstract: Docker building is a critical component of containerized workflows: it automates the process by which source code is packaged and transformed into container images. If not run properly, Docker builds can take a long time (i.e., slow builds). This increases the cost in human and computing resources and inevitably affects software development. However, the current state of Docker build duration, and how to reduce it, remain unclear and call for an in-depth study. To fill this gap, this paper presents the first empirical investigation of 171,439 Docker builds from 5,833 open source software (OSS) projects. An exploratory study first characterizes Docker build durations in real-world projects, and a comprehensive survey captures developers’ perceptions of slow builds. Driven by the results of this exploratory study, we propose a prediction model for Docker build duration that leverages 27 handcrafted features drawn from build-related context and configuration, and we apply 8 regression algorithms to the prediction task. Our results show that the Random Forest model performs best, with a Spearman’s correlation of 0.781, outperforming the random baseline model by 82.9% in RMSE, 90.6% in MAE, and 94.4% in MAPE. The implications of this study will facilitate research and assist practitioners in improving the Docker build process.
What problem does this paper attempt to address?