Dockerfile Changes in Practice: A Large-Scale Empirical Study of 4,110 Projects on GitHub.

Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang
DOI: https://doi.org/10.1109/apsec51365.2020.00033
2020-01-01
Abstract: Docker is one of the most popular containerization tools in current DevOps practice. Particularly, Dockerfile plays an important role in the Docker-based software development process by specifying the commands and build environment of Docker containers. As a project progresses through its development stages, the content of the Dockerfile may be revised many times. Previous studies have examined Dockerfile usage in open-source projects. However, little is known about the details of Dockerfile changes in practice. In this paper, we conduct an empirical study on Dockerfile changes for 4,110 open-source projects hosted on GitHub. Based on the Dockerfile data, we measure the frequency, magnitude, and instructions of Dockerfile changes and report how Dockerfile co-changed with other files. To explore the relationship between Dockerfile changes and project outcomes, i.e., popularity, success, and productivity, we also develop regression models, by controlling for various confounds. Our findings help to characterize and understand Dockerfile changes and motivate the need for collecting more empirical evidence.