A Large-Scale Evaluation of Automated Unit Test Generation Using EvoSuite
Gordon Fraser,Andrea Arcuri
DOI: https://doi.org/10.1145/2685612
IF: 3.685
2014-12-23
ACM Transactions on Software Engineering and Methodology
Abstract:Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the E VO S UITE tool by evaluating it on manually selected open-source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value; even if VO S UITE worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large “real-world” software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them are trivial tasks. In this article we present the results of a large experiment in unit test generation using the VO S UITE tool on 100 randomly chosen open-source projects, the 10 most popular open-source projects according to the SourceForge Web site, seven industrial projects, and 11 automatically generated software projects. The study confirms that VO S UITE can achieve good levels of branch coverage (on average, 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can serve to inform researchers to make more conscious choices in the selection of software system subjects. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances, branch coverage on an unbiased sample is affected by predominant environmental dependencies. The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a larger appreciation of work in this area, thus supporting transfer of knowledge from software testing research to practice.
computer science, software engineering