Tests4Py: A Benchmark for System Testing

Marius Smytzek, Martin Eberlein, Batuhan Serce, Lars Grunske, Andreas Zeller
2024-05-14
Abstract: Benchmarks are among the main drivers of progress in software engineering research. However, many current benchmarks are limited by inadequate system oracles and sparse unit tests. Our Tests4Py benchmark, derived from the BugsInPy benchmark, addresses these limitations. It includes 73 bugs from seven real-world Python applications and six bugs from example programs. Each subject in Tests4Py is equipped with an oracle for verifying functional correctness and supports both system and unit test generation. This allows for comprehensive qualitative studies and extensive evaluations, making Tests4Py a cutting-edge benchmark for research in test generation, debugging, and automatic program repair.
Software Engineering
What problem does this paper attempt to address?
The paper addresses the limitations of current benchmarks in software engineering research, particularly for system testing and unit test generation. Specifically:

1. **Inadequate system oracles**: Many existing benchmarks rely on generic system-level oracles, such as crash detection. These are useful for finding robustness and security problems, but they reveal little about functional defects, where the program runs to completion and produces a wrong result.
2. **Sparse unit tests**: Existing benchmarks typically ship unit tests with fixed inputs and offer no interface for plugging in generated test inputs. This makes it hard to combine test generators with automatic repair tools.

To address these issues, the authors propose a new benchmark called Tests4Py, derived from BugsInPy. Its main features are:

- **Real-world subjects**: It includes 73 bugs from seven real-world Python applications and six bugs from example programs.
- **Functional oracles**: Each subject is equipped with an oracle that verifies functional correctness, and each supports the generation of both system tests and unit tests.
- **Test generation interfaces**: It provides interfaces for system and unit test generation, enabling qualitative studies as well as extensive quantitative evaluations.

With these improvements, Tests4Py aims to serve as a benchmark for research in test generation, debugging, and automatic program repair.
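The difference between a generic crash oracle and a functional oracle can be illustrated with a small sketch. This is not the Tests4Py API: `buggy_sqrt`, `crash_oracle`, and `functional_oracle` are hypothetical names invented for this example, and the subject is a toy function rather than one of the benchmark's programs.

```python
import math


def buggy_sqrt(x: float) -> float:
    """Hypothetical subject with a functional defect: wrong result, no crash."""
    return x / 2  # bug: should be math.sqrt(x)


def crash_oracle(func, x: float) -> bool:
    """Generic oracle: the test passes unless the call raises an exception."""
    try:
        func(x)
    except Exception:
        return False
    return True


def functional_oracle(func, x: float) -> bool:
    """Subject-specific oracle: the test passes only if the result is a square root of x."""
    try:
        result = func(x)
    except Exception:
        return False
    return math.isclose(result * result, x, rel_tol=1e-9)


# The same inputs judged by both oracles (inputs could come from a test generator).
inputs = [4.0, 9.0, 16.0]
crash_verdicts = [crash_oracle(buggy_sqrt, x) for x in inputs]
functional_verdicts = [functional_oracle(buggy_sqrt, x) for x in inputs]

print(crash_verdicts)       # [True, True, True]
print(functional_verdicts)  # [True, False, False]
```

The crash oracle accepts every run because the defect never raises, while the functional oracle flags the inputs on which the result is wrong. The input `4.0` passes both oracles even though the code is buggy, which is one reason a benchmark benefits from being able to judge many generated inputs rather than a few fixed ones.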