Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects

Han Wang, Sijia Yu, Chunyang Chen, Burak Turhan, Xiaodong Zhu
DOI: https://doi.org/10.1145/3638245
2024-02-26
Abstract: Deep Learning (DL) models have advanced rapidly, with a focus on achieving high performance by testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or are functionally correct when they need to be treated and tested like other software systems. We therefore empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit-tested DL projects correlate positively with open-source project metrics and have a higher pull-request acceptance rate, 2) 68% of the sampled DL projects are not unit-tested at all, and 3) the layers and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to this community by raising awareness of the importance of unit testing in DL projects and encouraging further research in this area.
Artificial Intelligence, Software Engineering
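The abstract's first finding links unit testing to a higher pull-request acceptance rate. A minimal sketch of such a comparison follows, under stated assumptions: acceptance rate is taken to be merged PRs divided by all resolved (merged plus closed-unmerged) PRs, the two groups are compared by their medians, and the `ProjectStats` records are hypothetical toy data, not figures from the study.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ProjectStats:
    """Hypothetical per-project record; field names are illustrative."""
    name: str
    has_unit_tests: bool
    merged_prs: int
    closed_unmerged_prs: int


def acceptance_rate(p: ProjectStats) -> float:
    """Merged PRs as a fraction of all resolved PRs (assumed definition)."""
    resolved = p.merged_prs + p.closed_unmerged_prs
    return p.merged_prs / resolved if resolved else 0.0


def median_rates(projects: list[ProjectStats]) -> tuple[float, float]:
    """Median acceptance rate for (unit-tested, untested) projects."""
    tested = [acceptance_rate(p) for p in projects if p.has_unit_tests]
    untested = [acceptance_rate(p) for p in projects if not p.has_unit_tests]
    return median(tested), median(untested)


# Toy data for illustration only; these are not the study's numbers.
projects = [
    ProjectStats("proj-a", True, 40, 10),
    ProjectStats("proj-b", True, 18, 2),
    ProjectStats("proj-c", False, 5, 5),
    ProjectStats("proj-d", False, 12, 8),
]
tested_med, untested_med = median_rates(projects)
print(f"median acceptance: tested={tested_med:.2f}, untested={untested_med:.2f}")
# prints: median acceptance: tested=0.85, untested=0.55
```

Medians are used here because repository metrics are typically skewed by a few very popular projects; a real analysis would also apply a significance test (for example, a Mann-Whitney U test) rather than comparing medians by eye.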
What problem does this paper attempt to address?
This paper addresses the question: **Are deep learning (DL) projects, as software systems, adequately unit-tested to ensure their functional correctness and stability?** Specifically, it examines the following aspects:

1. **The role of unit testing in open-source DL projects**
   - Does unit testing improve the quality of open-source DL projects, as reflected in higher GitHub popularity metrics, better project management, and a higher pull-request acceptance rate?
2. **The extent to which unit testing is applied in current DL projects**
   - How many DL projects are unit-tested?
   - Which testing frameworks do developers use?
   - What is the unit-test coverage of DL projects?
3. **Which parts and properties of DL projects are tested**
   - Which components (such as layers, loss functions, and optimizers) are tested most frequently?
   - Which properties (such as input/output ranges, error handling, and performance metrics) do the tests check?

Through these questions, the paper aims to fill the research gap on unit testing in DL projects and to guide developers and researchers in improving the reliability and stability of DL projects.

### Main contributions

1. **The first analysis of the role of unit testing in open-source DL projects**
   - The study finds that projects with unit tests are more popular on GitHub, are better managed, and have a higher pull-request acceptance rate.
2. **A taxonomy of unit testing in DL projects**
   - It covers unit types, test properties, and assertion statements, helping developers write more effective test cases.
3. **A systematically collected dataset of open-source DL projects for unit-testing research**
   - The dataset can support related research, such as automated test-case generation and vulnerability repair.
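To make the taxonomy's dimensions (unit type, test property, assertion statement) concrete, here is a small sketch of a unit-tested DL component. It assumes a pure-Python `softmax` as a hypothetical stand-in for a framework layer, so the example runs without PyTorch or TensorFlow; each test targets one kind of property, such as output range, a numerical invariant, or error handling, using standard `unittest` assertions.

```python
import math
import unittest


def softmax(logits: list[float]) -> list[float]:
    """Hypothetical stand-in for a DL layer: numerically stable softmax."""
    if not logits:
        raise ValueError("softmax expects a non-empty input")
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TestSoftmax(unittest.TestCase):
    def test_output_length_matches_input(self):
        self.assertEqual(len(softmax([1.0, 2.0, 3.0])), 3)

    def test_output_range(self):
        # Property: every probability lies in [0, 1].
        for p in softmax([-5.0, 0.0, 5.0]):
            self.assertGreaterEqual(p, 0.0)
            self.assertLessEqual(p, 1.0)

    def test_probabilities_sum_to_one(self):
        # Property: numerical invariant of the output distribution.
        self.assertAlmostEqual(sum(softmax([0.3, 1.7, -2.0])), 1.0, places=9)

    def test_large_logits_do_not_overflow(self):
        self.assertAlmostEqual(softmax([1000.0, 1000.0])[0], 0.5)

    def test_empty_input_raises(self):
        # Property: error handling on invalid input.
        with self.assertRaises(ValueError):
            softmax([])


if __name__ == "__main__":
    unittest.main(argv=["softmax-tests"], exit=False)
```

The same pattern carries over to a real layer by replacing the list assertions with tensor comparisons from the framework in use.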
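The question of which testing frameworks developers use can, in principle, be answered at scale by static inspection of test files. The sketch below relies on two heuristics that are assumptions, not the paper's stated procedure: a file counts as a test script if it follows the common `test_*.py` / `*_test.py` naming convention, and a framework counts as used if its top-level module is imported. The `KNOWN_FRAMEWORKS` set is illustrative.

```python
import ast
import re

# Illustrative set of testing-framework modules to look for; extend as needed.
KNOWN_FRAMEWORKS = {"unittest", "pytest", "nose", "hypothesis"}

# Common naming conventions for Python test scripts: test_*.py or *_test.py.
TEST_FILE_PATTERN = re.compile(r"(^|/)(test_[^/]*|[^/]*_test)\.py$")


def is_test_file(path: str) -> bool:
    """Heuristically decide whether a repository path is a test script."""
    return bool(TEST_FILE_PATTERN.search(path))


def detect_frameworks(source: str) -> set[str]:
    """Return the known testing frameworks imported by a Python source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]  # relative imports have module=None and are skipped
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in KNOWN_FRAMEWORKS:
                found.add(top_level)
    return found


example = """
import torch
import pytest
from unittest import mock
"""
print(is_test_file("tests/test_layers.py"))  # True
print(sorted(detect_frameworks(example)))    # ['pytest', 'unittest']
```

Parsing with `ast` rather than grepping for `import` avoids false matches in comments and strings, at the cost of failing on files that do not parse.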
### Methodology

- **Data collection**: The authors collected 9,129 open-source DL projects from GitHub and identified 2,878 of them that contain unit-test scripts. From these, they randomly sampled 400 projects and retained 363, a sample that meets the minimum sample-size requirement.
- **Data analysis**: They assessed how having or not having unit tests relates to basic GitHub metrics (issues, pull requests, contributors, stars, and forks) and to project size (KLOC).
- **Classification and evaluation**: They developed two automatic classifiers to identify unit types and test properties in DL projects, and verified the classifiers' accuracy through manual inspection.

In conclusion, the paper emphasizes the importance of unit testing in DL projects and offers insights and tools for future research and practice.
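The data-collection counts under Methodology can be cross-checked with a little arithmetic: 2,878 of 9,129 projects contain unit-test scripts, so roughly 68% do not, which matches the abstract. The sample-size part is a sketch under assumptions: it uses Cochran's formula with a finite population correction at 95% confidence, a 5% margin of error, and p = 0.5 (a common default); the summary does not state which parameters the authors actually used.

```python
import math

TOTAL_PROJECTS = 9_129   # DL projects collected from GitHub (from the text)
WITH_UNIT_TESTS = 2_878  # projects containing unit-test scripts (from the text)
SAMPLED = 363            # projects retained after random sampling (from the text)


def untested_share(total: int, tested: int) -> float:
    """Fraction of projects without unit-test scripts."""
    return (total - tested) / total


def cochran_sample_size(population: int, z: float = 1.96,
                        margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's formula with finite population correction.

    The defaults (95% confidence, 5% margin, p = 0.5) are assumptions,
    not parameters reported for this study.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


share = untested_share(TOTAL_PROJECTS, WITH_UNIT_TESTS)
required = cochran_sample_size(WITH_UNIT_TESTS)
print(f"untested share: {share:.1%}")  # untested share: 68.5%
print(f"required sample: {required}, sampled: {SAMPLED}, sufficient: {SAMPLED >= required}")
# required sample: 340, sampled: 363, sufficient: True
```

Under these assumed parameters, 363 projects comfortably exceeds the minimum; stricter settings (for example, a 99% confidence level) would raise the requirement.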
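The summary does not describe how the two classifiers work. As a purely hypothetical illustration of classifying unit types and then checking the output against manual inspection, the sketch below assigns each test to a unit type (layer, loss, optimizer, or utils, the categories named in the text) by matching keyword prefixes in the test name, and measures agreement with manually assigned labels. The keyword lists and labelled examples are invented for illustration and are not the paper's classifier.

```python
import re

# Hypothetical keyword prefixes per unit type; the paper's classifier may differ.
UNIT_KEYWORDS = {
    "layer": ("layer", "conv", "attention", "dense", "linear"),
    "loss": ("loss", "criterion"),
    "optimizer": ("optim", "sgd", "adam", "scheduler"),
    "utils": ("util", "helper", "config"),
}


def classify_unit(test_name: str) -> str:
    """Assign a test to the first unit type with a keyword-prefixed token."""
    tokens = [t for t in re.split(r"[_\W]+", test_name.lower()) if t]
    for unit, keywords in UNIT_KEYWORDS.items():
        if any(tok.startswith(k) for tok in tokens for k in keywords):
            return unit
    return "other"


def accuracy(predicted: list[str], manual: list[str]) -> float:
    """Share of predictions that agree with the manual labels."""
    if len(predicted) != len(manual) or not manual:
        raise ValueError("need equally sized, non-empty label lists")
    return sum(p == m for p, m in zip(predicted, manual)) / len(manual)


# Hypothetical test names with manually assigned labels (illustration only).
labelled = [
    ("test_conv2d_output_shape", "layer"),
    ("test_cross_entropy_loss_is_positive", "loss"),
    ("test_adam_step_updates_weights", "optimizer"),
    ("test_load_config_defaults", "utils"),
    ("test_dataset_length", "utils"),  # no keyword matches, so predicted "other"
]
predicted = [classify_unit(name) for name, _ in labelled]
print(predicted)  # ['layer', 'loss', 'optimizer', 'utils', 'other']
print(accuracy(predicted, [label for _, label in labelled]))  # 0.8
```

Keyword matching misses tests whose names lack an obvious keyword, as `test_dataset_length` shows, which is one reason a manual check of any automatic classifier's output is worthwhile.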