Abstract: Deep learning (DL) models have advanced rapidly, with a focus on achieving high performance by testing model accuracy and robustness. However, it is unclear whether DL projects, as software systems, are tested thoroughly or are functionally correct, given that they need to be treated and tested like other software systems. We therefore empirically study the unit tests in open-source DL projects, analyzing 9,129 projects from GitHub. We find that: 1) unit-tested DL projects correlate positively with open-source project metrics and have a higher pull-request acceptance rate; 2) 68% of the sampled DL projects are not unit tested at all; and 3) the layers and utilities (utils) of DL models have the most unit tests. Based on these findings and previous research outcomes, we built a mapping taxonomy between unit tests and faults in DL projects. We discuss the implications of our findings for developers and researchers and highlight the need for unit testing in open-source DL projects to ensure their reliability and stability. The study contributes to the community by raising awareness of the importance of unit testing in DL projects and by encouraging further research in this area.
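For readers less familiar with unit testing in a DL context, here is a minimal, hypothetical example of a test that targets a layer, the kind of unit the study finds most often tested. The `softmax` function and the test names are illustrative assumptions rather than code from any studied project. The tests use Python's standard `unittest` module and check two kinds of property: an output range and error handling.

```python
import math
import unittest


def softmax(logits):
    """Hypothetical layer under test: numerically stable softmax over a list of floats."""
    if not logits:
        raise ValueError("softmax requires at least one logit")
    shift = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SoftmaxLayerTest(unittest.TestCase):
    def test_output_range(self):
        # Property: every output lies in [0, 1] and the outputs sum to 1,
        # even when one logit is very large.
        probs = softmax([2.0, 1.0, -3.0, 1000.0])
        for p in probs:
            self.assertGreaterEqual(p, 0.0)
            self.assertLessEqual(p, 1.0)
        self.assertAlmostEqual(sum(probs), 1.0, places=9)

    def test_preserves_order(self):
        # Property: the largest logit receives the largest probability.
        probs = softmax([0.5, 3.0, -1.0])
        self.assertEqual(probs.index(max(probs)), 1)

    def test_error_handling(self):
        # Property: invalid input raises a clear exception instead of failing silently.
        with self.assertRaises(ValueError):
            softmax([])
```

Saved as `test_softmax.py`, it runs with `python -m unittest test_softmax`. pytest also collects `unittest.TestCase` classes, so the same file works with either runner.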

What problem does this paper attempt to address?

The problem this paper attempts to solve is: **Are deep learning (DL) projects, as software systems, adequately unit-tested to ensure their functional correctness and stability?** Specifically, the paper focuses on the following aspects:

1. **The role of unit testing in open-source DL projects**
   - Is unit testing associated with higher quality in open-source DL projects, such as higher GitHub popularity metrics, better project management, and a higher pull-request acceptance rate?
2. **The extent to which unit testing is applied in current DL projects**
   - How many DL projects are unit tested?
   - Which testing frameworks do developers use?
   - What unit-test coverage do DL projects achieve?
3. **Which parts and properties of DL projects are tested**
   - Which components (e.g., layers, loss functions, optimizers) are tested most frequently?
   - Which properties (e.g., input/output ranges, error handling, performance metrics) do the tests check?

Through these questions, the paper aims to fill the research gap on unit testing in DL projects and to guide developers and researchers in improving the reliability and stability of DL projects.

### Main contributions

1. **The first analysis of the role of unit testing in open-source DL projects**
   - Unit-tested projects correlate with higher GitHub popularity, better project management, and a higher pull-request acceptance rate.
2. **A taxonomy of unit testing in DL projects**
   - It covers unit types, test properties, and assertion statements, helping developers write more effective test cases.
3. **A systematically collected dataset of unit-tested open-source DL projects**
   - The dataset can support related research such as automated test-case generation and vulnerability repair.
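Building a taxonomy of unit types and test properties at scale requires labelling each test with the unit it exercises and the properties it checks. The study reports developing automatic classifiers for this but does not describe how they work, so the following is only a simple keyword-matching heuristic. The unit categories mirror those named in the study (layers, loss functions, optimizers, utils), but the keywords, the regex patterns, and the names `UNIT_KEYWORDS`, `PROPERTY_PATTERNS`, `classify_unit`, and `classify_properties` are hypothetical; this is not the authors' classifier.

```python
import re

# Hypothetical keyword lists for unit types; checked in insertion order.
UNIT_KEYWORDS = {
    "layer": ("layer", "conv", "dense", "attention", "softmax"),
    "loss": ("loss", "cross_entropy", "mse"),
    "optimizer": ("optim", "sgd", "adam", "scheduler"),
    "utils": ("util", "helper", "config", "loader"),
}

# Hypothetical regexes over a test's source code, one per test property.
PROPERTY_PATTERNS = {
    "error_handling": re.compile(r"assertRaises|pytest\.raises"),
    "value_range": re.compile(r"assert(Greater|Less)(Equal)?\b|allclose|isclose"),
    "output_shape": re.compile(r"\.shape\b"),
}


def classify_unit(test_name):
    """Return the first unit type whose keywords occur in the test name, else 'other'."""
    name = test_name.lower()
    for unit, keywords in UNIT_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return unit
    return "other"


def classify_properties(test_source):
    """Return the sorted list of properties whose pattern matches the test source."""
    return sorted(prop for prop, pattern in PROPERTY_PATTERNS.items()
                  if pattern.search(test_source))


sample_test = '''
def test_dense_layer_output(self):
    out = dense(x)
    self.assertEqual(out.shape, (4, 10))
    with self.assertRaises(ValueError):
        dense(None)
'''
print(classify_unit("test_dense_layer_output"))  # layer
print(classify_properties(sample_test))          # ['error_handling', 'output_shape']
```

Keyword heuristics are order-sensitive (a test named `test_layer_loss` is labelled `layer` because layer keywords are checked first) and miss tests with uninformative names, which is one reason automatically assigned labels need manual validation.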
### Methodology

- **Data collection**: Collected 9,129 open-source DL projects from GitHub and identified 2,878 projects containing unit-test scripts. From these, randomly sampled 400 projects, yielding a final set of 363 projects that satisfies the minimum sample-size requirement.
- **Data analysis**: Evaluated the impact of unit testing by quantitatively comparing basic GitHub metrics (issues, pull requests, contributors, stars, and forks) and project size (KLOC) between projects with and without unit tests.
- **Classification and evaluation**: Developed two automatic classifiers to identify unit types and test properties in DL projects, and verified their accuracy through manual inspection.

In conclusion, the paper emphasizes the importance of unit testing in DL projects and provides insights and tools for future research and practice.
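The data-collection figures can be cross-checked with simple arithmetic. Since 2,878 of the 9,129 projects contain unit-test scripts, about 31.5% are unit tested and about 68.5% are not, consistent with the 68% reported in the abstract. The summary does not state how the minimum sample size was derived. The sketch below assumes Cochran's formula with a finite-population correction at a 95% confidence level, a 5% margin of error, and maximal variability (p = 0.5); these are common defaults but are assumptions here, and `cochran_sample_size` is a hypothetical helper.

```python
import math


def cochran_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Minimum sample size via Cochran's formula with finite-population correction.

    The defaults (95% confidence, 5% margin, p = 0.5) are assumptions,
    not values reported by the study.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)       # finite-population correction
    return math.ceil(n)


total_projects = 9129   # DL projects collected from GitHub
with_unit_tests = 2878  # projects containing unit-test scripts
final_sample = 363      # projects in the final analysed sample

untested_share = 1 - with_unit_tests / total_projects
required = cochran_sample_size(with_unit_tests)

print(f"untested share: {untested_share:.1%}")
print(f"minimum sample (assumed 95% / 5%): {required}")
print(f"final sample meets minimum: {final_sample >= required}")
```

Under these assumed parameters the minimum is 340 projects, so the 363-project sample clears it. Changing `z` or `margin` changes the threshold, so the check is only as meaningful as the assumed parameters.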

Beyond Accuracy: An Empirical Study on Unit Testing in Open-source Deep Learning Projects
