Validity Matters: Uncertainty‐Guided Testing of Deep Neural Networks
Zhouxian Jiang, Honghui Li, Rui Wang, Xuetao Tian, Ci Liang, Fei Yan, Junwen Zhang, Zhen Liu
DOI: https://doi.org/10.1002/stvr.1894
2024-08-24
Software Testing Verification and Reliability
Abstract: We conduct a large‐scale empirical study of 11 predictive uncertainty metrics for DNNs, assessing how effectively they distinguish valid from invalid test inputs, and identify the three best‐performing metrics: PE‐DP, EN and EN‐DP. We propose an uncertainty‐guided deep learning testing approach to improve the naturalness of the generated test inputs. Extensive experiments show that our approach generates more valid test inputs than the baselines, and that appropriate data augmentation techniques can further boost performance.
Despite numerous applications of deep learning technologies to critical tasks in various domains, advanced deep neural networks (DNNs) face persistent safety and security challenges, such as overconfidence in predicting out‐of‐distribution samples and susceptibility to adversarial examples. Thorough testing by exploring the input space is a key strategy for ensuring the robustness and trustworthiness of these networks. However, existing testing methods focus on disclosing more erroneous model behaviours and overlook the validity of the generated test inputs. To mitigate this issue, we investigate how to devise a valid test input generation method for DNNs from a predictive uncertainty perspective. Through a large‐scale empirical study across 11 predictive uncertainty metrics for DNNs, we explore the correlation between the validity and the uncertainty of test inputs. Our findings reveal that the predictive entropy‐based and ensemble‐based uncertainty metrics effectively characterize input validity. Building on these insights, we introduce UCTest, an uncertainty‐guided deep learning testing approach, to efficiently generate valid and authentic test inputs. We formulate a joint optimization objective: to uncover the model's misbehaviours by maximizing the loss function and, concurrently, to generate valid test inputs by minimizing uncertainty. Extensive experiments demonstrate that our approach outperforms existing testing methods in generating valid test inputs. Furthermore, incorporating natural variation through data augmentation techniques into UCTest effectively boosts the diversity of the generated test inputs.
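The joint objective described in the abstract can be illustrated with a minimal sketch. This is not the UCTest implementation. It assumes that uncertainty is measured by the predictive entropy of a single softmax output, that the loss is cross‐entropy, and that the two terms are combined as a weighted difference. The function names and the weight `lam` are hypothetical and are introduced here only for illustration.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Shannon entropy of a class-probability vector (higher = more uncertain)."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.sum(probs * np.log(probs + eps)))

def cross_entropy(probs, label, eps=1e-12):
    """Cross-entropy loss of the prediction with respect to the true label."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.log(probs[label] + eps))

def joint_score(probs, label, lam=1.0):
    """Assumed joint objective: reward high loss, penalize high uncertainty.

    A test generator would search for inputs that maximize this score.
    The weighted-difference form and `lam` are illustrative assumptions.
    """
    return cross_entropy(probs, label) - lam * predictive_entropy(probs)

# Two hypothetical softmax outputs for inputs whose true label is class 0.
# Both are misclassified with the same loss, but the second is far less certain.
confident_wrong = np.array([0.05, 0.90, 0.05])
uncertain_wrong = np.array([0.05, 0.50, 0.45])

for name, p in [("confident", confident_wrong), ("uncertain", uncertain_wrong)]:
    print(name, "entropy:", predictive_entropy(p), "score:", joint_score(p, 0))
```

Under these assumptions, the two misclassified inputs incur the same loss, so the lower‐entropy one receives the higher score. This mirrors the idea of preferring failure‐revealing inputs that the model does not treat as highly uncertain.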
computer science, software engineering