A White-Box Testing for Deep Neural Networks Based on Neuron Coverage
Jing Yu, Shukai Duan, Xiaojun Ye
DOI: https://doi.org/10.1109/tnnls.2022.3156620
IF: 14.255
2022-01-01
IEEE Transactions on Neural Networks and Learning Systems
Abstract: With the introduction of neuron coverage as a testing criterion for deep neural networks (DNNs), covering more neurons to exercise more of the internal logic of DNNs has become the main goal of many research studies. While some works have made progress, new challenges have emerged for testing methods based on neuron coverage. The main ones are: establishing better neuron selection and activation strategies, which affect not only the neuron coverage obtained but also testing efficiency; validating test results automatically; and labeling generated test cases without manual work. In this article, we put forward Test4Deep, an effective white-box DNN testing approach based on neuron coverage. It builds on a differential testing framework to automatically detect inconsistent DNN behaviors. We designed a strategy that tracks inactive neurons and repeatedly triggers them in each iteration to maximize neuron coverage. Furthermore, we devised an optimization function that guides the DNN under test to produce different predictions for the original input and the generated test data while keeping the generation perturbations imperceptible, so that test oracles do not have to be checked manually. We conducted comparative experiments with two state-of-the-art white-box testing methods, DLFuzz and DeepXplore. Empirical results on three popular datasets with nine DNNs demonstrated that, compared to DLFuzz and DeepXplore, Test4Deep on average achieved 32.87% and 35.69% higher neuron coverage while reducing testing time by 58.37% and 53.24%, respectively. In the meantime, Test4Deep also produced 58.37% and 53.24% more test cases with 23.81% and 98.40% fewer perturbations. Even compared with the two DLFuzz strategies that achieve the highest neuron coverage, Test4Deep still improved neuron coverage by 4.34% and 23.23% and achieved 94.48% and 85.67% higher generation-time efficiency. Finally, merging the generated test cases into the training data and retraining improved the accuracy and robustness of the DNNs.
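Neuron coverage is commonly defined, following DeepXplore, as the fraction of neurons whose output, min-max scaled within its layer to [0, 1], exceeds a threshold for at least one test input. The sketch below assumes that definition and uses a hypothetical `CoverageTracker` class (not from the paper) to show how coverage and the set of still-inactive neurons could be tracked across iterations. It treats every layer output as one neuron and uses an illustrative threshold of 0.5; it is not Test4Deep's actual neuron selection or activation strategy.

```python
from typing import List, Sequence, Set, Tuple

# A neuron is identified by (layer_index, neuron_index).
Neuron = Tuple[int, int]


def scale_layer(outputs: Sequence[float]) -> List[float]:
    """Min-max scale one layer's outputs to [0, 1] (DeepXplore-style scaling).

    If all outputs are equal, every neuron is treated as scaled to 0.0
    to avoid dividing by zero (an assumption of this sketch).
    """
    lo, hi = min(outputs), max(outputs)
    if hi == lo:
        return [0.0 for _ in outputs]
    return [(v - lo) / (hi - lo) for v in outputs]


class CoverageTracker:
    """Hypothetical helper: records which neurons any input has activated so far."""

    def __init__(self, layer_sizes: Sequence[int], threshold: float = 0.5):
        # threshold = 0.5 is an illustrative default, not a value from the paper.
        self.threshold = threshold
        self.all_neurons: List[Neuron] = [
            (layer, n) for layer, size in enumerate(layer_sizes) for n in range(size)
        ]
        self.covered: Set[Neuron] = set()

    def update(self, layer_outputs: Sequence[Sequence[float]]) -> None:
        """Record the neurons activated by one input.

        layer_outputs[layer][n] is the raw output of neuron n in that layer.
        """
        for layer, outputs in enumerate(layer_outputs):
            for n, value in enumerate(scale_layer(outputs)):
                if value > self.threshold:
                    self.covered.add((layer, n))

    def coverage(self) -> float:
        """Fraction of all neurons activated above the threshold by some input."""
        return len(self.covered) / len(self.all_neurons)

    def inactive(self) -> List[Neuron]:
        """Neurons not yet covered: the targets a guided strategy would try to trigger."""
        return [nr for nr in self.all_neurons if nr not in self.covered]


if __name__ == "__main__":
    tracker = CoverageTracker([3, 2])
    tracker.update([[0.1, 0.9, 0.4], [2.0, 1.0]])
    print(tracker.coverage(), tracker.inactive())
```

In this toy run, the first input covers neuron 1 of layer 0 and neuron 0 of layer 1, giving 2/5 = 0.4 coverage and leaving three inactive neurons. A coverage-guided generator would then bias its next perturbation toward increasing the outputs of those inactive neurons.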