DeepCNP: an Efficient White-Box Testing of Deep Neural Networks by Aligning Critical Neuron Paths

Weiguang Liu, Senlin Luo, Limin Pan, Zhao Zhang
DOI: https://doi.org/10.1016/j.infsof.2024.107640
IF: 3.9
2024-01-01
Information and Software Technology
Abstract:
Context: Erroneous decisions of Deep Neural Networks (DNNs) may pose a significant threat to Deep Learning systems deployed in security-critical domains. The key to testing DNNs is a technique that generates test cases capable of detecting more of a model's defects. Coverage-guided fuzz testing methods have been shown to struggle to detect correctness defects in a model's decision logic. Moreover, the neuron activation threshold is set empirically, which further increases the uncertainty of testing. In addition, mutating randomly selected seeds tends to generate a large number of invalid test cases, which greatly reduces testing efficiency.
Objective: This paper introduces DeepCNP, a method that combines Critical Neuron Path (CNP) alignment with a dynamic seed selection strategy. It aims to test all decision paths of a DNN comprehensively and efficiently, and to generate test cases of as many different classes as possible, in order to expose misbehaviors of the model and thus find defects.
Method: DeepCNP uses training data to construct decision paths determined by the neuron output distribution, and aligns different decision paths in order to generate test cases. Seeds that are easy to align are selected dynamically according to the decision path under test, and the label of each seed mutation is specified during path alignment, which improves the efficiency of fuzz testing.
Results: Experimental results show that DeepCNP achieves new state-of-the-art results. It pioneers the testing of all of a model's decision logic through critical neuron path alignment, and it greatly increases the number of defects found, the testing efficiency, and the number of generated test cases.
Conclusion: DeepCNP comprehensively tests the decision logic of DNNs and efficiently generates a large number of test cases of different categories, exposing the model's misbehaviors and thus finding additional defects.
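The ideas in the Method description (per-class decision paths built from training data, and choosing seeds that are easy to align with a target path) can be illustrated with a minimal sketch. The abstract does not specify the actual algorithm, so everything below is an assumption made for illustration: a "critical path" is taken to be the top-k neurons per layer by mean ReLU activation, alignment is measured as mean Jaccard overlap, and the names `forward`, `critical_path`, `alignment`, and `select_seed` are hypothetical. It is not the authors' implementation.

```python
import numpy as np

def forward(x, weights):
    """Return the ReLU activations of every layer for an input batch x."""
    activations = []
    h = x
    for w in weights:
        h = np.maximum(h @ w, 0.0)
        activations.append(h)
    return activations

def critical_path(xs, weights, k):
    """Assumed definition: per layer, the k neurons with the highest mean activation."""
    return [set(np.argsort(a.mean(axis=0))[-k:].tolist())
            for a in forward(xs, weights)]

def alignment(x, path, weights, k):
    """Assumed metric: mean Jaccard overlap between x's own top-k neurons and a path."""
    own = critical_path(x[None, :], weights, k)
    return float(np.mean([len(o & p) / len(o | p) for o, p in zip(own, path)]))

def select_seed(seeds, target_path, weights, k):
    """Pick the seed whose activation pattern is already closest to the target path."""
    scores = [alignment(s, target_path, weights, k) for s in seeds]
    return int(np.argmax(scores))

# Toy two-layer network and two synthetic "classes" (random data, illustration only).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 6)), rng.normal(size=(6, 5))]
class_a = rng.normal(loc=1.0, size=(20, 4))
class_b = rng.normal(loc=-1.0, size=(20, 4))

# Build the path for class B from its "training" samples, then choose the
# class A seed that would be easiest to push toward that path.
path_b = critical_path(class_b, weights, k=2)
best = select_seed(class_a, path_b, weights, k=2)
print("seed", best, "alignment", alignment(class_a[best], path_b, weights, k=2))
```

In this sketch the selected seed would then be mutated toward the target path, and the target class would serve as the mutation's intended label; the mutation step itself is omitted because the abstract gives no detail on it.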