Abstract: Deep Neural Networks (DNNs) have been widely used in various domains, such as computer vision and software engineering. Although many DNNs have been deployed to assist with real-world tasks, they, like traditional software, suffer from defects that may lead to severe outcomes. DNN testing is one of the most widely used methods for ensuring the quality of DNNs. It requires a rich set of test inputs with oracle information (expected outputs) to reveal the incorrect behaviors of a DNN model. However, manually labeling all collected test inputs is labor-intensive, which delays the quality assurance process. Test selection tackles this problem by carefully selecting a small, more suspicious subset of test inputs to label, enabling failure detection of a DNN model with reduced effort. Researchers have proposed different test selection methods, including neuron-coverage-based and uncertainty-based methods, of which the uncertainty-based methods are arguably the most popular. Unfortunately, existing uncertainty-based selection methods hit a performance bottleneck due to one or more of the following limitations: 1) they ignore noisy data in real scenarios; 2) they wrongly exclude many failure-revealing test inputs and instead include many successful test inputs (i.e., test inputs that the model predicts correctly); 3) they ignore the diversity of the selected test set. In this paper, we propose RTS, a Robust Test Selection method for deep neural networks that overcomes these limitations. First, RTS divides all unlabeled candidate test inputs into a noise set, a successful set, and a suspicious set, and assigns a different selection priority to each set, which effectively alleviates the impact of noise and improves the ability to identify suspicious test inputs.
Subsequently, RTS leverages a probability-tier-matrix-based test metric to prioritize the test inputs within each divided set (i.e., the suspicious, successful, and noise sets). As a result, RTS can select more suspicious test inputs within a limited selection size. We evaluate RTS by comparing it with 14 baseline methods on 5 widely used DNN models and 6 widely used datasets. The experimental results demonstrate that RTS significantly outperforms all baseline test selection methods in failure detection capability, and that the test suites selected by RTS have the best model optimization capability. For example, when selecting 2.5% of the test inputs, RTS achieves an improvement of 9.37% to 176.75% over the baseline methods in terms of failure detection.
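To make the divide-then-rank idea described in the abstract more concrete, the Python sketch below shows the general shape of such a selector. It is not RTS itself: the abstract defines neither the probability-tier-matrix metric nor the criteria for the three sets, so this sketch substitutes the DeepGini-style Gini impurity (1 − Σp²) as the uncertainty score. The thresholds `noise_thr` and `success_thr`, the priority order (suspicious, then successful, then noise), and the helper names `gini`, `divide`, `select`, and `failure_rate` are all hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple


def gini(probs: Sequence[float]) -> float:
    """DeepGini-style impurity: 0 for a one-hot output, 1 - 1/k for a uniform one."""
    return 1.0 - sum(p * p for p in probs)


def divide(prob_matrix: Sequence[Sequence[float]],
           noise_thr: float = 0.7,
           success_thr: float = 0.1) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical three-way split by Gini impurity; both thresholds are assumptions."""
    noise, successful, suspicious = [], [], []
    for idx, probs in enumerate(prob_matrix):
        g = gini(probs)
        if g >= noise_thr:
            noise.append(idx)        # near-uniform output: treated as likely noise
        elif g <= success_thr:
            successful.append(idx)   # confident output: treated as likely correct
        else:
            suspicious.append(idx)   # in between: the most promising candidates
    return noise, successful, suspicious


def select(prob_matrix: Sequence[Sequence[float]], ratio: float) -> List[int]:
    """Order suspicious > successful > noise (assumed priority) and cut at the budget."""
    budget = max(1, math.floor(ratio * len(prob_matrix)))
    noise, successful, suspicious = divide(prob_matrix)

    def score(i: int) -> float:
        return gini(prob_matrix[i])

    ordered = (sorted(suspicious, key=score, reverse=True)   # most uncertain first
               + sorted(successful, key=score, reverse=True)
               + sorted(noise, key=score))                  # least noisy first
    return ordered[:budget]


def failure_rate(selected: Sequence[int], predictions: Sequence[int],
                 labels: Sequence[int]) -> float:
    """Fraction of selected inputs the model mispredicts (one common definition)."""
    if not selected:
        return 0.0
    return sum(predictions[i] != labels[i] for i in selected) / len(selected)


# Usage: four candidate inputs for a hypothetical 4-class model.
example_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident -> successful set
    [0.25, 0.25, 0.25, 0.25],  # uniform   -> noise set
    [0.50, 0.40, 0.05, 0.05],  # uncertain -> suspicious set
    [0.70, 0.20, 0.05, 0.05],  # uncertain -> suspicious set
]
print(divide(example_probs))       # ([1], [0], [2, 3])
print(select(example_probs, 0.5))  # [2, 3]
```

Ranking the noise set in ascending order of impurity is one possible design choice. Under this sketch's assumption that near-uniform outputs are noise, the least noisy of those inputs are the ones most worth labeling if the budget ever reaches that set.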

DeepState: Selecting Test Suites to Enhance the Robustness of Recurrent Neural Networks

There is Limited Correlation Between Coverage and Robustness for Deep Neural Networks

An Empirical Study on Correlation between Coverage and Robustness for Deep Neural Networks

Adaptive Test Selection for Deep Neural Networks

Robust Test Selection for Deep Neural Networks

Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

Coverage-Guided Testing for Recurrent Neural Networks

In Defense of Simple Techniques for Neural Network Test Case Selection

Evaluating the Robustness of Test Selection Methods for Deep Neural Networks

DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks

RNN-Test: Towards Adversarial Testing for Recurrent Neural Network Systems

DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks

DeepCover: Advancing RNN Test Coverage and Online Error Prediction using State Machine Extraction

Behavior Pattern-Driven Test Case Selection for Deep Neural Networks

DeepIA: An Interpretability Analysis based Test Data Generation Method for DNN

A White-Box Testing for Deep Neural Networks Based on Neuron Coverage

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars

DeepSample: DNN Sampling-Based Testing for Operational Accuracy Assessment