Abstract: Deep Neural Networks (DNNs) have been widely used in various domains, such as computer vision and software engineering. Although many DNNs have been deployed to assist with real-world tasks, they, like traditional software, suffer from defects that may lead to severe outcomes. DNN testing is one of the most widely used methods for ensuring the quality of DNNs. It requires rich test inputs with oracle information (the expected output) to reveal the incorrect behaviors of a DNN model. However, manually labeling all collected test inputs is labor-intensive and delays the quality assurance process. Test selection tackles this problem by carefully selecting a small, more suspicious set of test inputs to label, enabling failure detection for a DNN model with reduced effort. Researchers have proposed different test selection methods, including neuron-coverage-based and uncertainty-based methods, of which the uncertainty-based methods are arguably the most popular. Unfortunately, existing uncertainty-based selection methods hit a performance bottleneck due to one or more of the following limitations: 1) they ignore noisy data in real scenarios; 2) they wrongly exclude many failure-revealing test inputs and instead include many successful test inputs (i.e., test inputs that the model predicts correctly); 3) they ignore the diversity of the selected test set. In this paper, we propose RTS, a Robust Test Selection method for deep neural networks, to overcome these limitations. First, RTS divides all unlabeled candidate test inputs into a noise set, a successful set, and a suspicious set and assigns a different selection priority to each set, which effectively alleviates the impact of noise and improves the ability to identify suspicious test inputs.
Subsequently, RTS leverages a probability-tier-matrix-based test metric to prioritize the test inputs within each set (i.e., the suspicious, successful, and noise sets). As a result, RTS can select more suspicious test inputs within a limited selection size. We evaluate RTS by comparing it with 14 baseline methods on 5 widely used DNN models and 6 widely used datasets. The experimental results demonstrate that RTS significantly outperforms all compared test selection methods in failure detection capability and that the test suites selected by RTS have the best model optimization capability. For example, when selecting 2.5% of the test inputs, RTS achieves an improvement of 9.37%-176.75% over the baseline methods in terms of failure detection.
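
To make the selection setting concrete, the following is a minimal sketch, not the RTS implementation. It ranks unlabeled inputs by an uncertainty score and then applies a tier-by-tier selection under a labeling budget. Several parts are assumptions made for illustration only: the score is Gini impurity over the model's softmax output (the score used by DeepGini), standing in for the probability-tier-matrix metric, which the abstract does not define; the tier label of each input is taken as given, because the abstract does not say how the partition is computed; and the priority order in `TIER_ORDER` (suspicious, then successful, then noise) is a guess. All function names are hypothetical.

```python
from typing import List, Sequence

# Assumed priority order of the three sets; the abstract only says that
# the sets receive different selection priorities.
TIER_ORDER = ("suspicious", "successful", "noise")


def gini_score(probs: Sequence[float]) -> float:
    """Gini impurity of a softmax output: 1 - sum(p_i^2).

    Higher values mean the model is less certain about the input.
    This is a stand-in score, not the metric proposed for RTS.
    """
    return 1.0 - sum(p * p for p in probs)


def select_by_uncertainty(outputs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Plain uncertainty-based selection: the `budget` most uncertain inputs."""
    ranked = sorted(range(len(outputs)),
                    key=lambda i: gini_score(outputs[i]),
                    reverse=True)
    return ranked[:budget]


def select_tiered(outputs: Sequence[Sequence[float]],
                  tiers: Sequence[str],
                  budget: int) -> List[int]:
    """Tiered selection: exhaust higher-priority sets before lower ones.

    `tiers[i]` is the (externally supplied, hypothetical) set label of input i.
    Within a set, inputs are ordered by descending uncertainty.
    """
    selected: List[int] = []
    for tier in TIER_ORDER:
        members = [i for i, t in enumerate(tiers) if t == tier]
        members.sort(key=lambda i: gini_score(outputs[i]), reverse=True)
        selected.extend(members)
        if len(selected) >= budget:
            break
    return selected[:budget]


if __name__ == "__main__":
    # Four toy softmax outputs for a binary classifier.
    outputs = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]
    tiers = ["suspicious", "noise", "suspicious", "successful"]
    print(select_by_uncertainty(outputs, 2))
    print(select_tiered(outputs, tiers, 3))
```

In this toy input, the most uncertain output (index 1) is labeled as noise, so plain uncertainty ranking would spend labeling budget on it first, while the tiered variant defers it until the other sets are exhausted. This illustrates the first limitation listed in the abstract, under the assumptions stated above.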

Seed Selection for Testing Deep Neural Networks

In Defense of Simple Techniques for Neural Network Test Case Selection

There is Limited Correlation Between Coverage and Robustness for Deep Neural Networks

Evaluating the Robustness of Test Selection Methods for Deep Neural Networks

DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks

Robust Test Selection for Deep Neural Networks

DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

Test Selection for Deep Learning Systems

Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

Testing Deep Learning Models: A First Comparative Study of Multiple Testing Techniques

QuoTe: Quality-oriented Testing for Deep Learning Systems

RobOT: Robustness-Oriented Testing for Deep Learning Systems

DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks

Can Search-Based Testing with Pareto Optimization Effectively Cover Failure-Revealing Test Inputs?

Can test input selection methods for deep neural network guarantee test diversity? A large-scale empirical study

Distance-Aware Test Input Selection for Deep Neural Networks

DeepState: Selecting Test Suites to Enhance the Robustness of Recurrent Neural Networks

Test Optimization in DNN Testing: A Survey

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models