Abstract: With its growing use in safety/security-critical applications, Deep Learning (DL) has raised increasing concerns regarding its dependability. In particular, DL has a notorious lack of robustness: inputs with added adversarial perturbations, i.e., Adversarial Examples (AEs), are easily mispredicted by the DL model. Despite recent efforts to detect AEs via state-of-the-art attack and testing methods, these methods are normally input-distribution agnostic and/or disregard the perceptual quality of adversarial perturbations. Consequently, the detected AEs are either irrelevant to the application context or unrealistic in ways that humans can easily notice. This may limit the improvement in the DL model's dependability, as the testing budget is likely to be wasted on detecting AEs that are encountered very rarely in real-life operation. In this paper, we propose a new robustness testing approach for detecting AEs that considers both the feature-level distribution and the pixel-level distribution, the latter capturing the perceptual quality of adversarial perturbations. The two considerations are encoded by a novel hierarchical mechanism. First, we select test seeds based on the density of the feature-level distribution and their vulnerability in terms of adversarial robustness. The vulnerability of a test seed is indicated by auxiliary information that is highly correlated with local robustness. Given a test seed, we then develop a novel genetic-algorithm-based local test case generation method, in which two fitness functions work alternately to control the perceptual quality of detected AEs. Finally, extensive experiments confirm that our holistic approach, which considers hierarchical distributions, is superior to state-of-the-art methods that either disregard the input distribution or consider only a single (non-hierarchical) distribution, in terms of not only detecting imperceptible AEs but also improving the overall robustness of the DL model under test.
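The seed-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which density estimator or which auxiliary information is used, so everything below is an assumption made for illustration: an unnormalised Gaussian kernel density over feature vectors, the prediction margin as a stand-in for vulnerability, and a simple product to combine the two. The names `kernel_density`, `prediction_margin`, and `select_seeds` are hypothetical and do not come from the paper.

```python
import math


def kernel_density(point, data, bandwidth=1.0):
    """Unnormalised Gaussian kernel density of `point` with respect to `data`.

    The normalising constant is omitted because only the ranking matters here.
    (Assumed estimator; the abstract does not name one.)
    """
    total = 0.0
    for other in data:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, other))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return total / len(data)


def prediction_margin(probs):
    """Gap between the two largest class probabilities.

    A small gap is treated as a proxy for low local robustness.
    (Assumed auxiliary information; the abstract does not specify it.)
    """
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def select_seeds(features, probs, k, bandwidth=1.0):
    """Return indices of the k seeds with the highest density * vulnerability."""
    scores = []
    for i, (feat, p) in enumerate(zip(features, probs)):
        density = kernel_density(feat, features, bandwidth)
        vulnerability = 1.0 - prediction_margin(p)
        scores.append((density * vulnerability, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]


if __name__ == "__main__":
    # Three points in a dense cluster and one isolated outlier.
    features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
    # Softmax outputs of a hypothetical two-class model.
    probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.5, 0.5]]
    print(select_seeds(features, probs, k=2))
```

In this toy data the outlier has the smallest margin but lies in a low-density region, so it ranks below the two uncertain points inside the cluster. This reflects the stated motivation of not spending the testing budget on inputs that are rarely encountered in operation.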

Hierarchical Distribution-Aware Testing of Deep Learning
