Abstract: Deep learning (DL) applications, an emerging form of software, are gaining popularity for their intelligent and adaptive services. However, their service reliability depends heavily on the prediction accuracy of their internally integrated DL models. In practice, DL models are often observed to make incorrect predictions on abnormal inputs (e.g., adversarial samples and out-of-distribution (OOD) samples), which can easily lead to unexpected behaviors or even catastrophic consequences (e.g., a system crash). One promising way to guard application reliability is to reveal such abnormal inputs in time, before they are fed to the DL models integrated in the applications concerned. Remedial actions (e.g., discarding or fixing these inputs) can then be taken to protect the applications from behaving abnormally. Existing work has addressed this revealing problem either by analysis based on sample distance comparison or by generating sufficient model mutants for comparative analysis. However, the former focuses narrowly on samples while overlooking the DL models themselves, and the latter has to analyze massive numbers of mutants, incurring non-negligible overhead for applications. In this article, we propose a novel approach, NetChopper, which conducts a core analysis on the target DL model and then partitions it into two parts: the model core, which is closely associated with the training knowledge (expected to be important and thus stable), and the remaining part (expected to be immaterial and thus changeable). Based on this partitioning, NetChopper preserves (or freezes) the model core but mutates the remaining part, producing only a small number of model mutants. NetChopper can then distinguish abnormal inputs from normal ones by exploiting only these model-relevant and lightweight mutants.
We experimentally evaluated NetChopper on widely used DL subjects (e.g., MNIST+LeNet4 and CIFAR10+VGG16) and typical abnormal inputs (e.g., adversarial and OOD samples). The results show NetChopper's promising AUROC scores in revealing the abnormal degrees of inputs, generally and stably outperforming, or performing comparably to, state-of-the-art techniques (e.g., mMutant, Surprise, and Mahalanobis), and also confirm its high effectiveness and efficiency (with only marginal online overhead).
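
The freeze-and-mutate idea can be illustrated with a minimal sketch. The abstract does not specify how the model core is identified or how mutants are scored, so everything below is an assumption made for illustration: the toy linear classifier, the weight-magnitude criterion in `core_mask`, the Gaussian perturbation in `make_mutants`, and the label-change rate in `abnormality_score` are hypothetical stand-ins, not NetChopper's actual procedure.

```python
import random

def predict(weights, x):
    """Return the index of the class with the highest linear score."""
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def core_mask(weights, core_fraction=0.5):
    """Mark the largest-magnitude weights as 'core' (an assumed proxy only)."""
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    k = max(1, int(len(flat) * core_fraction))
    threshold = flat[k - 1]
    return [[abs(w) >= threshold for w in row] for row in weights]

def make_mutants(weights, mask, n_mutants=20, sigma=0.5, seed=0):
    """Freeze core weights; perturb the remaining ones with Gaussian noise."""
    rng = random.Random(seed)
    mutants = []
    for _ in range(n_mutants):
        mutants.append([
            [w if is_core else w + rng.gauss(0.0, sigma)
             for w, is_core in zip(row, mask_row)]
            for row, mask_row in zip(weights, mask)
        ])
    return mutants

def abnormality_score(weights, mutants, x):
    """Fraction of mutants whose prediction differs from the original model's."""
    base = predict(weights, x)
    changed = sum(predict(m, x) != base for m in mutants)
    return changed / len(mutants)

# Toy 2-class, 4-feature model; the values are illustrative only.
W = [[2.0, 0.1, -1.5, 0.2],
     [-2.0, 0.3, 1.5, -0.1]]
mask = core_mask(W, core_fraction=0.5)
mutants = make_mutants(W, mask)

core_driven_input = [1.0, 0.0, -1.0, 0.0]      # touches frozen core weights only
non_core_driven_input = [0.0, 1.0, 0.0, 1.0]   # touches mutated weights only
print(abnormality_score(W, mutants, core_driven_input))
print(abnormality_score(W, mutants, non_core_driven_input))
```

In this toy setting, an input whose prediction rests entirely on the frozen core is unaffected by mutation, while an input that relies on the mutated part can flip labels across mutants. Scores of this kind, collected over normal and abnormal inputs, are what a metric such as AUROC would then summarize.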
