Abstract: Machine learning has achieved remarkable success across diverse domains. Nevertheless, concerns about the interpretability of black-box models, especially deep neural networks (DNNs), have become pronounced in safety-critical fields such as healthcare and finance. Classical machine learning (ML) classifiers, known for their higher interpretability, are preferred in these domains. Like DNNs, classical ML classifiers can contain bugs that lead to severe consequences in practice. Test input prioritization has emerged as a promising approach to ensuring the quality of an ML system: it ranks potentially misclassified tests first so that they can be identified earlier under limited manual labeling costs. However, when applied to classical ML classifiers, existing DNN test prioritization methods are limited in three respects: 1) coverage-based methods are inefficient and time-consuming; 2) mutation-based methods cannot be adapted to classical ML models because their model mutation rules do not match; 3) confidence-based methods are restricted to a single dimension when applied to binary ML classifiers, since they depend solely on the model's prediction probability for one class. To overcome these challenges, we propose MLPrior, a test prioritization approach specifically tailored to classical ML models. MLPrior leverages the characteristics of classical ML classifiers (i.e., interpretable models and carefully engineered attribute features) to prioritize test inputs. Its foundational principles are: 1) tests that are more sensitive to mutations are more likely to be misclassified, and 2) tests that are closer to the model's decision boundary are more likely to be misclassified. Building on the first principle, we design mutation rules to generate two types of mutation features (i.e., model mutation features and input mutation features) for each test.
Drawing on the second principle, MLPrior generates attribute features for each test from its attribute values, which can indirectly reveal how close the test lies to the decision boundary. For each test, MLPrior combines all three types of features into a final vector. MLPrior then employs a pre-trained ranking model to predict the misclassification probability of each test from its final vector and ranks the tests accordingly. We conducted an extensive study to evaluate MLPrior on 185 subjects, encompassing natural datasets, mixed noisy datasets, and fairness datasets. The results demonstrate that MLPrior outperforms all the compared test prioritization approaches, with an average improvement of 14.74%∼66.93% on natural datasets, 18.55%∼67.73% on mixed noisy datasets, and 15.34%∼62.72% on fairness datasets.
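
The pipeline described in the abstract (mutation features, attribute features, one final vector per test, then ranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy logistic classifier, the Gaussian perturbations used as mutation rules, the number of mutants, and all function names are hypothetical. In particular, `placeholder_score` merely averages the mutation flags and stands in for the pre-trained ranking model, whose actual form is not specified here.

```python
import math
import random

N_MUTANTS = 20  # assumed number of mutants per feature type (illustrative)


def predict(weights, bias, x):
    """Toy logistic classifier (a stand-in for any classical ML classifier)."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0


def input_mutation_features(weights, bias, x, rng, scale=0.1):
    """One 0/1 flag per mutant: did perturbing the test input flip the prediction?"""
    original = predict(weights, bias, x)
    flags = []
    for _ in range(N_MUTANTS):
        mutant = [v + rng.gauss(0.0, scale) for v in x]
        flags.append(int(predict(weights, bias, mutant) != original))
    return flags


def model_mutation_features(weights, bias, x, rng, scale=0.1):
    """One 0/1 flag per mutant: did perturbing the model weights flip the prediction?"""
    original = predict(weights, bias, x)
    flags = []
    for _ in range(N_MUTANTS):
        mutated = [w + rng.gauss(0.0, scale) for w in weights]
        flags.append(int(predict(mutated, bias, x) != original))
    return flags


def final_vector(weights, bias, x, seed=0):
    """Concatenate model mutation, input mutation, and attribute features."""
    rng = random.Random(seed)
    return (model_mutation_features(weights, bias, x, rng)
            + input_mutation_features(weights, bias, x, rng)
            + list(x))  # attribute features: here simply the raw attribute values


def placeholder_score(vector):
    """Stand-in for the pre-trained ranking model: fraction of flipped mutants."""
    flags = vector[:2 * N_MUTANTS]
    return sum(flags) / len(flags)


def prioritize(weights, bias, tests):
    """Return test indices, highest estimated misclassification risk first."""
    scored = [(placeholder_score(final_vector(weights, bias, x, seed=i)), i)
              for i, x in enumerate(tests)]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps index order on ties
    return [i for _, i in scored]


if __name__ == "__main__":
    weights, bias = [1.0, -1.0], 0.0
    tests = [[5.0, -5.0],   # far from the decision boundary
             [0.01, 0.0]]   # very close to the decision boundary
    print(prioritize(weights, bias, tests))
```

In this toy setup the test near the decision boundary flips under many input mutations and is therefore ranked ahead of the distant one, which mirrors the two principles stated in the abstract. A real ranking model would be trained on labeled final vectors rather than computed from a fixed average.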

Multiple-Boundary Clustering and Prioritization to Promote Neural Network Retraining

An Improved Multi-Task Approach for SHM Missing Data Reconstruction Using Attentive Neural Process and Meta-Learning

Prioritizing Test Inputs for DNNs Using Training Dynamics

In Defense of Simple Techniques for Neural Network Test Case Selection

Boosting Operational DNN Testing Efficiency Through Conditioning

Optimum splitting computing for DNN training through next generation smart networks: a multi-tier deep reinforcement learning approach

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models

DeepGD: A Multi-Objective Black-Box Test Selection Approach for Deep Neural Networks

Boundary Sampling to Boost Mutation Testing for Deep Learning Models

TBD: Benchmarking and Analyzing Deep Neural Network Training

Test Input Prioritization for Machine Learning Classifiers

Deep Learning System Boundary Testing through Latent Space Style Mixing

Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

Practical Accuracy Evaluation for Deep Learning Systems Via Latent Representation Discrepancy

DeepGini: prioritizing massive tests to enhance the robustness of deep neural networks

HCEC: An efficient geo-distributed deep learning training strategy based on wait-free back-propagation

Cost-Effective Testing of a Deep Learning Model Through Input Reduction

RedTest: Towards Measuring Redundancy in Deep Neural Networks Effectively

A Scenario-Based Functional Testing Approach to Improving DNN Performance

A Model-Based Unsupervised Deep Learning Method for Low-Dose CT Reconstruction