Abstract: Machine learning has achieved remarkable success across diverse domains. Nevertheless, concerns about the interpretability of black-box models, especially Deep Neural Networks (DNNs), have become pronounced in safety-critical fields such as healthcare and finance. Classical machine learning (ML) classifiers, known for their higher interpretability, are preferred in these domains. Like DNNs, classical ML classifiers can exhibit bugs that could lead to severe consequences in practice. Test input prioritization has emerged as a promising approach to ensuring the quality of an ML system: it prioritizes potentially misclassified tests so that such tests can be identified earlier with limited manual labeling cost. However, when applied to classical ML classifiers, existing DNN test prioritization methods are limited in three respects: 1) coverage-based methods are inefficient and time-consuming; 2) mutation-based methods cannot be adapted to classical ML models because their model mutation rules do not match; 3) confidence-based methods are restricted to a single dimension when applied to binary ML classifiers, depending solely on the model's prediction probability for one class. To overcome these challenges, we propose MLPrior, a test prioritization approach specifically tailored for classical ML models. MLPrior leverages the characteristics of classical ML classifiers (i.e., interpretable models and carefully engineered attribute features) to prioritize test inputs. It rests on two principles: 1) tests more sensitive to mutations are more likely to be misclassified, and 2) tests closer to the model's decision boundary are more likely to be misclassified. Building on the first principle, we design mutation rules to generate two types of mutation features (i.e., model mutation features and input mutation features) for each test. Drawing on the second principle, MLPrior generates attribute features for each test from its attribute values, which can indirectly reveal the proximity between the test and the decision boundary. For each test, MLPrior combines all three types of features into a final vector. Subsequently, MLPrior employs a pre-trained ranking model to predict the misclassification probability of each test from its final vector and ranks the tests accordingly. We conducted an extensive study to evaluate MLPrior on 185 subjects, encompassing natural datasets, mixed noisy datasets, and fairness datasets. The results demonstrate that MLPrior outperforms all the compared test prioritization approaches, with an average improvement of 14.74% to 66.93% on natural datasets, 18.55% to 67.73% on mixed noisy datasets, and 15.34% to 62.72% on fairness datasets.
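The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The toy linear classifier, the concrete mutation rules (nudging each attribute by `step`, scaling each weight by `1 ± scale`), and every function name are assumptions made for illustration. In particular, `placeholder_score` is a hypothetical stand-in for the pre-trained ranking model: it counts flipped mutation bits and ignores the attribute features, which a learned ranker would also use.

```python
def predict(weights, bias, x):
    """Label (0 or 1) from a toy linear classifier (assumed stand-in model)."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return int(score >= 0.0)


def input_mutation_features(weights, bias, x, step=0.2):
    """Two bits per attribute: does nudging it by +step / -step flip the label?"""
    base = predict(weights, bias, x)
    bits = []
    for i in range(len(x)):
        for delta in (step, -step):
            mutated = list(x)
            mutated[i] += delta
            bits.append(int(predict(weights, bias, mutated) != base))
    return bits


def model_mutation_features(weights, bias, x, scale=0.2):
    """Two bits per weight: does scaling it by (1 - scale) / (1 + scale) flip the label?"""
    base = predict(weights, bias, x)
    bits = []
    for i in range(len(weights)):
        for factor in (1.0 - scale, 1.0 + scale):
            mutant = list(weights)
            mutant[i] *= factor
            bits.append(int(predict(mutant, bias, x) != base))
    return bits


def final_vector(weights, bias, x):
    """Concatenate input mutation, model mutation, and attribute features."""
    return (input_mutation_features(weights, bias, x)
            + model_mutation_features(weights, bias, x)
            + list(x))


def placeholder_score(vector, n_attributes):
    """Hypothetical stand-in for the ranking model: fraction of flipped mutation bits."""
    mutation_bits = vector[:len(vector) - n_attributes]
    return sum(mutation_bits) / len(mutation_bits)


def prioritize(weights, bias, tests):
    """Return test indices ordered from most to least likely misclassified."""
    scores = [placeholder_score(final_vector(weights, bias, x), len(x)) for x in tests]
    return sorted(range(len(tests)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    weights, bias = [1.0, -1.0], 0.0
    tests = [[3.0, -3.0], [1.0, 0.9], [-2.0, 2.0]]  # the second test sits near the boundary
    print(prioritize(weights, bias, tests))  # → [1, 0, 2]
```

In this toy setup the test nearest the decision boundary flips under half of the mutations and is ranked first, while the two tests far from the boundary never flip. That matches both principles stated in the abstract, although a real ranking model would learn the mapping from final vectors to misclassification probability rather than count flips.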
