Abstract: Machine Learning (ML) software, used to implement an ML algorithm, is widely used in many application domains, such as finance, business, and engineering. Faults in ML software can cause substantial losses in these domains. Thus, it is critical to conduct effective testing of ML software to detect and eliminate its faults. However, testing ML software is difficult, especially in producing test oracles for checking behavior correctness (such as expected properties or expected test outputs). To tackle the test-oracle issue, this thesis presents a novel black-box approach of multiple-implementation testing for supervised learning software. The insight underlying the approach is that there can be multiple independently written implementations of a supervised learning algorithm, and a majority of them may produce the expected output for a test input (even if none of these implementations is fault-free). In particular, the proposed approach derives a pseudo oracle for a test input by running the test input on n implementations of the supervised learning algorithm and then using the common test output produced by a majority (determined by a percentage threshold) of these n implementations. The proposed approach includes techniques to address two challenges in multiple-implementation testing (or testing in general) of supervised learning software: the definition of test cases for supervised learning software, and the resolution of inconsistent algorithm configurations across implementations. In addition, to improve the dependability of supervised learning software during in-field usage while incurring low runtime overhead, the approach includes a multiple-implementation monitoring technique.
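The pseudo-oracle step described above (run one test input on n implementations, then accept the output shared by a threshold-determined majority) can be sketched as a short Python function. This is a minimal illustration under stated assumptions, not the thesis's implementation: a test case is modelled here as a training set plus a single test instance, outputs are class labels compared with `==`, and the names `pseudo_oracle`, `nn_min`, `nn_sorted`, and `nn_faulty` are hypothetical. Implementations whose output differs from the majority are reported as fault suspects.

```python
from collections import Counter
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical signature (assumption): an implementation trains on `train`
# and returns its predicted label for the test instance `x`.
Impl = Callable[[Sequence[Tuple[float, str]], float], Hashable]


def pseudo_oracle(
    implementations: Sequence[Impl],
    train: Sequence[Tuple[float, str]],
    x: float,
    threshold: float = 0.5,
) -> Tuple[Optional[Hashable], List[int]]:
    """Majority-vote pseudo oracle over n implementations.

    Returns (oracle, deviants): `oracle` is the output shared by MORE than
    `threshold` of the implementations, and `deviants` lists the indices of
    implementations whose output differs from it (fault suspects).
    Returns (None, []) when no output exceeds the threshold.
    """
    if not implementations:
        raise ValueError("need at least one implementation")
    outputs = [impl(train, x) for impl in implementations]
    # Most frequent output and the number of implementations producing it.
    candidate, votes = Counter(outputs).most_common(1)[0]
    if votes / len(outputs) <= threshold:
        return None, []
    deviants = [i for i, out in enumerate(outputs) if out != candidate]
    return candidate, deviants


# Three hypothetical 1-nearest-neighbour implementations, one with a seeded fault.
def nn_min(train, x):
    # Label of the training point closest to x.
    return min(train, key=lambda p: abs(p[0] - x))[1]


def nn_sorted(train, x):
    # Independently written variant: sort by distance, take the closest.
    return sorted(train, key=lambda p: abs(p[0] - x))[0][1]


def nn_faulty(train, x):
    # Seeded fault: returns the label of the FARTHEST training point.
    return max(train, key=lambda p: abs(p[0] - x))[1]


train = [(0.0, "ham"), (1.0, "spam")]
print(pseudo_oracle([nn_min, nn_sorted, nn_faulty], train, 0.9))  # ('spam', [2])
```

Exact equality suits class labels such as those produced by NaiveBayes or k-nearest neighbor classifiers; implementations that return probabilities or other floating-point outputs would need a tolerance-based comparison instead. Raising `threshold` makes the sketch more conservative: fewer test inputs receive an oracle, but each one rests on broader agreement.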
The evaluations of the proposed approach show that multiple-implementation testing is effective in detecting real faults in real-world ML software (even widely used ones), including 5 faults from 10 NaiveBayes implementations and 4 faults from 20 k-nearest neighbor implementations, and that the proposed multiple-implementation monitoring technique substantially reduces the need for running mul-

Multiple-implementation testing of supervised learning software by Oreoluwa Alebiosu
