Robust Black-box Testing of Deep Neural Networks using Co-Domain Coverage

Aishwarya Gupta, Indranil Saha, Piyush Rai
2024-08-13
Abstract: Rigorous testing of machine learning models is necessary for trustworthy deployments. We present a novel black-box approach for generating test-suites for robust testing of deep neural networks (DNNs). Most existing methods create test inputs based on maximizing some "coverage" criterion/metric such as a fraction of neurons activated by the test inputs. Such approaches, however, can only analyze each neuron's behavior or each layer's output in isolation and are unable to capture their collective effect on the DNN's output, resulting in test suites that often do not capture the various failure modes of the DNN adequately. These approaches also require white-box access, i.e., access to the DNN's internals (node activations). We present a novel black-box coverage criterion called Co-Domain Coverage (CDC), which is defined as a function of the model's output and thus takes into account its end-to-end behavior. Subsequently, we develop a new fuzz testing procedure named CoDoFuzz, which uses CDC to guide the fuzzing process to generate a test suite for a DNN. We extensively compare the test suite generated by CoDoFuzz with those generated using several state-of-the-art coverage-based fuzz testing methods for the DNNs trained on six publicly available datasets. Experimental results establish the efficiency and efficacy of CoDoFuzz in generating the largest number of misclassified inputs and the inputs for which the model lacks confidence in its decision.
Machine Learning
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses the problem of robustness testing for deep neural networks (DNNs). Most existing test-generation methods maximize some "coverage" criterion, such as the fraction of neurons activated by the test inputs. These methods analyze each neuron's behavior or each layer's output in isolation, so they fail to capture the collective effect on the DNN's output. As a result, the generated test suites often do not adequately reveal the DNN's various failure modes. These methods also require white-box access, that is, access to the DNN's internal node activations.

To overcome these issues, the authors propose a black-box coverage criterion called Co-Domain Coverage (CDC). CDC is defined as a function of the model's output and therefore accounts for the DNN's end-to-end behavior. Building on CDC, they develop a fuzz testing procedure named CoDoFuzz, which uses CDC to guide the fuzzing process and generate a test suite for a DNN. In experiments on DNNs trained on six publicly available datasets, the authors compare CoDoFuzz with several state-of-the-art coverage-based fuzz testing methods and report that it generates the largest number of misclassified inputs and of inputs for which the model lacks confidence in its decision.
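To make the idea of output-based, black-box coverage guidance concrete, here is a minimal sketch of a coverage-guided fuzzing loop. It is not the authors' CDC definition or the CoDoFuzz algorithm, since neither is specified in the summary above. The sketch assumes a deliberately simple stand-in criterion: the output space is split into cells of the form (predicted class, confidence bin), and a mutated input is kept only if it reaches a cell not seen before. The names `output_cell`, `fuzz`, `toy_model`, and `toy_mutate` are hypothetical and exist only for illustration.

```python
import math
import random


def output_cell(probs, bins=5):
    """Map a probability vector to a discrete cell (predicted class, confidence bin).

    Assumption: this binning is an illustrative stand-in, not the paper's CDC.
    """
    top = max(range(len(probs)), key=lambda i: probs[i])
    conf_bin = min(int(probs[top] * bins), bins - 1)
    return (top, conf_bin)


def fuzz(model, seeds, mutate, iterations=200, bins=5, rng=None):
    """Coverage-guided fuzzing that only queries the model's outputs (black-box)."""
    rng = rng or random.Random(0)
    corpus = list(seeds)
    covered = {output_cell(model(x), bins) for x in corpus}
    for _ in range(iterations):
        parent = rng.choice(corpus)          # pick an existing test input
        child = mutate(parent, rng)          # perturb it
        cell = output_cell(model(child), bins)
        if cell not in covered:              # keep inputs that reach new output cells
            covered.add(cell)
            corpus.append(child)
    return corpus, covered


def toy_model(x):
    """Hypothetical two-class model on a scalar input: a logistic function."""
    p = 1.0 / (1.0 + math.exp(-x))
    return [p, 1.0 - p]


def toy_mutate(x, rng):
    """Hypothetical mutation: add Gaussian noise to the input."""
    return x + rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    corpus, covered = fuzz(toy_model, [0.0], toy_mutate, iterations=300)
    print(len(corpus), sorted(covered))
```

The loop never inspects neurons or layers; it only calls `model(x)` and reads the returned probabilities, which is what makes the guidance black-box. Low-confidence bins correspond loosely to inputs on which the model is uncertain, so a criterion of this kind rewards the fuzzer for finding such inputs.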