Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples

Xingjian Zhou, Hongji Xu, Andy Xu, Zhouxing Shi, Cho-Jui Hsieh, Huan Zhang
2024-12-04
Abstract: In recent years, many neural network (NN) verifiers have been developed to formally verify certain properties of neural networks such as robustness. Although many benchmarks have been constructed to evaluate the performance of NN verifiers, they typically lack a ground truth for hard instances that no current verifier can verify and for which no counterexample can be found. This makes it difficult to check the soundness of a new verifier if it claims to verify hard instances that no other verifier can. We propose to develop a soundness benchmark for NN verification. Our benchmark contains instances with deliberately inserted counterexamples, while we also try to hide the counterexamples from regular adversarial attacks, which can be used for finding counterexamples. We design a training method to produce neural networks with such hidden counterexamples. Our benchmark aims to be used for testing the soundness of NN verifiers and identifying falsely claimed verifiability when it is known that hidden counterexamples exist. We systematically construct our benchmark and generate instances across diverse model architectures, activation functions, input sizes, and perturbation radii. We demonstrate that our benchmark successfully identifies bugs in state-of-the-art NN verifiers, as well as synthetic bugs, providing a crucial step toward enhancing the reliability of testing NN verifiers. Our code is available at https://github.com/MVP-Harry/SoundnessBench and our benchmark is available at https://huggingface.co/datasets/SoundnessBench/SoundnessBench.
Machine Learning, Artificial Intelligence, Software Engineering
### What problem does this paper attempt to solve?

This paper addresses the problem of testing the soundness of neural network (NN) verifiers. Specifically, the authors propose a new benchmark for checking whether NN verifiers produce correct results, especially on instances that are hard to verify.

#### Background and Motivation

In recent years, many neural network verifiers have been developed to formally verify properties of neural networks, such as robustness. However, existing benchmarks usually lack a ground truth for hard instances: when no current verifier can verify an instance and no counterexample can be found for it, there is no way to check whether a new verifier is sound when it claims to verify that instance.

#### Core Problems of the Paper

1. **Lack of Ground Truth**: Hard instances in existing benchmarks have no known correct answer, so a verifier's claim to have verified such an instance cannot be checked.
2. **Hidden Counterexamples**: To test soundness, the benchmark needs instances that contain counterexamples which regular adversarial attacks cannot find. Otherwise a verifier could settle the instance simply by running an attack, and its actual verification procedure would never be exercised.

#### Solutions

The authors build the benchmark through the following key steps:

- **Design a Training Method**: Train neural networks that contain hidden counterexamples. These counterexamples exist by construction but are not found by conventional adversarial attacks.
- **Construct Diverse Instances**: Generate models with different architectures, activation functions, input sizes, and perturbation radii so that verifiers are tested across a broad range of settings.
- **Test Verifiers**: Run existing NN verifiers on these instances and flag any verifier that claims to verify an instance known to contain a hidden counterexample.

#### Specific Implementation

1. **Define the Problem**: Each instance \((f, x_0, y, \epsilon)\) defines a property to be verified:
   \[ \forall x \in B(x_0, \epsilon),\ \forall i \neq y:\quad f_y(x) - f_i(x) > 0, \]
   where \(B(x_0, \epsilon)=\{x \mid \|x - x_0\|_\infty \leq \epsilon\}\) is the \(\ell_\infty\) ball centered at \(x_0\) with perturbation radius \(\epsilon\).
2. **Generate Data**: Construct a dataset containing unverifiable instances (each with a predefined counterexample) and clean instances. The counterexample of an unverifiable instance must satisfy
   \[ x_{cex}=x_0+\delta_{cex} \quad \text{s.t.} \quad \|\delta_{cex}\|_\infty \leq \epsilon,\quad f_y(x_{cex}) - f_i(x_{cex}) \leq 0 \ \text{for some } i \neq y. \]
3. **Training Method**: Train with a two-part loss:
   - The first part ensures that the predefined counterexamples become actual counterexamples.
   - The second part uses adversarial training to eliminate counterexamples that are easy to find, so that regular adversarial attacks fail to find any counterexample even though the predefined one exists.
4. **Experimental Verification**:
   - Run strong adversarial attacks (such as PGD and AutoAttack) to check that the predefined counterexamples are truly hidden.
   - Use the benchmark to identify bugs in existing NN verifiers, and show that it also detects synthetic bugs.

In conclusion, this paper provides a systematic way to evaluate and improve the reliability of NN verifiers by introducing a benchmark whose instances contain hidden counterexamples.
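To make the property from **Define the Problem** concrete, here is a minimal sketch of checking whether a candidate point \(x\) is a valid counterexample for an instance \((f, x_0, y, \epsilon)\). It uses the equivalent form \(\min_{i \neq y} (f_y(x) - f_i(x)) > 0\) of the property. The helper names and the toy `linear_model` are hypothetical illustrations, not part of the SoundnessBench code; a real instance would use a trained network in place of the toy.

```python
# Sketch: checking the robustness property of one instance (f, x0, y, eps).
# The toy linear model below is a hypothetical stand-in for a trained network.


def in_linf_ball(x, x0, eps, tol=1e-12):
    """True if ||x - x0||_inf <= eps (tol absorbs floating-point rounding)."""
    return max(abs(a - b) for a, b in zip(x, x0)) <= eps + tol


def margin(f, x, y):
    """min over i != y of f_y(x) - f_i(x); the property holds at x iff this is > 0."""
    logits = f(x)
    return min(logits[y] - logits[i] for i in range(len(logits)) if i != y)


def is_counterexample(f, x0, y, eps, x):
    """x refutes the property if it lies in B(x0, eps) and f_y(x) - f_i(x) <= 0 for some i != y."""
    return in_linf_ball(x, x0, eps) and margin(f, x, y) <= 0


def linear_model(W, b):
    """Hypothetical toy model f(x) = W x + b, with W given as a list of rows."""
    def f(x):
        return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return f


if __name__ == "__main__":
    f = linear_model([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    x0, y, eps = [0.6, 0.4], 0, 0.15
    print(margin(f, x0, y))                              # positive: x0 itself is classified as y
    print(is_counterexample(f, x0, y, eps, [0.5, 0.5]))  # True: inside the ball, tie between classes
    print(is_counterexample(f, x0, y, eps, [0.4, 0.6]))  # False: outside the ball
```

A tie (margin exactly 0) already counts as a counterexample, matching the non-strict \(\leq 0\) in the counterexample condition of **Generate Data**.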
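The **Training Method** step describes a two-part loss without giving its exact form here, so the following is only a hypothetical hinge-style sketch computed on precomputed logits. `cex_loss` pushes the margin at the predefined counterexample below `-tau`, and `adv_loss` pushes the margin at points found by an attack during training above `+tau`. The hinge form, the slack `tau`, the weight `lam`, and all function names are assumptions, not the paper's loss.

```python
# Hypothetical two-part training objective (hinge form, tau and lam are assumptions).


def margin(logits, y):
    """min over i != y of logits[y] - logits[i]."""
    return min(logits[y] - logits[i] for i in range(len(logits)) if i != y)


def cex_loss(logits_cex, y, tau=0.1):
    """Part 1: zero once the margin at x_cex is <= -tau, i.e. once the predefined
    counterexample is a real one with slack tau; positive otherwise."""
    return max(0.0, margin(logits_cex, y) + tau)


def adv_loss(logits_adv, y, tau=0.1):
    """Part 2: mean hinge penalty on attack-found points whose margin is below +tau,
    pushing the model to be robust wherever the attack searched."""
    if not logits_adv:
        return 0.0
    return sum(max(0.0, tau - margin(lg, y)) for lg in logits_adv) / len(logits_adv)


def total_loss(logits_cex, logits_adv, y, tau=0.1, lam=1.0):
    """Weighted sum of the two parts; lam trades one objective off against the other."""
    return cex_loss(logits_cex, y, tau) + lam * adv_loss(logits_adv, y, tau)
```

For example, `cex_loss([0.0, 1.0], 0)` is `0.0` because class 1 already beats class `y = 0` by more than `tau`, while `adv_loss([[1.0, 0.0], [0.0, 0.05]], 0)` is positive because the second attack point still violates the margin. The two terms pull in opposite directions; this sketch only shows the shape of such an objective, not how the paper's training keeps the inserted counterexample while removing the ones attacks can reach.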
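**Experimental Verification** checks that predefined counterexamples are hidden by running strong attacks such as PGD and AutoAttack. The sketch below is a simplified, dependency-free PGD-style search, not AutoAttack: it estimates gradients by central finite differences instead of autodiff, and the step size, number of restarts, and the 1-D `spike_model` toy are all assumptions chosen for illustration. The toy has a genuine counterexample at \(x = 0.3\) hidden inside a very narrow dip, which mimics what "hidden from regular attacks" means, not how the paper's networks are actually built.

```python
import math
import random


def margin(f, x, y):
    """min over i != y of f_y(x) - f_i(x); a counterexample has margin <= 0."""
    logits = f(x)
    return min(logits[y] - logits[i] for i in range(len(logits)) if i != y)


def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)


def pgd_attack(f, x0, y, eps, steps=50, alpha=None, restarts=5, h=1e-4, seed=0):
    """Minimise the margin over the l_inf ball B(x0, eps) with signed-gradient steps,
    projection back onto the ball, and random restarts. Gradients are estimated by
    central finite differences (a dependency-free assumption; real attacks use autodiff).
    Returns (best_x, best_margin)."""
    rng = random.Random(seed)
    if alpha is None:
        alpha = 2.5 * eps / steps
    best_x, best_m = list(x0), margin(f, x0, y)
    for _ in range(restarts):
        x = [c + rng.uniform(-eps, eps) for c in x0]  # random start inside the ball
        for _ in range(steps):
            grad = []
            for j in range(len(x)):
                xp, xm = list(x), list(x)
                xp[j] += h
                xm[j] -= h
                grad.append((margin(f, xp, y) - margin(f, xm, y)) / (2 * h))
            # Step downhill on the margin, then clip each coordinate back into the ball.
            x = [min(max(xi - alpha * sign(g), c - eps), c + eps)
                 for xi, g, c in zip(x, grad, x0)]
            m = margin(f, x, y)
            if m < best_m:
                best_x, best_m = list(x), m
    return best_x, best_m


def attack_finds_counterexample(f, x0, y, eps, **kwargs):
    """True if the attack reaches margin <= 0 inside the ball."""
    return pgd_attack(f, x0, y, eps, **kwargs)[1] <= 0


def spike_model(x):
    """Hypothetical 1-D toy: class 1 wins only in a dip of width ~1e-5 around x = 0.3."""
    return [1.0, 2.0 * math.exp(-((x[0] - 0.3) / 1e-5) ** 2)]
```

On the identity model `lambda x: [x[0], x[1]]` with `x0 = [0.6, 0.4]` and `y = 0`, the attack reaches the corner `[0.45, 0.55]` (margin about -0.1) when `eps = 0.15`, and finds nothing when `eps = 0.05`, where the smallest achievable margin is about 0.1. For `spike_model` with `x0 = [0.0]` and `eps = 0.5`, the attack is expected to report no counterexample even though `margin(spike_model, [0.3], 0)` is -1.0. A failed attack is therefore evidence, not proof, of robustness, and a verifier that claims to verify such an instance can be caught as unsound.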