Abstract: Adversarial examples pose a security threat to many critical systems built on neural networks (such as face recognition systems and self-driving cars). While many methods have been proposed to build robust models, how to build certifiably robust yet accurate neural network models remains an open problem. For example, adversarial training improves empirical robustness, but it does not provide certification of the model's robustness. On the other hand, certified training provides certified robustness but at the cost of a significant accuracy drop. In this work, we propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness. Our method has two parts: a probabilistic robust training method with an additional goal of minimizing variance in terms of divergence, and a runtime inference method for certified probabilistic robustness of the prediction. The latter enables efficient certification of the model's probabilistic robustness at runtime with statistical guarantees. This is supported by our training objective, which minimizes the variance of the model's predictions in a given vicinity, derived from a general definition of model robustness. Our approach works for a variety of perturbations and is reasonably efficient. Our experiments on multiple models trained on different datasets demonstrate that our approach significantly outperforms existing approaches in terms of both certification rate and accuracy.

What problem does this paper attempt to address?

### The problems the paper attempts to solve

This paper addresses the security threat that adversarial examples pose to neural networks. Specifically, its goal is to construct a neural network model that has both high accuracy and certified probabilistic robustness. Current methods usually fall into two categories:

1. **Adversarial training**: This method improves the empirical robustness of the model by using a mixture of normal and adversarial samples during training. Although adversarial training can improve robustness, it cannot provide a certified guarantee of the model's robustness and remains vulnerable to new attack methods.
2. **Certified training**: This method provides certification of the model's robustness by incorporating robustness verification techniques during training. Although certified training can provide robustness certification, it usually leads to a significant decrease in the model's accuracy.

The paper proposes a new method that aims to achieve both high accuracy and certified probabilistic robustness. It consists of two parts:

- **Probabilistic robust training method**: Improve the robustness of the model by minimizing the variance of its predictions on similar inputs.
- **Runtime inference method**: During inference, certify the model's probabilistic robustness by considering the input together with samples from its neighborhood.

### Method overview

#### 1. Probabilistic robust training method

This method improves the robustness of the model by minimizing the variance of the model's predictions across different inputs within the same neighborhood.
Specifically, the training objective can be expressed as the following optimization problem:

\[ \min_{h} \mathbb{E}_{x \sim D} \left[ \mathbb{E}_{t \sim U(B(x))} [\ell(h(t), G_t)] + \lambda \cdot \text{Var}_{t \sim U(B(x))} [\ell(h(t), G_t)] \right] \]

where:

- \(\mathbb{E}_{x \sim D}\) is the expectation over inputs \(x\) sampled from the data distribution \(D\).
- \(\mathbb{E}_{t \sim U(B(x))}\) is the expectation over \(t\) sampled uniformly from the neighborhood \(B(x)\) of input \(x\).
- \(\ell(h(t), G_t)\) is the loss function, which measures the deviation between the model prediction \(h(t)\) and the true label \(G_t\).
- \(\text{Var}_{t \sim U(B(x))} [\ell(h(t), G_t)]\) is the variance of the loss over the same neighborhood.
- \(\lambda\) is a weighting parameter that balances the two terms.

Minimizing this objective is intended to improve the model's robustness against adversarial samples while maintaining high accuracy.

#### 2. Runtime inference method

During inference, this method certifies the model's probabilistic robustness by considering the input together with samples from its neighborhood. Specifically, for any input \(x\) and model \(h\), the inference method can be expressed as:

\[ h^*(x) := (h * B)(x) := \int_X h(\tau) I(x - \tau \in B(x)) d\tau \]

where:

- \(h^*\) is the model obtained by applying the proposed inference method.
- \(B(x)\) is the neighborhood of input \(x\).
- \(I(\phi)\) is an indicator function that returns 1 when the condition \(\phi\) is true and 0 otherwise.

Algorithm 2 of the paper gives a step-by-step implementation of this inference method. It draws multiple samples within the neighborhood of input \(x\) and makes the final prediction by majority vote over the predictions for those samples. In this way, the model errs only when more than half of the drawn samples are mispredicted.
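
The two parts described above can be illustrated with a minimal sketch. It is not the paper's Algorithm 2 or its training code: it assumes \(B(x)\) is an L-infinity ball of radius `eps` (the summary leaves \(B(x)\) general), estimates the mean-plus-variance term for a single neighborhood from a list of per-sample losses, and predicts by majority vote over sampled neighbors. All names here (`sample_neighborhood`, `variance_regularized_loss`, `majority_vote_predict`, `toy_model`) are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter
from statistics import fmean, pvariance


def sample_neighborhood(x, eps, n, rng):
    """Draw n points uniformly from the L-infinity ball of radius eps around x.

    Assumption: B(x) is an L-infinity ball; the source does not fix its shape.
    """
    return [[xi + rng.uniform(-eps, eps) for xi in x] for _ in range(n)]


def variance_regularized_loss(losses, lam):
    """Monte Carlo estimate of E[loss] + lam * Var[loss] for one neighborhood.

    `losses` holds the loss of the model on each sampled neighbor t of x.
    """
    return fmean(losses) + lam * pvariance(losses)


def majority_vote_predict(model, x, eps, n, rng):
    """Return the most common label among n sampled neighbors of x,
    together with the fraction of samples that voted for it."""
    votes = Counter(model(t) for t in sample_neighborhood(x, eps, n, rng))
    label, count = votes.most_common(1)[0]
    return label, count / n


def toy_model(t):
    """Hypothetical 1-D threshold classifier used only for this demo."""
    return int(t[0] > 0.0)


rng = random.Random(0)
label, agreement = majority_vote_predict(toy_model, [0.5], eps=0.1, n=200, rng=rng)
print(label, agreement)  # prints "1 1.0": every neighbor of 0.5 within 0.1 is positive

# Inner term of the objective for one input, with lambda = 1.0 (illustrative value).
print(variance_regularized_loss([0.2, 0.4, 0.6], lam=1.0))
```

In this sketch the agreement fraction is only an empirical vote share; turning it into a certificate with statistical guarantees would need a confidence bound on top of it, which is not shown here.
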
### Experimental results

The paper demonstrates the effectiveness of this method by training models on multiple standard benchmark datasets and comparing them with existing state-of-the-art methods. The experimental results show that this method significantly improves adversarial accuracy and the certified robustness rate while maintaining relatively high standard accuracy.

### Summary

The paper proposes a new...

Towards Certified Probabilistic Robustness with High Accuracy

Certified Robust Accuracy of Neural Networks Are Bounded due to Bayes Errors

Adaptive Certified Training: Towards Better Accuracy-Robustness Tradeoffs

How Does Bayes Error Limit Probabilistic Robust Accuracy

CC-CERT: A Probabilistic Approach to Certify General Robustness of Neural Networks

Toward Intrinsic Adversarial Robustness Through Probabilistic Training.

Towards Certifying the Asymmetric Robustness for Neural Networks: Quantification and Applications

Robustra: Training Provable Robust Neural Networks over Reference Adversarial Space.

Robustness of Neural Networks: A Probabilistic and Practical Approach

Adversarial Robustness Certification for Bayesian Neural Networks

A Recipe for Improved Certifiable Robustness

RAB: Provable Robustness Against Backdoor Attacks

SoK: Certified Robustness for Deep Neural Networks

Adaptive Retraining for Neural Network Robustness in Classification

Towards Certifying L∞ Robustness Using Neural Networks with L∞-Dist Neurons

Towards Bridging the gap between Empirical and Certified Robustness against Adversarial Examples

On the Robustness of Adversarial Training Against Uncertainty Attacks

On the Certified Robustness for Ensemble Models and Beyond

Certifying Global Robustness for Deep Neural Networks

Certified Adversarial Robustness Within Multiple Perturbation Bounds

Regularized Training and Tight Certification for Randomized Smoothed Classifier with Provable Robustness