Detecting Adversarial Attacks in Semantic Segmentation via Uncertainty Estimation: A Deep Analysis

Kira Maag, Roman Resner, Asja Fischer
2024-08-19
Abstract: Deep neural networks have demonstrated remarkable effectiveness across a wide range of tasks such as semantic segmentation. Nevertheless, these networks are vulnerable to adversarial attacks that add imperceptible perturbations to the input image, leading to false predictions. This vulnerability is particularly dangerous in safety-critical applications like automated driving. While adversarial examples and defense strategies are well-researched in the context of image classification, there is comparatively less research focused on semantic segmentation. Recently, we have proposed an uncertainty-based method for detecting adversarial attacks on neural networks for semantic segmentation. We observed that uncertainty, as measured by the entropy of the output distribution, behaves differently on clean versus adversarially perturbed images, and we utilize this property to differentiate between the two. In this extended version of our work, we conduct a detailed analysis of uncertainty-based detection of adversarial attacks including a diverse set of adversarial attacks and various state-of-the-art neural networks. Our numerical experiments show the effectiveness of the proposed uncertainty-based detection method, which is lightweight and operates as a post-processing step, i.e., no model modifications or knowledge of the adversarial example generation process are required.
Computer Vision and Pattern Recognition, Cryptography and Security
What problem does this paper attempt to address?
This paper addresses the detection of adversarial attacks on semantic segmentation networks. Specifically, the authors study how to distinguish clean samples from adversarially perturbed ones by estimating uncertainty. The core issues of the paper are:

1. **The threat of adversarial attacks**: Deep neural networks (DNNs) perform well on a wide range of tasks but are vulnerable to adversarial attacks. Such attacks add small, imperceptible perturbations to the input image, causing the model to make wrong predictions. This vulnerability is especially dangerous in safety-critical applications such as automated driving.
2. **Limitations of existing research**: Adversarial attacks and defense strategies have been widely studied for image classification, but comparatively little work addresses semantic segmentation. Effective strategies for detecting or defending against adversarial attacks in semantic segmentation are therefore needed.
3. **Uncertainty-based detection method**: The authors propose an uncertainty-based detection method that uses uncertainty measures, such as the entropy of the output distribution, to distinguish clean samples from adversarial ones. The method requires neither modifications to the model nor knowledge of how the adversarial examples were generated. It relies only on the model output and runs as a post-processing step.

### Specific objectives of the paper

- **Research on the effectiveness of the detection method**: Evaluate the proposed uncertainty-based detection method on different types of adversarial attacks and on state-of-the-art network architectures, including convolutional neural networks and transformers.
- **Explore the application of uncertainty features**: In addition to aggregated uncertainty measures, study classifiers trained on the complete uncertainty heat map to check whether pixel-level uncertainty information improves classification performance.
- **Provide extensive experimental analysis**: Through experiments covering multiple adversarial attack methods and different network architectures, assess the robustness and effectiveness of the method.

### Formula representation

The formulas involved in the paper include:

- **Entropy**: \[ E(x)_z = -\frac{1}{\log(|C|)} \sum_{y \in C} p(y|x)_z\cdot\log p(y|x)_z \] where \(E(x)_z\) is the entropy at the \(z\)-th pixel, \(p(y|x)_z\) is the probability the model assigns to class \(y\) at pixel \(z\), and \(C\) is the set of all classes. The factor \(1/\log(|C|)\) normalizes the entropy to the range \([0, 1]\).
- **Variation Ratio**: \[ V(x)_z = 1 - p(\hat{y}_x^z|x)_z \] where \(\hat{y}_x^z\) is the predicted class of the \(z\)-th pixel.
- **Probability Margin**: \[ M(x)_z = p(\hat{y}_x^z|x)_z-\max_{y\in C\setminus\{\hat{y}_x^z\}} p(y|x)_z \]

These formulas quantify the uncertainty of the model's prediction at each pixel and provide the features used to distinguish clean samples from adversarial ones.

### Summary

This paper aims to detect adversarial attacks on semantic segmentation networks with an uncertainty-based method. The authors evaluate the method's robustness and effectiveness across different attack types and network architectures in extensive experiments and show its potential for practical applications.
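
The three uncertainty measures from the formula section can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `uncertainty_maps` and `aggregate`, the `(H, W, C)` array layout, the small `eps` used to avoid `log(0)`, and the choice of the mean as the aggregation are all assumptions made here. The probability margin follows the formula as written in this summary, so a larger margin corresponds to a more confident prediction.

```python
import numpy as np


def uncertainty_maps(probs, eps=1e-12):
    """Per-pixel uncertainty heat maps from softmax output.

    probs: array of shape (H, W, C) with class probabilities per pixel
           (layout is an assumption of this sketch).
    Returns a dict of three (H, W) arrays.
    """
    num_classes = probs.shape[-1]

    # Entropy, normalized by log|C| so that values lie in [0, 1].
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1) / np.log(num_classes)

    # Largest and second-largest class probability at each pixel.
    sorted_probs = np.sort(probs, axis=-1)
    top1 = sorted_probs[..., -1]
    top2 = sorted_probs[..., -2]

    return {
        "entropy": entropy,
        "variation_ratio": 1.0 - top1,
        "probability_margin": top1 - top2,
    }


def aggregate(maps):
    """Reduce each heat map to one scalar per image (mean is an assumption)."""
    return {name: float(heat_map.mean()) for name, heat_map in maps.items()}


if __name__ == "__main__":
    # Hypothetical input: random logits for a 4x5 image with 3 classes.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 5, 3))
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = shifted / shifted.sum(axis=-1, keepdims=True)
    print(aggregate(uncertainty_maps(probs)))
```

The aggregated scalars could serve as input features for a simple classifier that separates clean from perturbed images, while the full heat maps correspond to the pixel-level variant mentioned in the objectives. Which classifier to use is left open here.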