Fabio De Gaspari, Dorjan Hitaj, Luigi V. Mancini
Abstract: The unprecedented availability of training data fueled the rapid development of powerful neural networks in recent years. However, the need for such large amounts of data leads to potential threats such as poisoning attacks: adversarial manipulations of the training data aimed at compromising the learned model to achieve a given adversarial goal.
### What problem does this paper attempt to solve?
This paper addresses the threat that **data poisoning attacks** pose to neural network training, in particular **clean-label poisoning attacks**. Specifically, it focuses on detecting and filtering out poisoned data points in the **transfer learning** setting.
#### Background and problem description
As machine-learning models demand ever larger amounts of training data, data poisoning attacks have become a serious security threat. Such attacks inject maliciously modified data points into the training set so that the trained model exhibits abnormal behavior at inference time. **Clean-label poisoning attacks** are particularly hard to detect because they leave the labels of the poisoned samples unchanged and do not significantly affect the model's overall performance.
#### Main contributions of the paper
1. **A new defense method**: The paper proposes a defense based on feature-map analysis that uses the Batch Normalization (BN) layers to construct a feature vector representation of each data point. This representation effectively separates clean samples from poisoned samples.
2. **Validation of the feature vector**: Experiments show that the feature vector can distinguish effective poisoned samples from failed ones (i.e., poisoned samples that did not succeed in disrupting the model).
3. **Extensive experimental evaluation**: The paper evaluates the proposed defense across multiple poison-generation algorithms, different datasets, and poison budgets of different sizes. The results show that the method outperforms existing state-of-the-art defenses in several respects, including test accuracy and defense success rate.
#### Formula summary
- **Feature vector calculation**:
\[
C_y=\{(\mu_i(X_y), \sigma_i(X_y)^2)\mid\forall i < l\}\quad\forall y\in Y
\]
where \(X_y\) is the set of all data points with label \(y\), \(i\) indexes the BN layers, and \(\mu_i(X_y)\) and \(\sigma_i(X_y)^2\) are the mean and variance of the feature maps at the \(i\)-th BN layer, computed over \(X_y\).
- **Distance measurement**:
\[
d(x_j, C_y)=\sum_{i = 0}^{l}\gamma_i\bigl(\beta\cdot\text{sim}(\mu_i(x_j), \mu_i(X_y))+(1-\beta)\cdot\text{sim}(\sigma_i(x_j)^2, \sigma_i(X_y)^2)\bigr)
\]
where \(x_j\) is the data point being evaluated, \(\gamma_i\) is the weight of the \(i\)-th BN layer, \(\beta\) weights the mean term against the variance term, and \(\text{sim}(A, B)\) is a similarity measurement function (e.g., cosine distance).
- **Cosine distance**:
\[
\text{sim}(A, B)=1-\frac{A\cdot B}{\|A\|\|B\|}
\]
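
The three formulas above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-layer statistics are per-channel means and variances of the feature maps entering each BN layer, taken over the batch and spatial axes. The function names (`cosine_distance`, `class_centroid`, `sample_features`, `distance`) and the default `beta=0.5` are hypothetical, and the sum simply runs over however many layers are supplied.

```python
import numpy as np


def cosine_distance(a, b, eps=1e-12):
    """sim(A, B) = 1 - (A . B) / (||A|| ||B||)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - float(np.dot(a, b) / max(denom, eps))


def class_centroid(class_feature_maps):
    """Build C_y for one label y.

    class_feature_maps: one array per BN layer, each shaped
    (n_samples, channels, height, width), holding the feature maps of
    every sample in X_y (assumed layout).
    Returns a list of (mean, variance) pairs, one per layer.
    """
    return [(x.mean(axis=(0, 2, 3)), x.var(axis=(0, 2, 3)))
            for x in class_feature_maps]


def sample_features(sample_feature_maps):
    """Per-layer (mean, variance) for a single data point x_j.

    sample_feature_maps: one array per BN layer, each shaped
    (channels, height, width).
    """
    return [(x.mean(axis=(1, 2)), x.var(axis=(1, 2)))
            for x in sample_feature_maps]


def distance(sample, centroid, gammas, beta=0.5):
    """d(x_j, C_y): weighted sum of per-layer mean and variance distances."""
    total = 0.0
    for (mu_x, var_x), (mu_c, var_c), gamma in zip(sample, centroid, gammas):
        total += gamma * (beta * cosine_distance(mu_x, mu_c)
                          + (1.0 - beta) * cosine_distance(var_x, var_c))
    return total
```

A sample whose distance to the centroid of its own label is unusually large would then be a candidate for filtering; the threshold and the choice of \(\gamma_i\) and \(\beta\) are not specified in this summary.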
In conclusion, the paper addresses the shortcomings of existing defense mechanisms against complex poisoning attacks by introducing a new defense based on feature-map analysis, with a particular focus on the transfer-learning setting.