Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance

Farhad Farokhi
DOI: https://doi.org/10.48550/arXiv.2001.10655
2020-01-29
Abstract: We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution on a neighbourhood around the empirical distribution (extracted from the training dataset corrupted by a poisoning attack) defined using the Wasserstein distance. We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sampled-averaged fitness and the Lipschitz-constant of the fitness function (on the data for given model parameters) as regularizer. For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset for demonstrating the results of this paper.
Machine Learning, Cryptography and Security, Signal Processing, Optimization and Control, Statistics Theory
What problem does this paper attempt to address?
The paper addresses how to mitigate the impact of data poisoning attacks on machine learning models. The author proposes distributionally-robust optimization (DRO): the model is trained against the worst-case distribution in a neighborhood of the empirical distribution, which yields a guarantee on its performance on the original data (excluding the poisoned records).

### Main Contributions

1. **Distributionally-Robust Optimization**: The author uses the Wasserstein distance to define a neighborhood around the empirical distribution extracted from the poisoned training dataset, and trains the model for the worst-case distribution in that neighborhood.
2. **Performance Guarantee**: By optimizing within the neighborhood defined by the Wasserstein distance, the author provides a performance guarantee for the trained model on the original data.
3. **Regularization Method**: The author shows that the distributionally-robust optimization problem can be relaxed into a standard regularized machine-learning problem, in which the regularization term is based on the Lipschitz constant of the loss function.
4. **Regularization of Regression Models**: For linear regression and logistic regression models, the author further simplifies the regularization term to an expression in the dual norm of the model parameters.
5. **Experimental Verification**: The author uses three datasets, namely Wine Quality, Boston Housing Market, and Adult, to verify the effectiveness of the proposed method.

### Specific Methods

- **Wasserstein Distance**: The Wasserstein distance measures the distance between two probability distributions and can be interpreted as the cost of an optimal mass-transport plan between them.
- **Distributionally-Robust Optimization Problem**: By finding an upper bound on the worst-case expected loss within the neighborhood defined by the Wasserstein distance, the distributionally-robust optimization problem is transformed into a standard optimization problem with a regularization term.
- **Regularization Term**: For the linear regression model, the regularization term can be expressed as the dual norm of the model parameters; for the logistic regression model, the regularization term can be expressed as the dual norm of the model parameters multiplied by a data-related constant.

### Experimental Results

- **Wine Quality Dataset**: Under data-modification and label-flipping attacks, the test performance of the regularized model is significantly better than that of the non-regularized model.
- **Boston Housing Market Dataset**: Under data-modification attacks, the test performance of the regularized model is also better than that of the non-regularized model.
- **Adult Dataset**: Under label-flipping attacks, the test performance of the regularized model also shows stronger robustness.

### Conclusion

By using the distributionally-robust optimization method, the impact of data poisoning attacks can be effectively mitigated during the training process, and the performance of the model on the original data can be improved. Future work can further explore the application of this method in more complex machine-learning models (such as neural networks).
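
### Illustrative Sketch

The regularized formulation summarized under Specific Methods (empirical loss plus a Wasserstein radius times a Lipschitz bound) can be sketched for linear regression as follows. This is a minimal illustration, not the paper's implementation. The absolute-error loss, the use of the vector `(theta, -1)` so that label perturbations are covered, the p-norm on the records, the subgradient solver, and all names (`dual_norm`, `robust_objective`, `fit_l2`, `rho`) are assumptions made here.

```python
import numpy as np


def dual_norm(v, p):
    """Return the q-norm of v, where 1/p + 1/q = 1 (the dual of the p-norm)."""
    v = np.asarray(v, dtype=float)
    if p == 1:
        return float(np.max(np.abs(v)))   # dual of l1 is l-infinity
    if np.isinf(p):
        return float(np.sum(np.abs(v)))   # dual of l-infinity is l1
    q = p / (p - 1.0)
    return float(np.sum(np.abs(v) ** q) ** (1.0 / q))


def robust_objective(theta, X, y, rho, p=2):
    """Empirical mean absolute error plus rho times a Lipschitz bound."""
    residuals = y - X @ theta
    empirical_loss = float(np.mean(np.abs(residuals)))
    # Assumption: a record (x, y) is measured in the p-norm, so the loss
    # |y - theta.x| is Lipschitz in the record with constant ||(theta, -1)||_q.
    lipschitz = dual_norm(np.append(theta, -1.0), p)
    return empirical_loss + rho * lipschitz


def fit_l2(X, y, rho, steps=500, lr=0.05):
    """Subgradient descent on robust_objective for p = 2 (hypothetical solver)."""
    theta = np.zeros(X.shape[1])
    best_theta = theta.copy()
    best_val = robust_objective(theta, X, y, rho, p=2)
    for t in range(1, steps + 1):
        residuals = y - X @ theta
        grad_loss = -X.T @ np.sign(residuals) / len(y)
        # The appended -1 keeps the norm at least 1, so the division is safe.
        grad_reg = theta / np.linalg.norm(np.append(theta, -1.0))
        theta = theta - (lr / np.sqrt(t)) * (grad_loss + rho * grad_reg)
        val = robust_objective(theta, X, y, rho, p=2)
        if val < best_val:   # keep the best iterate; subgradient steps may overshoot
            best_theta, best_val = theta.copy(), val
    return best_theta
```

Calling `fit_l2(X, y, rho)` with a feature matrix `X` and targets `y` returns the parameters with the lowest objective seen during the run. Setting `rho = 0` recovers unregularized least-absolute-deviation regression, while a larger `rho` corresponds to a wider Wasserstein neighborhood and therefore a stronger penalty on the parameter norm.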