Abstract: We use distributionally-robust optimization for machine learning to mitigate the effect of data poisoning attacks. We provide performance guarantees for the trained model on the original data (not including the poison records) by training the model for the worst-case distribution in a neighbourhood, defined using the Wasserstein distance, around the empirical distribution extracted from the training dataset corrupted by a poisoning attack. We relax the distributionally-robust machine learning problem by finding an upper bound for the worst-case fitness based on the empirical sample-averaged fitness and, as a regularizer, the Lipschitz constant of the fitness function (with respect to the data, for given model parameters). For regression models, we prove that this regularizer is equal to the dual norm of the model parameters. We use the Wine Quality dataset, the Boston Housing Market dataset, and the Adult dataset to demonstrate the results of this paper.
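
As a rough illustration of the relaxation the abstract describes, the bound can be written as follows. The notation is an assumption introduced here, not taken from the paper: $\hat{P}_n$ is the empirical distribution of the $n$ (possibly poisoned) records $\xi_i$, $W$ is the Wasserstein distance (taken to be type-1 for this sketch), $\rho$ is the neighbourhood radius, $\ell(\theta;\xi)$ is the fitness of parameters $\theta$ on record $\xi$, and $\mathrm{Lip}(\cdot)$ is the Lipschitz constant with respect to $\xi$.

```latex
\sup_{Q \,:\, W(Q,\hat{P}_n) \le \rho} \mathbb{E}_{\xi \sim Q}\big[\ell(\theta;\xi)\big]
\;\le\;
\frac{1}{n}\sum_{i=1}^{n} \ell(\theta;\xi_i)
\;+\;
\rho \,\mathrm{Lip}\big(\ell(\theta;\cdot)\big)
```

Minimizing the right-hand side over $\theta$ is an ordinary regularized learning problem, with the Lipschitz constant playing the role of the regularizer.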

What problem does this paper attempt to address?

This paper addresses how to mitigate the impact of data poisoning attacks in machine learning. Specifically, it proposes distributionally-robust optimization (DRO) as a defence against these attacks. Training the model for the worst-case distribution guarantees its performance on the original data (excluding the poisoned records).

### Main Contributions

1. **Distributionally-Robust Optimization**: The paper defines a neighborhood, using the Wasserstein distance, around the empirical distribution extracted from the poisoned training dataset, and trains the model for the worst-case distribution in that neighborhood.
2. **Performance Guarantee**: By optimizing within the neighborhood defined by the Wasserstein distance, the paper provides a performance guarantee for the trained model on the original data.
3. **Regularization Method**: The paper shows that the distributionally-robust optimization problem can be relaxed into a standard regularized machine-learning problem by introducing a regularization term based on the Lipschitz constant of the loss function.
4. **Regularization of Regression Models**: For linear and logistic regression models, the paper further simplifies the regularization term, showing that it reduces to the dual norm of the model parameters (multiplied by a data-related constant in the logistic case).
5. **Experimental Verification**: The paper uses three datasets, namely Wine Quality, Boston Housing Market, and Adult, to verify the effectiveness of the proposed method.

### Specific Methods

- **Wasserstein Distance**: The Wasserstein distance measures the distance between two probability distributions and can be interpreted as the cost of an optimal mass-transfer plan between them.
- **Distributionally-Robust Optimization Problem**: By finding an upper bound on the worst-case expected loss within the neighborhood defined by the Wasserstein distance, the distributionally-robust optimization problem is transformed into a standard optimization problem with a regularization term.
- **Regularization Term**: For the linear regression model, the regularization term can be expressed as the dual norm of the model parameters; for the logistic regression model, it can be expressed as the dual norm of the model parameters multiplied by a data-related constant.

### Experimental Results

- **Wine Quality Dataset**: Under data-modification and label-flipping attacks, the test performance of the regularized model is significantly better than that of the non-regularized model.
- **Boston Housing Market Dataset**: Under data-modification attacks, the test performance of the regularized model is also better than that of the non-regularized model.
- **Adult Dataset**: Under label-flipping attacks, the regularized model is likewise more robust in test performance.

### Conclusion

Distributionally-robust optimization can mitigate the impact of data poisoning attacks during training and improve the performance of the model on the original data. Future work can explore applying this method to more complex machine-learning models, such as neural networks.
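
The ideas summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy data, the absolute-error loss, and the choice of an l_p norm on the data are all hypothetical. It shows (a) the type-1 Wasserstein distance between two equal-size one-dimensional samples, and (b) a regularized objective consisting of the empirical loss plus a radius `rho` times the dual norm of the model parameters.

```python
import math


def wasserstein1_1d(a, b):
    """Type-1 Wasserstein distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the real line, the optimal
    transport plan matches sorted values, so the distance is the mean gap.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(u - v) for u, v in zip(sorted(a), sorted(b))) / len(a)


def dual_norm(theta, p):
    """Norm dual to the l_p norm, i.e. the l_q norm with 1/p + 1/q = 1."""
    if p == 1:
        return max(abs(t) for t in theta)  # dual of l_1 is l_infinity
    if math.isinf(p):
        return sum(abs(t) for t in theta)  # dual of l_infinity is l_1
    q = p / (p - 1.0)
    return sum(abs(t) ** q for t in theta) ** (1.0 / q)


def regularized_objective(theta, xs, ys, rho, p=2):
    """Mean absolute error (an assumed loss) plus rho times the dual norm."""
    residuals = [
        abs(sum(t * xi for t, xi in zip(theta, x)) - y) for x, y in zip(xs, ys)
    ]
    return sum(residuals) / len(residuals) + rho * dual_norm(theta, p)


# Hypothetical toy data: one of four records is modified by an attacker.
clean = [1.0, 2.0, 3.0, 4.0]
poisoned = [1.0, 2.0, 3.0, 10.0]
shift = wasserstein1_1d(clean, poisoned)  # (10 - 4) / 4 = 1.5
```

In this toy setting, a radius `rho` of at least `shift` would make the neighborhood around the poisoned sample large enough to contain the clean one, which is the intuition behind training for the worst case in that neighborhood.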

Regularization Helps with Mitigating Poisoning Attacks: Distributionally-Robust Machine Learning Using the Wasserstein Distance

Oblivion: Poisoning Federated Learning by Inducing Catastrophic Forgetting

A Robust Learning Algorithm for Regression Models Using Distributionally Robust Optimization under the Wasserstein Metric

What Distributions are Robust to Indiscriminate Poisoning Attacks for Linear Learners?

With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models

Distributionally-Robust Machine Learning Using Locally Differentially-Private Data

Lethal Dose Conjecture on Data Poisoning

Robust Distribution Learning with Local and Global Adversarial Corruptions

Hyperparameter Learning under Data Poisoning: Analysis of the Influence of Regularization via Multiobjective Bilevel Optimization

Regularization for Wasserstein distributionally robust optimization

Robustified Multivariate Regression and Classification Using Distributionally Robust Optimization under the Wasserstein Metric

Stronger Data Poisoning Attacks Break Data Sanitization Defenses

On Generalization and Regularization Via Wasserstein Distributionally Robust Optimization

Robust Linear Regression Against Training Data Poisoning

A Robust Learning Approach for Regression Models Based on Distributionally Robust Optimization

On the Relevance of Byzantine Robust Optimization Against Data Poisoning

Certified Robustness to Data Poisoning in Gradient-Based Training

Universal generalization guarantees for Wasserstein distributionally robust models

A Distributionally Robust Optimization Approach for Multivariate Linear Regression under the Wasserstein Metric

Data Poisoning Attacks on Regression Learning and Corresponding Defenses

Regularization for Adversarial Robust Learning