Learning perturbation sets for robust machine learning

Eric Wong, J. Zico Kolter

DOI: https://doi.org/10.48550/arXiv.2007.08450

2020-10-08

Abstract: Although much progress has been made towards robust deep learning, a significant gap in robustness remains between real-world perturbations and more narrowly defined sets typically studied in adversarial defenses. In this paper, we aim to bridge this gap by learning perturbation sets from data, in order to characterize real-world effects for robust training and evaluation. Specifically, we use a conditional generator that defines the perturbation set over a constrained region of the latent space. We formulate desirable properties that measure the quality of a learned perturbation set, and theoretically prove that a conditional variational autoencoder naturally satisfies these criteria. Using this framework, our approach can generate a variety of perturbations at different complexities and scales, ranging from baseline spatial transformations, through common image corruptions, to lighting variations. We measure the quality of our learned perturbation sets both quantitatively and qualitatively, finding that our models are capable of producing a diverse set of meaningful perturbations beyond the limited data seen during training. Finally, we leverage our learned perturbation sets to train models which are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations, while improving generalization on non-adversarial data. All code and configuration files for reproducing the experiments as well as pretrained model weights can be found at https://github.com/locuslab/perturbation_learning.

Machine Learning

What problem does this paper attempt to address?

This paper addresses the significant robustness gap between real-world perturbations and the narrowly defined perturbation sets typically studied in adversarial defense research. The authors aim to bridge this gap by learning perturbation sets from data, so that real-world effects can be characterized for robust training and evaluation. They use a conditional generator that defines the perturbation set over a constrained region of the latent space, formulate desirable properties for measuring the quality of a learned perturbation set, and prove theoretically that a conditional variational autoencoder (CVAE) naturally satisfies these criteria. The approach can generate perturbations of varying complexity and scale, ranging from baseline spatial transformations, through common image corruptions, to lighting variations. The learned perturbation sets are then used to train models that are empirically and certifiably robust to adversarial image corruptions and adversarial lighting variations, while also improving generalization on non-adversarial data. The key contribution is a framework for learning robust models without relying on a predefined perturbation set, which extends adversarial defense methods to a wider range of real-world perturbations. This targets a limitation of existing robust learning methods: they struggle with invariances that humans take for granted but that fall outside mathematically defined perturbation sets, since real-world attacks and general notions of robustness often cannot be expressed by formal equations.
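The idea of a perturbation set defined over a constrained latent region can be illustrated with a small sketch. It assumes the set takes the form S(x) = {g(z, x) : ||z||_2 <= eps} for a conditional generator g, and it searches that set by projected gradient ascent in latent space. Everything concrete here is hypothetical and chosen only to keep the example runnable: `toy_generator` is a fixed linear map standing in for a trained CVAE decoder, the classifier is linear with a logistic loss, and the names `latent_pgd`, `project_l2`, `eps`, and `step` are illustrative rather than taken from the authors' code.

```python
import numpy as np


def project_l2(z, eps):
    """Project z onto the L2 ball of radius eps centered at the origin."""
    norm = np.linalg.norm(z)
    return z if norm <= eps else z * (eps / norm)


def toy_generator(z, x, W):
    """Hypothetical stand-in for a learned conditional generator g(z, x)."""
    return x + W @ z


def logistic_loss(x, y, w_clf):
    """Logistic loss of a linear classifier on input x with label y in {-1, +1}."""
    return np.log1p(np.exp(-y * (w_clf @ x)))


def latent_pgd(x, y, W, w_clf, eps=1.0, step=0.25, iters=20):
    """Approximately maximize the loss over {g(z, x) : ||z||_2 <= eps}."""
    z = np.zeros(W.shape[1])
    for _ in range(iters):
        margin = y * (w_clf @ toy_generator(z, x, W))
        # Gradient of log(1 + exp(-margin)) with respect to z.
        grad = -(1.0 / (1.0 + np.exp(margin))) * y * (W.T @ w_clf)
        gnorm = np.linalg.norm(grad)
        if gnorm == 0.0:
            break
        # Normalized ascent step, then projection back onto the latent ball.
        z = project_l2(z + step * grad / gnorm, eps)
    return z


# Toy data: an 8-dimensional input and a 2-dimensional latent space.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W = rng.normal(size=(8, 2))
w_clf = rng.normal(size=8)
y = 1.0
eps = 1.0

z_adv = latent_pgd(x, y, W, w_clf, eps=eps)
clean_loss = logistic_loss(x, y, w_clf)
adv_loss = logistic_loss(toy_generator(z_adv, x, W), y, w_clf)
```

In robust training, an inner search of this kind would replace the usual norm-ball attack on the input: the model is updated on g(z_adv, x) instead of x. With a real CVAE decoder the gradient would come from automatic differentiation rather than the closed form used for this linear toy.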
