Classifying Perturbation Types for Robustness Against Multiple Adversarial Perturbations

Pratyush Maini, Xinyun Chen, Bo Li, D. Song
2020-01-01
Abstract: Despite recent advances in defenses against adversarial attacks, deep neural networks typically remain vulnerable to adversaries outside the perturbation type they are trained to be robust against. Recent work has aimed to improve the robustness of a single model against the union of multiple perturbation types, e.g., ℓ1, ℓ2, and ℓ∞. However, the accuracy of such models against each individual perturbation type still falls short of that of models trained specifically for that single perturbation type. To close this gap, we propose Classify Then Predict (CTP), a two-stage pipeline that improves robustness against the union of multiple perturbation types. Instead of training a single label predictor for all perturbation types, CTP first classifies the perturbation type of the input and then uses a label predictor specifically trained against that adversary to produce the final prediction. We first provide a theoretical analysis showing that adversarial examples with different perturbation types constitute different distributions, which makes it possible to distinguish them. Further, we show that at test time the adversary faces a natural trade-off between fooling the attack classifier and fooling the robust label predictor, and as a result cannot mount strong attacks against the pipeline. On MNIST, our approach achieves a 10% improvement in overall adversarial accuracy against the union of ℓ1, ℓ2, and ℓ∞ perturbation balls.
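The two-stage routing idea can be illustrated with a minimal Python sketch. It assumes an input is a flat vector of floats, that the attack classifier returns a string key naming the perturbation type, and that each key maps to a specialist label predictor. The names `classify_then_predict`, `in_union_of_balls`, `toy_attack_classifier`, the string keys, and the ε radii are all hypothetical and chosen only for illustration; they are not the authors' code, models, or MNIST budgets. In the actual pipeline both stages are trained networks, whereas here they are trivial stand-ins that only demonstrate the control flow and what "the union of ℓp balls" means.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: an input is a flat float vector, labels are ints.
Vector = Sequence[float]
AttackClassifier = Callable[[Vector], str]  # returns a perturbation-type key
LabelPredictor = Callable[[Vector], int]    # returns a class label


def lp_norm(v: Vector, p: float) -> float:
    """Return the l_p norm of v; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def in_union_of_balls(x: Vector, x_adv: Vector, eps: Dict[float, float]) -> bool:
    """True if x_adv - x lies inside at least one l_p ball of radius eps[p]."""
    delta = [a - b for a, b in zip(x_adv, x)]
    return any(lp_norm(delta, p) <= r for p, r in eps.items())


def classify_then_predict(
    x: Vector,
    attack_classifier: AttackClassifier,
    label_predictors: Dict[str, LabelPredictor],
) -> int:
    """Stage 1: guess the perturbation type. Stage 2: route to its specialist."""
    perturbation_type = attack_classifier(x)
    return label_predictors[perturbation_type](x)


# Hypothetical stand-ins, NOT trained models: each specialist returns a fixed label.
predictors: Dict[str, LabelPredictor] = {
    "l1": lambda x: 0,
    "l2": lambda x: 1,
    "linf": lambda x: 2,
}


def toy_attack_classifier(x: Vector) -> str:
    """Toy heuristic: treat very sparse inputs as l1-type, everything else as linf-type."""
    nonzero = sum(1 for v in x if abs(v) > 1e-9)
    return "l1" if nonzero <= 1 else "linf"


x_clean = [0.0, 0.0, 0.0, 0.0]
x_sparse = [0.9, 0.0, 0.0, 0.0]
x_dense = [0.1, -0.1, 0.1, -0.1]
radii = {1: 0.2, 2: 0.15, float("inf"): 0.1}  # illustrative radii only

print(classify_then_predict(x_sparse, toy_attack_classifier, predictors))  # 0
print(classify_then_predict(x_dense, toy_attack_classifier, predictors))   # 2
print(in_union_of_balls(x_clean, x_dense, radii))   # True (inside the l_inf ball)
print(in_union_of_balls(x_clean, x_sparse, radii))  # False (outside all three balls)
```

The design point the sketch mirrors is that the final label depends on which specialist the first stage selects. An adversary therefore has to either fool the attack classifier, sending the input to a specialist not trained against its perturbation type, or attack the correctly selected specialist directly. This is the trade-off the abstract refers to.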