Abstract: Adversarial transferability is an intriguing property of adversarial examples: a perturbation crafted against one model is also effective against another model, which may come from a different model family or training process. To better protect ML systems against such attacks, several questions arise: What are the sufficient conditions for adversarial transferability? Is it possible to bound such transferability? Is there a way to reduce transferability in order to improve the robustness of an ensemble ML model? To answer these questions, we first theoretically analyze and outline checkable sufficient conditions for transferability between models, and then propose a practical algorithm that reduces transferability between the base models of an ensemble to improve its robustness. Our theoretical analysis, the first of its kind, shows that orthogonality between the gradients of different models is not, on its own, enough to ensure low adversarial transferability; model smoothness is also an important factor that affects transferability together with gradient orthogonality. In particular, we provide a lower bound on adversarial transferability based on model gradient similarity, as well as an upper bound for low-risk classifiers based on gradient orthogonality and model smoothness. We show that, under gradient orthogonality, smoother classifiers guarantee lower adversarial transferability. Finally, inspired by our theoretical analysis, we propose an effective Transferability Reduced Smooth-ensemble (TRS) training strategy that trains a robust ensemble with low transferability by enforcing model smoothness and gradient orthogonality between base models. We conduct extensive experiments on TRS and compare it with 6 state-of-the-art ensemble baselines against 8 whitebox attacks on different datasets, showing that the proposed TRS significantly outperforms all baselines.
We believe our analysis of adversarial transferability will not only provide further understanding of the predictions of ML models, but also inspire future research on developing robust ML models that take these transferability properties into account.
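
To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a TRS-style penalty for a pair of base models. It is not the paper's training objective: the logistic-regression base models, the use of the input-gradient norm as a crude stand-in for model smoothness, and the names `trs_style_penalty`, `lam_sim`, and `lam_smooth` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss w.r.t. the input x
    for a logistic-regression model with weights w and bias b."""
    return (sigmoid(w @ x + b) - y) * w

def cosine_similarity(g1, g2, eps=1e-12):
    """Cosine of the angle between two gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + eps))

def trs_style_penalty(w1, b1, w2, b2, x, y, lam_sim=1.0, lam_smooth=0.1):
    """Hypothetical regularizer: penalize aligned input gradients
    (low orthogonality) and large gradient norms (assumed smoothness proxy)."""
    g1 = input_gradient(w1, b1, x, y)
    g2 = input_gradient(w2, b2, x, y)
    similarity = abs(cosine_similarity(g1, g2))
    smoothness = float(np.linalg.norm(g1) + np.linalg.norm(g2))
    return lam_sim * similarity + lam_smooth * smoothness

# Toy input and three toy base models (all values are illustrative).
x, y = np.array([1.0, 2.0]), 1.0
w_a, w_b, w_c = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 0.0])

orthogonal_pair = trs_style_penalty(w_a, 0.0, w_b, 0.0, x, y)  # orthogonal gradients
aligned_pair = trs_style_penalty(w_a, 0.0, w_c, 0.0, x, y)     # parallel gradients
print(orthogonal_pair < aligned_pair)
```

In this toy setting the pair with orthogonal input gradients receives the smaller penalty, which is the behavior a regularizer of this kind is meant to reward. In an actual ensemble the penalty would be added to the usual training loss and averaged over pairs of base models and over training inputs.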

An Adaptive Model Ensemble Adversarial Attack for Boosting Adversarial Transferability

Boosting the Transferability of Ensemble Adversarial Attack via Stochastic Average Variance Descent

Rethinking Model Ensemble in Transfer-based Adversarial Attacks

Understanding Model Ensemble in Transferable Adversarial Attack

Stochastic Variance Reduced Ensemble Adversarial Attack for Boosting the Adversarial Transferability

An Approach to Improve Transferability of Adversarial Examples

EnsembleFool: A Method to Generate Adversarial Examples Based on Model Fusion Strategy

Ensemble Diversity Facilitates Adversarial Transferability

Enhance Stealthiness and Transferability of Adversarial Attacks with Class Activation Mapping Ensemble Attack

Enhancing Adversarial Examples Transferability via Ensemble Feature Manifolds

Improving the Adversarial Transferability with Relational Graphs Ensemble Adversarial Attack

Boosting Adversarial Transferability with Spatial Adversarial Alignment

Enhancing Adversarial Transferability with Adversarial Weight Tuning

Improving the Transferability of Adversarial Examples with Diverse Gradients

Generating Transferable Adversarial Examples from the Perspective of Ensemble and Distribution

Adaptive Multi-scale Degradation-Based Attack for Boosting the Adversarial Transferability

Boosting Adversarial Attack Transferability Via Random Block Shuffle

Boosting the Transferability of Video Adversarial Examples Via Temporal Translation

Enhancing Adversarial Attacks: The Similar Target Method

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

Model scheduling and sample selection for ensemble adversarial example attacks