Learning transferable targeted universal adversarial perturbations by sequential meta-learning

Juanjuan Weng, Zhiming Luo, Dazhen Lin, Shaozi Li
DOI: https://doi.org/10.1016/j.cose.2023.103584
IF: 5.105
2024-02-01
Computers & Security
Abstract: Recently, the transferability of adversarial perturbations in non-targeted scenarios has been extensively studied. However, changing the predictions of an unknown model to a pre-defined ‘targeted’ class remains challenging. In this study, we aim to learn targeted universal adversarial perturbations (UAPs) with higher transferability by using an ensemble of multiple models. First, we observe that in existing ensemble-based attacks the logit of the target class is biased toward a specific white-box model. To address this issue, we propose a normalized logit loss that narrows the margin between the target class's logits across different models. In addition, we introduce a novel sequential meta-learning optimization strategy, consisting of an inner loop and an outer loop, to further increase transferability. In the inner loop, we sequentially learn task-specific targeted UAPs for each source model by jointly considering the perturbation from the previous model. In the outer loop, we optimize the task-agnostic targeted UAP by combining the targeted UAPs from the inner loop. Experimental results demonstrate the mutual benefits of the normalized logit loss and the sequential meta-learning optimization strategy for learning targeted adversarial perturbations, outperforming existing ensemble attacks in both white-box and black-box settings. The source code of this study is available at: Link.
computer science, information systems
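The inner-loop and outer-loop structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract does not give the formulas, so everything concrete here is an assumption: the "models" are random linear classifiers standing in for deep networks, the normalized logit loss is assumed to standardize each model's logit vector before reading off the target logit, gradients are taken by finite differences instead of backpropagation, and the outer update is assumed to interpolate toward the mean of the inner-loop perturbations. All function names are hypothetical.

```python
import numpy as np

def normalized_logit_loss(logits, target):
    # Assumed form: standardize each sample's logits, then maximize the target logit.
    mean = logits.mean(axis=1, keepdims=True)
    std = logits.std(axis=1, keepdims=True) + 1e-8
    return -((logits - mean) / std)[:, target].mean()

def numeric_grad(f, v, h=1e-5):
    # Central finite differences; a stand-in for backpropagation in this toy setting.
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

def ensemble_loss(models, x, delta, target):
    # Average normalized logit loss over all source models for one shared perturbation.
    return float(np.mean([normalized_logit_loss((x + delta) @ W, target) for W in models]))

def sequential_meta_uap(models, x, target, eps=0.5, inner_lr=0.05, outer_lr=0.5, steps=30):
    delta = np.zeros(x.shape[1])  # task-agnostic universal perturbation
    for _ in range(steps):
        inner = delta.copy()
        task_deltas = []
        # Inner loop: each model continues from the previous model's perturbation.
        for W in models:
            grad = numeric_grad(lambda d: normalized_logit_loss((x + d) @ W, target), inner)
            inner = np.clip(inner - inner_lr * np.sign(grad), -eps, eps)
            task_deltas.append(inner.copy())
        # Outer loop (assumed rule): move toward the mean of the task-specific perturbations.
        delta = np.clip(delta + outer_lr * (np.mean(task_deltas, axis=0) - delta), -eps, eps)
    return delta

# Toy data: 16 inputs with 6 features, 3 correlated linear models with 4 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 6))
base = rng.normal(size=(6, 4))
models = [base + 0.3 * rng.normal(size=(6, 4)) for _ in range(3)]
target = 2

delta = sequential_meta_uap(models, x, target)
before = ensemble_loss(models, x, np.zeros(6), target)
after = ensemble_loss(models, x, delta, target)
print(f"ensemble loss before: {before:.3f}, after: {after:.3f}")
```

The perturbation is kept inside an L-infinity ball of radius `eps` by clipping after every update. Standardizing the logits puts every model's target logit on a comparable scale, which is one way to keep a single model from dominating the ensemble objective.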