SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy

Shihui Zhang, Zhiguo Cui, Feiyu Li, Xueqiang Han, Zhigang Huang
DOI: https://doi.org/10.1007/s00530-024-01520-8
IF: 3.9
2024-10-21
Multimedia Systems
Abstract: Deep neural networks are vulnerable to adversarial examples, which are generated by adding carefully crafted perturbations to benign examples. Some prior work explores the transferability of adversarial examples between hetero-modal models, from images to videos. However, these works add adversarial perturbations to every frame of the video without considering sparse attacks. To bridge this gap, we propose a label independent cross-modal transferable adversarial video attack with sparse strategy, called SS-CMT, which efficiently generates sparse adversarial video examples with low perturbations and high transferability. Specifically, we propose a sparse strategy to select sparse frames from benign video examples and a cross-modal transferable attack to add adversarial perturbations to those sparse frames. In addition, our method does not use the labels of video examples when generating sparse adversarial examples. We also explore attacking an ensemble of alternative models to boost the transferability of the generated adversarial examples; this variant is called ENS-SS-CMT. Extensive experiments on Kinetics-400 and UCF-101 demonstrate the excellent performance of the proposed methods SS-CMT and ENS-SS-CMT in terms of high transferability, high efficiency, and low perturbations. Of the two, ENS-SS-CMT outperforms the state-of-the-art method in overall performance.
computer science, information systems, theory & methods