Imperceptible Adversarial Attack with Multigranular Spatiotemporal Attention for Video Action Recognition

Guoming Wu, Yangfan Xu, Jun Li, Zhiping Shi, Xianglong Liu
DOI: https://doi.org/10.1109/jiot.2023.3280737
IF: 10.6
2023-01-01
IEEE Internet of Things Journal
Abstract: In recent years, the application of video Internet of Things (IoT) in various cities and public places has brought unprecedented opportunities to the security field and achieved great success. However, recent research shows that video recognition models are also vulnerable to adversarial examples. Adversarial examples based on physical attacks, in turn, are easily noticed by humans and are therefore unlikely to pass human review. To address this problem, in this article, we propose a novel multigranular spatiotemporal attention network (MSANet), which can attack video action recognition models imperceptibly. Specifically, to exploit video motion information more effectively and to reduce the detectability of attack perturbations, we design a multiplexed spatiotemporal attention module that selects and enhances spatial regions and temporal frames at coarse-grained and fine-grained levels, respectively, thus maintaining a certain degree of smoothness while reducing the perturbation size and avoiding overfitting of the attack. In addition, our proposed MSANet achieves imperceptible perturbations to video sequences through alternate iterative optimization combined with the PGD attack mechanism. Extensive experimental results on two different models (TDN and TSM) and two widely used data sets [HMDB-51 (Kuehne et al., 2011) and UCF-101 (Soomro et al., 2012)], compared to the state-of-the-art model, demonstrate the effectiveness of our devised video action recognition attack approach.
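The abstract combines a PGD attack with attention that restricts where a video is perturbed. The following is a minimal sketch of that general idea only, not the authors' MSANet: it assumes a toy linear softmax classifier in place of TDN or TSM, and a fixed hand-chosen mask in place of the learned multigranular attention. The names `masked_pgd` and `cross_entropy`, and the parameters `eps`, `alpha`, and `steps`, are hypothetical.

```python
import numpy as np


def cross_entropy(video, label, weights):
    """Cross-entropy loss of a toy linear softmax classifier (an assumption,
    standing in for a real action recognition model)."""
    logits = weights @ video.ravel()
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[label]


def masked_pgd(video, label, weights, mask, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted L-infinity PGD whose updates are scaled by a spatiotemporal mask.

    video   : (T, H, W) array with values in [0, 1]
    weights : (num_classes, T*H*W) array of the toy classifier
    mask    : (T, H, W) array in [0, 1]; 0 means "never perturb this pixel".
              Here it is given by hand; it is a stand-in for learned attention.
    """
    x0 = video.astype(np.float64)
    adv = x0.copy()
    onehot = np.zeros(weights.shape[0])
    onehot[label] = 1.0
    for _ in range(steps):
        logits = weights @ adv.ravel()
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Gradient of the cross-entropy w.r.t. the input of a linear model.
        grad = (weights.T @ (probs - onehot)).reshape(x0.shape)
        # Signed ascent step, applied only where the mask allows it.
        adv = adv + alpha * mask * np.sign(grad)
        # Project back onto the eps-ball around the clean video ...
        adv = x0 + np.clip(adv - x0, -eps, eps)
        # ... and onto the valid pixel range.
        adv = np.clip(adv, 0.0, 1.0)
    return adv
```

With a mask that is nonzero only on a few frames and a small spatial window, the perturbation stays confined to that region and within `eps` of the clean video, which is the property the abstract relies on for keeping the attack small and hard to notice.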