Diffusion Patch Attack with Spatial-Temporal Cross-Evolution for Video Recognition

Jian Yang, Zhiyu Guan, Jun Li, Zhiping Shi, Xianglong Liu
DOI: https://doi.org/10.1109/tcsvt.2024.3452475
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Deep neural networks (DNNs) have demonstrated excellent performance across various domains. However, recent studies have shown that DNNs, including DNN-based video action recognition models, are vulnerable to adversarial examples. While much of the existing research on adversarial attacks against video models focuses on perturbation-based attacks, there is limited research on patch-based black-box attacks. Existing patch-based attack algorithms must optimize over a large search space and use patches with simple content, which leads to suboptimal attack performance or requires a large number of queries. To address these challenges, we propose the “Diffusion Patch Attack (DPA) with Spatial-Temporal Cross-Evolution (STCE) for Video Recognition”, a novel approach that integrates the properties of the diffusion model into video black-box adversarial attacks for the first time. This integration significantly narrows the parameter search space while enhancing the adversarial content of patches. Moreover, we introduce the spatial-temporal cross-evolutionary algorithm to adapt to the narrowed search space. Specifically, we separate the spatial and temporal parameters and then evolve each parameter type in alternation. Extensive experiments conducted on three widely used video action recognition models (C3D, NL, and TPN) and two benchmark datasets (UCF-101 and HMDB-51) demonstrate the superior performance of our approach compared to other state-of-the-art black-box patch attack algorithms.
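The abstract describes separating spatial and temporal parameters and evolving them in alternation. The sketch below illustrates that general idea only and is not the authors' STCE algorithm. The function names, the parameter encoding as values in [0, 1], the mutation scheme, and the toy scoring function are all assumptions introduced for illustration. In a real black-box attack, the score would come from querying the target video model.

```python
import random


def toy_score(spatial, temporal):
    """Hypothetical stand-in for a black-box model's confidence in the true label.

    Lower is better for the attacker. The hidden targets below are arbitrary.
    """
    target_s, target_t = [0.7, 0.2], [0.5]
    err_s = sum((a - b) ** 2 for a, b in zip(spatial, target_s))
    err_t = sum((a - b) ** 2 for a, b in zip(temporal, target_t))
    return err_s + err_t


def alternating_evolution(score, spatial, temporal,
                          rounds=20, pop=8, sigma=0.1, seed=0):
    """Minimise score(spatial, temporal), mutating one parameter group per round."""
    rng = random.Random(seed)
    best = score(spatial, temporal)
    queries = 1
    for r in range(rounds):
        evolve_spatial = (r % 2 == 0)  # even rounds: spatial, odd rounds: temporal
        parent = spatial if evolve_spatial else temporal
        winner = None
        for _ in range(pop):
            # Gaussian mutation, clipped to the assumed [0, 1] parameter range
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in parent]
            s = score(child, temporal) if evolve_spatial else score(spatial, child)
            queries += 1
            if s < best:
                best, winner = s, child
        # Keep the best child of this round only if it improved the score
        if winner is not None:
            if evolve_spatial:
                spatial = winner
            else:
                temporal = winner
    return spatial, temporal, best, queries


start_s, start_t = [0.0, 0.0], [0.0]
s, t, best, queries = alternating_evolution(toy_score, start_s, start_t)
print(s, t, best, queries)
```

Holding one group fixed while mutating the other keeps each round's search low-dimensional, which is the motivation the abstract gives for separating the two parameter types. The query counter reflects that every candidate evaluation costs one model query in the black-box setting.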