SVASTIN: Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks

Yi Pan,Jun-Jie Huang,Zihan Chen,Wentao Zhao,Ziyue Wang

2024-06-04

Abstract: Robust and imperceptible adversarial video attack is challenging due to the spatial and temporal characteristics of videos. Existing video adversarial attack methods mainly take a gradient-based approach and generate adversarial videos with noticeable perturbations. In this paper, we propose a novel Sparse Adversarial Video Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN) to generate adversarial videos through spatio-temporal feature space information exchanging. It consists of a Guided Target Video Learning (GTVL) module to balance the perturbation budget and optimization speed and a Spatio-Temporal Invertible Neural Network (STIN) module to perform spatio-temporal feature space information exchanging between a source video and the target feature tensor learned by the GTVL module. Extensive experiments on UCF-101 and Kinetics-400 demonstrate that our proposed SVASTIN can generate adversarial examples with higher imperceptibility than the state-of-the-art methods while achieving a higher fooling rate. Code is available at https://github.com/Brittany-Chen/SVASTIN.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the problem of generating adversarial videos that reliably fool deep neural networks (DNNs) while remaining imperceptible to human viewers. Existing video adversarial attack methods are mainly gradient-based and produce adversarial videos with obvious perturbations, which makes it difficult to achieve a high attack success rate and visual imperceptibility at the same time.

### Main problems:

1. **Spatial and temporal characteristics**: Videos have both spatial and temporal dimensions, which makes it harder to generate robust and effective adversarial videos.
2. **Limitations of existing methods**: Existing video adversarial attack methods often produce clearly visible perturbations, which reduces both the stealth and the success rate of the attack.

### Solutions:

To solve these problems, the authors propose a new method, **Sparse Video Adversarial Attack via Spatio-Temporal Invertible Neural Networks (SVASTIN)**. It consists of two modules:

1. **Guided Target Video Learning (GTVL) module**:
   - It balances the perturbation budget against optimization speed.
   - It learns a target feature tensor that guides the generation of adversarial videos.
2. **Spatio-Temporal Invertible Neural Network (STIN) module**:
   - It exchanges information in the spatio-temporal feature space, using a 3D discrete wavelet transform (3D-DWT) and spatio-temporal affine coupling blocks (ST-ACB) to capture and process spatio-temporal information.
   - It restricts perturbations to the high-frequency coefficients of the 3D-DWT, which improves the imperceptibility of the adversarial videos.
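
The two ingredients named for the STIN module can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a one-level Haar wavelet (the summary does not say which wavelet is used), uses random noise as a stand-in for the perturbation that the network would learn, and uses a generic affine coupling layer with toy scale and shift functions in place of the ST-ACB sub-networks. All function names are hypothetical.

```python
import numpy as np

def haar_split(x, axis):
    """One-level orthonormal Haar analysis along `axis` (even length assumed)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_merge(low, high, axis):
    """Inverse of haar_split: rebuild and interleave even/odd samples."""
    even = (low + high) / np.sqrt(2.0)
    odd = (low - high) / np.sqrt(2.0)
    pairs = np.stack([even, odd], axis=axis + 1)  # pair axis right after `axis`
    shape = list(low.shape)
    shape[axis] *= 2
    return pairs.reshape(shape)

def dwt3d(video):
    """(T, H, W) array -> 8 subbands keyed 'LLL' ... 'HHH' (letter order: T, H, W)."""
    bands = {"": video}
    for axis in range(3):
        nxt = {}
        for key, arr in bands.items():
            nxt[key + "L"], nxt[key + "H"] = haar_split(arr, axis)
        bands = nxt
    return bands

def idwt3d(bands):
    """Inverse of dwt3d: merge subbands along W, then H, then T."""
    for axis in (2, 1, 0):
        bands = {k: haar_merge(bands[k + "L"], bands[k + "H"], axis)
                 for k in {key[:-1] for key in bands}}
    return bands[""]

def perturb_high_freq(video, eps, rng):
    """Add noise to every subband except 'LLL' (noise is a stand-in, see lead-in)."""
    bands = dwt3d(video)
    for key in bands:
        if key != "LLL":
            bands[key] = bands[key] + eps * rng.standard_normal(bands[key].shape)
    return idwt3d(bands)

def coupling_forward(x1, x2, w):
    """Generic affine coupling: x1 passes through, x2 is scaled and shifted by f(x1)."""
    scale, shift = np.tanh(w * x1), w * x1  # toy stand-ins for learned sub-networks
    return x1, x2 * np.exp(scale) + shift

def coupling_inverse(y1, y2, w):
    """Exact inverse of coupling_forward, which is what makes the block invertible."""
    scale, shift = np.tanh(w * y1), w * y1
    return y1, (y2 - shift) * np.exp(-scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((4, 8, 8))  # toy grayscale clip: 4 frames of 8x8 pixels
    adv = perturb_high_freq(clip, eps=0.05, rng=rng)
    drift = np.abs(dwt3d(adv)["LLL"] - dwt3d(clip)["LLL"]).max()
    print("max change in low-frequency band:", drift)
```

Because the Haar transform here is orthonormal and exactly invertible, a perturbation applied to the seven high-frequency subbands leaves the low-frequency `LLL` band unchanged up to floating-point error. That is the property the summary credits for improved imperceptibility.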
### Experimental results:

On the Kinetics-400 and UCF-101 datasets, the adversarial videos generated by SVASTIN achieve a higher fooling rate and are more visually imperceptible than those produced by existing methods.

### Summary:

By combining the GTVL and STIN modules, the paper addresses the problem of generating robust and imperceptible adversarial videos, improving both imperceptibility and fooling rate over existing methods on the two benchmarks.


Enhancing robustness in video recognition models: Sparse adversarial attacks and beyond

Sparse Adversarial Perturbations for Videos

Adversarial Attacks on Video Quality Assessment Models

Imperceptible Adversarial Attack via Invertible Neural Networks

Stealthy and Robust Glitch Injection Attack on Deep Learning Accelerator for Target with Variational Viewpoint

Imperceptible Adversarial Attack with Multi-granular Spatio-temporal Attention for Video Action Recognition

Boosting the Transferability of Video Adversarial Examples Via Temporal Translation

Query-Efficient Video Adversarial Attack with Stylized Logo

SPARK: Spatial-Aware Online Incremental Attack Against Visual Tracking

SS-CMT: a label independent cross-modal transferable adversarial video attack with sparse strategy

DVS-Attacks: Adversarial Attacks on Dynamic Vision Sensors for Spiking Neural Networks

SSTA: Salient Spatially Transformed Attack

Invisible Adversarial Attack Against Deep Neural Networks: an Adaptive Penalization Approach

Adversarial Attacks Hidden in Plain Sight

Adversarial Image Generation by Spatial Transformation in Perceptual Colorspaces

Coreset Learning Based Sparse Black-box Adversarial Attack For Video Recognition

Towards Decision-based Sparse Attacks on Video Recognition

Adversarial Attacks against Deep Saliency Models

Efficient Decision-based Black-box Patch Attacks on Video Recognition

Temporal-Distributed Backdoor Attack Against Video Based Action Recognition