Abstract: Addressing the vulnerability of deep neural networks (DNNs) has attracted significant attention in recent years. While recent studies on adversarial attack and defense mainly focus on single images, few efforts have been made to perform temporal attacks against video sequences. Because they do not consider the temporal consistency between frames, existing adversarial attack approaches designed for static images do not perform well against deep object tracking. In this work, we generate adversarial examples on top of video sequences to improve tracking robustness against adversarial attacks under white-box and black-box settings. To this end, we consider motion signals when generating lightweight perturbations over the estimated tracking results frame by frame. For the white-box attack, we generate temporal perturbations via known trackers to significantly degrade tracking performance. For the black-box attack, we transfer the generated perturbations to unknown target trackers. Furthermore, we train universal adversarial perturbations and add them directly to all frames of a video, improving attack effectiveness at minor computational cost. On the defense side, we sequentially learn to estimate and remove the perturbations from input sequences to restore tracking performance. We apply the proposed adversarial attack and defense approaches to state-of-the-art tracking algorithms. Extensive evaluations on large-scale benchmark datasets, including OTB, VOT, UAV123, and LaSOT, demonstrate that our attack method degrades tracking performance significantly, with favorable transferability to other backbones and trackers. Notably, the proposed defense method restores the original tracking performance to some extent and achieves additional performance gains when not under adversarial attack.
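
The universal-perturbation step mentioned in the abstract (one perturbation added to every frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `apply_universal_perturbation`, the flattened-frame representation, the L-infinity budget `epsilon`, and the [0, 255] pixel range are all assumptions made for illustration, and the sketch does not cover how the perturbation is trained.

```python
from typing import List

Frame = List[float]  # one frame flattened to pixel intensities in [0, 255] (assumed)


def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def apply_universal_perturbation(
    frames: List[Frame], delta: Frame, epsilon: float = 8.0
) -> List[Frame]:
    """Add the same perturbation to every frame (hypothetical helper).

    delta is first clamped to [-epsilon, epsilon], an assumed L-infinity
    budget, then added to each frame; results are clamped to [0, 255].
    """
    bounded = [clip(d, -epsilon, epsilon) for d in delta]
    perturbed = []
    for frame in frames:
        if len(frame) != len(bounded):
            raise ValueError("frame and perturbation sizes differ")
        perturbed.append(
            [clip(p + d, 0.0, 255.0) for p, d in zip(frame, bounded)]
        )
    return perturbed


# Two tiny 3-pixel "frames" and one shared perturbation.
video = [[0.0, 100.0, 250.0], [10.0, 120.0, 255.0]]
delta = [-20.0, 4.0, 12.0]
adversarial = apply_universal_perturbation(video, delta, epsilon=8.0)
```

Because the perturbation is computed once and reused, the per-frame cost is a single addition and clamp, which is consistent with the abstract's claim of minor computational cost at attack time.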

Adaptive Temporal Grouping for Black-box Adversarial Attacks on Videos

Heuristic Black-box Adversarial Attacks on Video Recognition Models

Efficient Robustness Assessment Via Adversarial Spatial-Temporal Focus on Videos

Reinforcement Learning Based Sparse Black-box Adversarial Attack on Video Recognition Models

Adversarial Attacks on Black Box Video Classifiers: Leveraging the Power of Geometric Transformations

Boosting the Transferability of Video Adversarial Examples Via Temporal Translation

Sparse Black-Box Video Attack with Reinforcement Learning

Robust Deep Object Tracking against Adversarial Attacks

Imperceptible Adversarial Attack with Multi-granular Spatio-temporal Attention for Video Action Recognition

Black-box Adversarial Attacks on Video Recognition Models

Sparse Adversarial Video Attacks Via Superpixel-Based Jacobian Computation

Coreset Learning Based Sparse Black-box Adversarial Attack For Video Recognition

Efficient Sparse Attacks on Videos Using Reinforcement Learning

Appending Adversarial Frames for Universal Video Attack

Efficient Decision-based Black-box Patch Attacks on Video Recognition

Sparse Adversarial Perturbations for Videos

Improving Query Efficiency of Black-box Adversarial Attack

Cube-Evo: A Query-Efficient Black-Box Attack on Video Classification System

Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks

Robust Tracking against Adversarial Attacks