Abstract: Skeletal sequences, as well-structured representations of human behaviors, play a vital role in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interaction. However, most existing skeleton-based HAR (S-HAR) attacks are designed primarily for white-box scenarios and exhibit weak adversarial transferability, so they cannot be considered true transfer-based S-HAR attacks. More importantly, the reason for this failure remains unclear. In this paper, we study this phenomenon through the lens of the loss surface and find that its sharpness contributes to the weak transferability in S-HAR. Inspired by this observation, we hypothesize and empirically validate that smoothing the rugged loss landscape can improve adversarial transferability in S-HAR. To this end, we propose the first Transfer-based Attack on Skeletal Action Recognition, TASAR. TASAR explores the smoothed model posterior without requiring surrogate re-training, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike previous transfer-based attacks that treat each frame independently and overlook temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack gradient, effectively disrupting the spatial-temporal coherence of S-HAR models. To exhaustively evaluate the effectiveness of existing methods and our method, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense methods. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available in the supplementary material.

What problem does this paper attempt to address?

The problem this paper addresses is the low transferability of adversarial attacks on skeleton-based Human Activity Recognition (S-HAR) in practical applications. Specifically, existing S-HAR attack methods perform well in the white-box setting but transfer poorly in the black-box setting: the adversarial examples they generate cannot be effectively transferred to unseen target models. The reasons for this are still unclear. To study the phenomenon, the authors analyze the smoothness of the loss surface and find that the smoother the loss surface is, the better the adversarial examples transfer. Based on this observation, they propose a new transfer-based attack, TASAR (Transfer-based Attack on Skeletal Action Recognition), which smooths the loss surface through a post-train Dual Bayesian optimization strategy to improve the transferability of adversarial examples.

### Main contributions:

1. **Systematically study the reasons for the low adversarial transferability in S-HAR**: By analyzing the smoothness of the loss surface, the paper shows that the sharpness of the loss surface contributes to the low transferability.
2. **Propose TASAR**: A new post-train Dual Bayesian motion attack that improves the transferability of adversarial examples by smoothing the loss surface and takes spatio-temporal coherence into account.
3. **Construct the first large-scale S-HAR robustness evaluation benchmark**: RobustBenchHAR, which contains 7 S-HAR models, 10 attack methods, 3 datasets and 2 defense methods, providing a convenient comparison tool for future research.
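
The notion of loss-surface sharpness used above can be made concrete with a toy measurement. The sketch below is an illustration only, not the paper's metric: it assumes sharpness is estimated as the mean loss increase over random perturbations of a fixed norm, and the names `sharpness`, `flat_loss`, and `sharp_loss` are hypothetical.

```python
import numpy as np

def sharpness(loss_fn, x, radius=0.1, n_samples=256, seed=0):
    """Mean loss increase over random perturbations of norm `radius` around x.

    Assumed proxy for sharpness; the paper may use a different measure.
    """
    rng = np.random.default_rng(seed)
    base = loss_fn(x)
    increases = []
    for _ in range(n_samples):
        d = rng.standard_normal(x.shape)
        d *= radius / np.linalg.norm(d)   # rescale to the chosen radius
        increases.append(loss_fn(x + d) - base)
    return float(np.mean(increases))

# Two toy losses with the same minimum at the origin but different curvature.
def flat_loss(x):
    return 0.5 * np.sum(x ** 2)

def sharp_loss(x):
    return 50.0 * np.sum(x ** 2)

x0 = np.zeros(10)
s_flat = sharpness(flat_loss, x0)
s_sharp = sharpness(sharp_loss, x0)
print(f"flat: {s_flat:.4f}, sharp: {s_sharp:.4f}")
```

At the origin every perturbation has squared norm `radius ** 2`, so the two scores are about 0.005 and 0.5. Under the paper's observation, a perturbation found on the flatter surface would be expected to depend less on the exact surrogate and so transfer better.
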
### Method overview:

- **Post-train Bayesian perspective**: A small multi-layer perceptron (MLP) is appended after the pre-trained model, and this appended Bayesian model is optimized with Monte Carlo sampling, so the loss surface is smoothed without retraining the surrogate.
- **Post-train Dual Bayesian optimization**: Gaussian noise is additionally introduced to smooth the weights of the appended network, further improving the transferability of adversarial examples.
- **Temporal motion gradient**: Motion dynamics are integrated into the Bayesian attack gradient to disrupt the spatio-temporal coherence of the S-HAR model, improving the generalization ability of the attack.

### Experimental results:

- **Evaluation on multiple datasets and models**: TASAR performs well on normally trained models, ensemble models and defended models, significantly outperforming existing S-HAR attacks and transfer-based attacks.

In conclusion, through in-depth analysis and a new attack method, this paper addresses the low transferability of adversarial attacks in S-HAR and provides reference points and tools for future research.
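
The first two steps of the method overview (appended modules sampled by Monte Carlo, plus Gaussian noise on their weights) can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the surrogate is a frozen linear classifier `W`, each appended module is a small residual matrix `A` rather than an MLP, the temporal motion gradient is omitted, and all names (`ce_grad_wrt_input`, `smoothed_attack_grad`, `sigma`, `n_noise`) are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_grad_wrt_input(x, y, W, A):
    """Gradient of cross-entropy w.r.t. x for logits = (I + A) @ W @ x."""
    M = (np.eye(A.shape[0]) + A) @ W
    p = softmax(M @ x)
    p[y] -= 1.0                          # d(loss)/d(logits) = softmax - one_hot
    return M.T @ p

def smoothed_attack_grad(x, y, W, appended, sigma=0.05, n_noise=8, seed=0):
    """Average the input gradient over appended modules and Gaussian weight noise."""
    rng = np.random.default_rng(seed)
    grads = []
    for A in appended:                   # Monte Carlo over sampled appended modules
        for _ in range(n_noise):         # Gaussian noise on the appended weights
            A_noisy = A + sigma * rng.standard_normal(A.shape)
            grads.append(ce_grad_wrt_input(x, y, W, A_noisy))
    return np.mean(grads, axis=0)

rng = np.random.default_rng(1)
T, J, C = 8, 6, 4                        # frames, coordinates per frame, classes (toy sizes)
W = rng.standard_normal((C, T * J))      # frozen toy surrogate, never updated
appended = [0.1 * rng.standard_normal((C, C)) for _ in range(3)]
x = rng.standard_normal(T * J)           # flattened skeletal sequence
y = 2                                    # true class label

g = smoothed_attack_grad(x, y, W, appended)
x_adv = x + 0.01 * np.sign(g)            # one signed-gradient ascent step on the loss
```

Only the appended weights are sampled and perturbed while `W` stays fixed, which mirrors the idea of smoothing without retraining the surrogate. A fuller sketch would add a motion term over consecutive frames before taking the step.
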
