Bi-Directional Motion Attention with Contrastive Learning for Few-Shot Action Recognition

Hanyu Guo, Wanchuan Yu, Yan Yan, Hanzi Wang
DOI: https://doi.org/10.1109/ICASSP48485.2024.10447209
2024-01-01
Abstract: In recent years, many few-shot action recognition methods have achieved competitive performance by adopting metric-based techniques. However, they suffer from two limitations: (1) the spatio-temporal relationship is modeled independently, overlooking the spatio-temporal correspondence between target objects across video frames; and (2) inter-class similarities are not well exploited in the task. As a result, their performance is significantly constrained by similar segments shared among different classes. In this paper, we present BiMACL, a novel method for few-shot action recognition. It consists of a Temporal Difference Spatial Attention Module (TDSAM), which uses motion attention to capture the spatio-temporal correspondence between video frames, and a Contrastive Temporal-Relational CrossTransformers (CTRX) module, which alleviates the adverse effects of similar frame subsequences among distinct classes. Extensive experimental results demonstrate that our method outperforms most existing methods for few-shot action recognition. Code is available at https://github.com/YWCandGHY/BiMACL.
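The abstract says only that TDSAM derives motion attention from temporal differences between frames; it does not give the formulation. The sketch below is a hypothetical, parameter-free illustration of that general idea under stated assumptions. Motion is approximated by forward and backward differences between consecutive per-frame feature maps (one possible reading of "bi-directional"), averaged over channels, normalized per frame, and then used to amplify moving regions. The function name `bidirectional_motion_attention`, the (T, C, H, W) layout, and the residual `1 + attention` weighting are assumptions, not details from the paper. The published module may differ, for example by using learned layers, so the linked repository is the authoritative reference.

```python
import numpy as np


def bidirectional_motion_attention(feats: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reweight frame features with a parameter-free, motion-derived spatial attention map.

    Hypothetical illustration only; NOT the TDSAM implementation from the paper.

    feats: array of shape (T, C, H, W) holding per-frame feature maps, with T >= 2.
    Returns an array of the same shape.
    """
    if feats.ndim != 4 or feats.shape[0] < 2:
        raise ValueError("expected shape (T, C, H, W) with T >= 2")

    step = feats[1:] - feats[:-1]  # differences between consecutive frames

    # Forward difference for frame t: F[t+1] - F[t] (zero for the last frame).
    fwd = np.zeros_like(feats)
    fwd[:-1] = step
    # Backward difference for frame t: F[t] - F[t-1] (zero for the first frame).
    bwd = np.zeros_like(feats)
    bwd[1:] = step

    # Motion strength per location: mean absolute difference over channels,
    # averaged over both temporal directions. Shape (T, H, W).
    motion = 0.5 * (np.abs(fwd).mean(axis=1) + np.abs(bwd).mean(axis=1))

    # Scale each frame's map to [0, 1]; frames without any motion stay all zero.
    attn = motion / (motion.max(axis=(1, 2), keepdims=True) + eps)

    # Residual reweighting: static regions keep their features (factor 1),
    # while moving regions are amplified by up to a factor of 2.
    return feats * (1.0 + attn[:, None, :, :])


if __name__ == "__main__":
    # Toy clip: 3 frames, 2 channels, 4x4 maps; one location changes in frame 1.
    video = np.ones((3, 2, 4, 4))
    video[1, :, 1, 2] = 3.0
    out = bidirectional_motion_attention(video)
    print(round(float(out[1, 0, 1, 2]), 3))  # 6.0 (moving location amplified)
    print(float(out[1, 0, 0, 0]))            # 1.0 (static location unchanged)
```

The residual form leaves static regions untouched instead of suppressing them, so appearance cues are preserved while motion regions are emphasized. A learned variant could replace the fixed per-frame normalization with convolutions followed by a sigmoid.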