SCaTNet: A Novel Self-supervised Contrastive Framework with Spatial-Channel Attention and Temporal Transformer for Few-Shot Action Recognition.

Zanxi Ruan, Yingmei Wei, Yanming Guo, Yuxiang Xie, Yifei Yuan
DOI: https://doi.org/10.1145/3639631.3639654
2023-01-01
Abstract: We introduce SCaTNet, a method for few-shot action recognition that combines contrastive learning with advanced attention mechanisms. Unlike previous few-shot methods, SCaTNet exploits the sample data at both high- and low-dimensional levels. SCaTNet integrates the Quadruplet Attention mechanism (QA) with a Multimodal Temporal Contrastive Learning (MTCL) strategy, significantly improving the recognition and interpretation of action features in video. Extensive experiments on the SSv2-Small dataset show that SCaTNet performs competitively with existing state-of-the-art methods, highlighting its effectiveness and practical utility in few-shot action recognition.