Text-Guided Graph Temporal Modeling for Few-Shot Video Classification
Fuqin Deng,Jiaming Zhong,Nannan Li,Lanhui Fu,Bingchun Jiang,Ningbo Yi,Feng Qi,He Xin,Tin Lun Lam
DOI: https://doi.org/10.1016/j.engappai.2024.109076
IF: 8
2024-01-01
Engineering Applications of Artificial Intelligence
Abstract:Large-scale pre-trained models and graph neural networks have recently demonstrated remarkable success in few-shot video classification tasks. However, they generally suffer from two key limitations: i) the temporal relations between adjacent frames tends to be ambiguous due to the lack of explicit temporal modeling. ii) the absence of multi-modal semantic knowledge in query videos results in inaccurate prototypes construction and an inability to achieve multi-modal temporal alignment metrics. To address these issues, we develop a Text-guided Graph Temporal Modeling (TgGTM) method that consists of two crucial components: a text-guided feature refinement module and a learnable Query text-token contrastive objective. Specifically, the former leverages the Temporal masking layer to guide the model in learning temporal relationships between adjacent frames. Additionally, it utilizes multi-modal information to refine video prototypes for comprehensive few-shot video classification. The latter addresses the feature discrepancy between multi-modal support features and single-modal query features by aligning a learnable Query text-token with corresponding base class text descriptions. Extensive experiments on four commonly used benchmarks demonstrate the effectiveness of our proposed method, which achieves mean accuracies of 54.4%, 80.3%, 91.9%, and 96.2% for 5-way 1-shot classification on SSV2-Small, HMDB51, Kinetics, and UCF101, respectively. These results are superior compared to existing state-of-the-art methods. A detailed ablation showcases the importance of learning temporal relationships between adjacent frames and obtaining Query text-token. The source code and models will be publicly available at https://github.com/JiaMingZhong2621/TgGTM.