KDM: A knowledge-guided and data-driven method for few-shot video action recognition

Yanfei Qin, Baolin Liu
DOI: https://doi.org/10.1016/j.neucom.2022.09.011
IF: 6
2022-10-21
Neurocomputing
Abstract: Few-Shot Video Action Recognition (FS-VAR) has recently attracted great interest with the rise of meta-learning. Although complex meta-learning approaches are effective and prevalent, their generalization ability is limited. In this work, we propose a knowledge-guided and data-driven method for FS-VAR, termed KDM. We introduce FV-R(2+1)D as the feature extraction architecture; it adopts the self-attention mechanism of BERT to handle the temporal sequence in the video and takes full advantage of the training-set data to guide the construction of support-set features. Meanwhile, transductive inference is incorporated into the N-way K-shot task of FS-VAR: the query-set samples serve as the data driver, and the statistical information of the unlabeled samples in the task is used to optimize a fusion loss. Extensive experiments on three datasets (Kinetics, Something-Something-v2 (SSv2), and HMDB51) show that our proposal outperforms state-of-the-art FS-VAR methods (more than 10% average improvement across all settings). For more challenging and realistic FS-VAR scenarios, three strong benchmarks (more ways, more shots, and domain shift) are presented for use in future research.
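For readers unfamiliar with the N-way K-shot task mentioned in the abstract, the following is a minimal sketch of how one such episode might be sampled from a labeled pool. It is not the authors' code: the function name `sample_episode`, its parameters, and the `n_query` setting are illustrative assumptions, and it covers only episode construction, not FV-R(2+1)D or the fusion loss.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, seed=None):
    """Sample one N-way K-shot episode (hypothetical helper, not from the paper).

    labels: list where labels[i] is the class of sample i.
    Returns (support, query), each a list of (sample_index, episode_label) pairs.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for both support and query are usable.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + n_query samples")

    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picked = rng.sample(by_class[cls], k_shot + n_query)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Example: a 5-way 1-shot episode with 3 query samples per class.
labels = [i % 10 for i in range(200)]
support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=3, seed=0)
```

In an inductive setting each query sample would be classified on its own; in a transductive setting, as described in the abstract, the whole unlabeled query set of the episode is available at inference time, so its statistics can inform the prediction.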
computer science, artificial intelligence