Exploring cross-video matching for few-shot video classification via dual-hierarchy graph neural network learning
Fuqin Deng, Jiaming Zhong, Nannan Li, Lanhui Fu, Dong Wang, Tin Lun Lam
DOI: https://doi.org/10.1016/j.imavis.2023.104822
IF: 3.86
2023-09-21
Image and Vision Computing
Abstract: Few-shot video classification methods aim to recognize a new class from only a few training examples. Unlike previous few-shot methods, we explicitly consider the relations across videos and take full advantage of cross-video frame matching in a hierarchical learning fashion. In this paper, we propose a Dual-Hierarchy Graph Neural Network to realize comprehensive cross-video frame matching and video relation modeling. In the first hierarchy of the graph neural network, we build a Cross-video Frame Matching Graph that extracts robust frame-level features by accumulating information across frames sampled from both the query and the support videos. The frame representations are then aggregated to obtain video-level features. In the second hierarchy, we construct a Video Relation Graph that takes the video-level features as nodes and adaptively learns positive relations between query and support videos. The label of the query video is predicted from the learned matching of the edges that connect the video nodes. We evaluate the model on three benchmarks: HMDB51, Kinetics, and UCF101. Extensive experiments on these datasets demonstrate that our model significantly improves few-shot video classification over a wide range of competitive baselines and show the strong generalization of our framework. The source code and models will be made publicly available at https://github.com/JiaMingZhong2621/DHGNN .
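The two-level pipeline described in the abstract can be illustrated with a small, parameter-free sketch. This is NOT the authors' DHGNN (their code is at the GitHub link above); it is a toy under stated assumptions: edge weights are a softmax over cosine similarities with a hypothetical temperature, node updates average each node with its incoming message, frame-to-video aggregation is mean pooling, and the learned edge matching of the second hierarchy is replaced by plain cosine scores averaged per class. All function names (`message_pass`, `mean_pool`, `classify`) are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def message_pass(nodes: List[Vector], temperature: float = 0.1) -> List[Vector]:
    """One round of similarity-weighted message passing on a fully connected graph.

    Assumption: edge weights are a softmax over cosine similarities, and each
    node becomes the average of itself and its aggregated message. No learned
    parameters, unlike the trained graph layers described in the paper.
    """
    updated = []
    for h_i in nodes:
        weights = softmax([cosine(h_i, h_j) / temperature for h_j in nodes])
        msg = [sum(w * h_j[d] for w, h_j in zip(weights, nodes)) for d in range(len(h_i))]
        updated.append([0.5 * (a + b) for a, b in zip(h_i, msg)])
    return updated


def mean_pool(frames: List[Vector]) -> Vector:
    """Assumed frame-to-video aggregation: element-wise mean over frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def classify(query: List[Vector], support: Dict[str, List[List[Vector]]]) -> str:
    """Predict the query's class from a few labelled support videos."""
    # Hierarchy 1: one frame graph spanning the query frames and all support frames.
    videos = [query] + [v for vids in support.values() for v in vids]
    labels = [None] + [c for c, vids in support.items() for _ in vids]
    refined = message_pass([f for v in videos for f in v])
    # Regroup the refined frames by video and pool them into video-level nodes.
    video_nodes, start = [], 0
    for v in videos:
        video_nodes.append(mean_pool(refined[start:start + len(v)]))
        start += len(v)
    # Hierarchy 2: video relation graph, then score each query-support edge.
    video_nodes = message_pass(video_nodes)
    q = video_nodes[0]
    class_scores: Dict[str, List[float]] = {}
    for node, label in zip(video_nodes[1:], labels[1:]):
        class_scores.setdefault(label, []).append(cosine(q, node))
    # Average the edge scores per class and return the best-matching class.
    return max(class_scores, key=lambda c: sum(class_scores[c]) / len(class_scores[c]))


# Usage: 2-way 1-shot toy episode with 2-D "frame features".
support = {"a": [[[1.0, 0.1], [0.9, 0.0]]], "b": [[[0.0, 1.0], [0.1, 0.9]]]}
print(classify([[1.0, 0.0], [0.8, 0.2]], support))  # expected: a
```

In a trained model, the cosine edges and mean pooling would be replaced by learned edge and aggregation functions; the sketch only shows how frame-level matching feeds video-level relation scoring.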
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, software engineering, optics