DFSTrack: Dual-stream fusion Siamese network for human pose tracking in videos
Xiangyang Wang, Yuhui Tian, Fudi Geng, Rui Wang
DOI: https://doi.org/10.1016/j.imavis.2024.105117
IF: 3.86
2024-06-13
Image and Vision Computing
Abstract: Human pose tracking is a challenging task that involves estimating human poses and tracking them across the frames of a video sequence. In recent years, deep learning-based methods have made significant progress in this field, achieving state-of-the-art performance. However, because of complex backgrounds and occlusion among people, missed detections and incorrect association matching remain challenging problems. To address these issues, we adopt a top-down framework for human pose tracking in this paper. We propose a human detection prediction recovery module (HDP module) to recover missed detections, and a dual-stream fusion Siamese network for human matching (DFSTrack). Specifically, we design a residual graph convolutional block (RGCN block) for spatial position encoding of human keypoints, and use spatial self-attention and temporal cross-attention to build a dual-stream spatial–temporal fusion transformer (DST Transformer). The graph convolutional block and the transformer are cascaded to jointly capture the spatial and temporal position information of human keypoints, allowing the Siamese network to correct erroneous human matches. Experimental results on the PoseTrack17, PoseTrack18 and PoseTrack21 datasets demonstrate that our proposed method outperforms state-of-the-art methods on human pose tracking tasks.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, software engineering, optics
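
The abstract mentions a residual graph convolutional block that encodes the spatial positions of human keypoints. The following is a minimal sketch of that general idea, not the authors' implementation: the 5-joint skeleton in `EDGES`, the symmetric normalization, the ReLU activation, and the helper names `normalized_adjacency`, `matmul`, and `residual_gcn_block` are all assumptions introduced for illustration. The abstract does not specify any of these details.

```python
import math

# Hypothetical 5-joint skeleton (head, neck, left hand, right hand, hip).
# This is an illustrative layout, not the PoseTrack keypoint definition.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 as nested lists (assumed normalization)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def residual_gcn_block(features, adj, weight):
    """Compute ReLU(adj @ features @ weight) + features.

    `weight` must be square so the skip connection keeps the same shape.
    """
    mixed = matmul(matmul(adj, features), weight)
    return [[max(0.0, m) + f for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(mixed, features)]


# One person's keypoints as normalized (x, y) image coordinates (made-up values).
keypoints = [[0.5, 0.1], [0.5, 0.3], [0.3, 0.5], [0.7, 0.5], [0.5, 0.6]]
adjacency = normalized_adjacency(len(keypoints), EDGES)
identity_weight = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for a learned weight matrix
encoded = residual_gcn_block(keypoints, adjacency, identity_weight)
```

Each joint's output mixes its own coordinates with those of its skeletal neighbours, while the skip connection preserves the original position. In a tracking pipeline such as the one the abstract outlines, per-joint features of this kind would then be passed to the attention layers; how the paper does this is not stated in this record.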