GaitCoTr: Improved Spatial-Temporal Representation for Gait Recognition with a Hybrid Convolution-Transformer Framework

Jingqi Li, Yuzhen Zhang, Hongming Shan, Junping Zhang
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096602
2023-01-01
Abstract: This work presents a novel hybrid convolution-transformer framework for gait recognition, termed GaitCoTr. The framework captures appearance and short-term temporal features with convolution and extracts long-term temporal features with a transformer architecture, achieving a comprehensive spatial-temporal representation of gait. To unleash the potential of this hybrid framework for extracting rich and generalized temporal features, we propose a new transformer variant tailored for gait, comprising temporally shifted tokenization, a length-flexible position embedding, and an inter-frame encoder. In addition, we introduce an auxiliary task, view label prediction, that aims to disentangle view from ID information. Extensive experimental results on two well-known gait benchmark datasets, CASIA-B and GREW, demonstrate the superior performance of the proposed GaitCoTr.