Abstract: Existing methods that combine skeleton and silhouette representations have proven effective for gait recognition. However, these methods simply combine the video-level representations of model-based skeleton data and gait silhouettes for retrieval, so the diverse skeleton information is not fully exploited. First, the position and movement of bones are not clear from individual silhouettes, which makes frame-level interaction between skeleton and silhouette features critical; previous methods ignore it. Second, diverse part-level skeleton-guided gait features are not fully captured. To address these issues, we present a novel framework with multi-level skeleton-guided refinement, comprising frame-level, part-level, and video-level refinement, for comprehensive skeleton-aided gait representation learning. For frame-level refinement, two modules are proposed. The Visual Skeleton Enhanced Backbone (VSEB) visually highlights the global and part-level skeleton regions in the feature of each silhouette frame, and Cross-Visual-Model Frame-level Interaction (CVMFI) further transfers the model-based skeleton information to the features of the visual modalities. For part-level refinement, visual and model-based part-level skeleton features are used to refine the final gait representation. Concretely, the Part Skeleton Enhance Network (PSEN) in VSEB visually enhances the position and movement of part-level skeletons, while Semantic Part Pooling (SPP) captures the model-based skeleton features of different semantic parts. Finally, for video-level refinement, multimodal video-level features are combined to boost the final recognition performance. Extensive experimental results on prevailing datasets demonstrate that our approach outperforms most existing methods, including the skeleton-aided multi-modal methods. With the multi-level refinement guided by the skeleton modalities, the framework is expected to provide a deeper understanding of skeleton-aided gait recognition.
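
The part-level and video-level steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how SPP groups joints or how the modalities are fused, so the joint grouping (over the 17 COCO keypoints), the mean and max pooling operators, the concatenation-based fusion, and all function names below are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, List, Sequence

# Hypothetical grouping of the 17 COCO keypoints into semantic parts.
# The paper's actual part definition is not given in the abstract.
PARTS: Dict[str, List[int]] = {
    "head": [0, 1, 2, 3, 4],
    "arms": [5, 6, 7, 8, 9, 10],
    "legs": [11, 12, 13, 14, 15, 16],
}

Frame = Sequence[Sequence[float]]  # one frame: joints x channels


def part_pool(frame: Frame, parts: Dict[str, List[int]] = PARTS) -> Dict[str, List[float]]:
    """Mean-pool the joint features inside each semantic part (assumed operator)."""
    pooled = {}
    for name, idx in parts.items():
        channels = len(frame[idx[0]])
        pooled[name] = [
            sum(frame[j][c] for j in idx) / len(idx) for c in range(channels)
        ]
    return pooled


def video_part_features(video: Sequence[Frame],
                        parts: Dict[str, List[int]] = PARTS) -> Dict[str, List[float]]:
    """Max-pool the per-frame part features over time (assumed operator)."""
    per_frame = [part_pool(frame, parts) for frame in video]
    return {
        name: [max(vals) for vals in zip(*(pf[name] for pf in per_frame))]
        for name in parts
    }


def fuse(silhouette_feat: Sequence[float],
         part_feats: Dict[str, List[float]]) -> List[float]:
    """Video-level fusion by plain concatenation (assumed; parts in sorted name order)."""
    fused = list(silhouette_feat)
    for name in sorted(part_feats):
        fused.extend(part_feats[name])
    return fused
```

Given a sequence of frames, each holding one feature vector per joint, `video_part_features` returns one vector per semantic part, and `fuse` appends those vectors to a silhouette embedding to form a single retrieval descriptor. A learned model would replace the fixed pooling and concatenation with trainable layers.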

Spatiotemporal smoothing aggregation enhanced multi-scale residual deep graph convolutional networks for skeleton-based gait recognition

Skeleton-based abnormal gait recognition with spatio-temporal attention enhanced gait-structural graph convolutional networks

GaitMGL: Multi-Scale Temporal Dimension and Global–Local Feature Fusion for Gait Recognition

Gaitts: indoor gait recognition with multi-scale temporal-spatial information aggregation

Spatial and temporal attention embedded spatial temporal graph convolutional networks for skeleton based gait recognition with multiple IMUs

Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Spatiotemporal multi-scale bilateral motion network for gait recognition

GaitCTCG: cross-view gait recognition via cascaded residual temporal shift and comprehensive multi-granularity learning

Learning Rich Features for Gait Recognition by Integrating Skeletons and Silhouettes

GMSN: An efficient multi-scale feature extraction network for gait recognition

SkeletonGait: Gait Recognition Using Skeleton Maps

Multi-Scale Adaptive Aggregate Graph Convolutional Network for Skeleton-Based Action Recognition

Human Gait Recognition Based on Frame-by-Frame Gait Energy Images and Convolutional Long Short-Term Memory

GaitMA: Pose-guided Multi-modal Feature Fusion for Gait Recognition

JointsGait: A model-based Gait Recognition Method based on Gait Graph Convolutional Networks and Joints Relationship Pyramid Mapping

Cross-Spatiotemporal Graph Convolution Networks for Skeleton-Based Parkinsonian Gait MDS-UPDRS Score Estimation

MCDGait: multimodal co-learning distillation network with spatial-temporal graph reasoning for gait recognition in the wild

Gait Recognition With Multi-Level Skeleton-Guided Refinement

Multi-scale Context-aware Network with Transformer for Gait Recognition

Lightweight Multi-Scale Spatiotemporal Graph Convolutional Network for Skeleton-Based Action Recognition