Multifaceted-features Enhancement-Relevant Gait Recognition Method
Hou Saihui, Fu Yang, Li Aoqi, Liu Xu, Cao Chunshui, Huang Yongzhen
DOI: https://doi.org/10.11834/jig.220641
2023-01-01
Journal of Image and Graphics
Abstract: Objective Gait recognition aims to identify pedestrians by their walking style. Because gait can be captured at a long distance without the subject's cooperation, it has broad application potential in areas such as crime prevention, forensic identification, and public security. However, gait recognition is challenged by many factors, including camera view, carrying condition, and clothing. Existing gait recognition methods fall into two categories: model-based and appearance-based. Model-based methods extract the human body structure for gait analysis. They typically rely on pose estimation to model the walking process and extract gait features from the pose sequences using hand-crafted features, conventional deep learning, or graph convolutional networks (GCNs). In theory, model-based methods are robust to carrying and clothing variations, but accurate human pose estimation is difficult at low resolution. Appearance-based methods instead learn gait features without explicitly modeling the human body structure. Most of them take silhouettes as input, and they can be further divided into three sub-categories: template-based, sequence-based, and set-based. Template-based methods fuse the silhouettes of a gait cycle into a single template, which inevitably sacrifices temporal information. Sequence-based methods treat the silhouettes of a gait sequence as a video and extract spatio-temporal features. Set-based methods treat the silhouettes of a gait sequence as an unordered set and are therefore invariant to permutations of the input order. Appearance-based methods have also used other data modalities for gait recognition, including RGB frames, gray images, and optical flow.
Compared with these data modalities and with the pose sequences used by model-based methods, silhouettes are easy to use and better suited to low-resolution conditions. Notably, recent silhouette-based methods for gait recognition learn multi-part features by slicing the output of the backbone horizontally. However, the features of the individual parts are extracted independently, with no interaction among them, which is likely to limit recognition accuracy. To resolve this problem, we design a new module that enhances multi-part (multifaceted) feature learning for gait recognition. Method A silhouette-based gait recognition model consists of two parts: the backbone and multi-part feature learning. First, we design the backbone following the network structures of GaitSet and GaitPart, two popular methods for silhouette-based gait recognition. In the backbone, features are first extracted from each silhouette (using regular 2D convolution and max pooling along the spatial dimensions), and set pooling then aggregates the silhouette-level features as an unordered set (implemented by max pooling along the temporal dimension). Second, we design a new module for multi-part feature learning that aims to learn more robust and discriminative features for each part. The independent-shared mechanism is introduced to learn part-specific features; it is implemented by regional pooling followed by separate fully connected layers. In addition, the coordinated mechanism strengthens the interaction across parts; it consists of feature normalization and feature remapping. Feature normalization is parameter-free and balances the weights of the parts, and feature remapping is implemented by a fully connected layer or by element-wise multiplication. Result Experiments are carried out on the CASIA-B dataset (from the Institute of Automation, Chinese Academy of Sciences) and on OUMVLP, with GaitSet and GaitPart as the baselines.
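The set pooling and the multi-part module described in the Method part can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the max-plus-mean form of regional pooling, the L2 form of the parameter-free normalization, and the single scale vector used for element-wise remapping are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def set_pooling(frame_feats):
    """Aggregate frame-level features (T, C, H, W) as an unordered set.

    Max pooling along the temporal axis is permutation invariant.
    """
    return frame_feats.max(axis=0)  # (C, H, W)

def regional_pooling(feat_map, num_parts):
    """Slice a (C, H, W) map into horizontal strips and pool each strip.

    Assumption: max + mean pooling per strip; the abstract only says
    'regional pooling'.
    """
    c, h, w = feat_map.shape
    assert h % num_parts == 0, "height must be divisible by num_parts"
    strips = feat_map.reshape(c, num_parts, h // num_parts, w)
    pooled = strips.max(axis=(2, 3)) + strips.mean(axis=(2, 3))  # (C, P)
    return pooled.T  # (P, C)

def separate_fc(parts, weights):
    """Apply one fully connected layer per part (no weight sharing).

    parts: (P, C), weights: (P, C, D) -> (P, D)
    """
    return np.einsum("pc,pcd->pd", parts, weights)

def normalize(parts, eps=1e-12):
    """Parameter-free normalization (assumed here to be L2, per part)."""
    norms = np.linalg.norm(parts, axis=1, keepdims=True)
    return parts / np.maximum(norms, eps)

def remap(parts, scale):
    """Element-wise remapping with one scale vector (D,) shared by all parts.

    Assumption: sharing the vector across parts is one possible way to
    couple them; a fully connected layer is the other option named in the text.
    """
    return parts * scale

def multi_part_head(frame_feats, fc_weights, scale):
    feat_map = set_pooling(frame_feats)
    parts = regional_pooling(feat_map, fc_weights.shape[0])
    parts = separate_fc(parts, fc_weights)
    return remap(normalize(parts), scale)

# Hypothetical shapes: 30 frames, 128 channels, 16x11 feature maps, 4 parts.
rng = np.random.default_rng(0)
frame_feats = rng.standard_normal((30, 128, 16, 11))
fc_weights = rng.standard_normal((4, 128, 256)) * 0.01
scale = np.ones(256)
out = multi_part_head(frame_feats, fc_weights, scale)
print(out.shape)
```

In a trained model the per-part weights and the remapping parameters would be learned jointly with the backbone; here they are random placeholders that only show how the tensors flow.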
CASIA-B contains 124 subjects and, for each subject, collects sequences of normal walking, walking with bags, and walking in different clothes. OUMVLP contains 10 307 subjects and collects sequences of normal walking for each subject. Sequences in CASIA-B and OUMVLP are recorded by 11 and 14 cameras, respectively. GaitSet and GaitPart are widely used gait recognition methods that take silhouettes as input. GaitSet regards a gait sequence as an unordered set and slices the features horizontally to learn multi-part features. To learn more fine-grained features, GaitPart restricts the receptive field of the convolutional layers and models micro-motion features. To assess whether the improvement is consistent, rank-1 accuracy excluding identical-view cases is taken as the main metric for performance comparison. For example, the rank-1 accuracy for walking with bags on CASIA-B is improved by 1.62% and 1.17% over GaitSet and GaitPart, respectively. Conclusion We propose a new module that enhances multi-part feature learning for gait recognition; it is cost-effective and improves accuracy consistently. In summary: 1) we identify the lack of interaction among part features as a factor that hinders recognition accuracy; 2) we introduce the independent-shared mechanism into multi-part feature learning for gait recognition and design a plug-and-play module that learns more discriminative features for multiple parts; 3) built on GaitSet and GaitPart, the method shows consistent improvement over the baselines under all walking conditions.
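The main metric, rank-1 accuracy excluding identical-view cases, can be sketched as follows. This is a minimal illustration under assumptions rather than the evaluation code used in the paper: it assumes Euclidean nearest-neighbour matching and averages the accuracy over every (probe view, gallery view) pair whose views differ, which is the usual cross-view protocol but is not spelled out in the abstract. The function name and the toy data are hypothetical.

```python
import numpy as np

def rank1_excluding_identical_view(probe_feats, probe_ids, probe_views,
                                   gallery_feats, gallery_ids, gallery_views):
    """Mean rank-1 accuracy over all probe/gallery view pairs with different views."""
    accuracies = []
    for pv in np.unique(probe_views):
        for gv in np.unique(gallery_views):
            if pv == gv:
                continue  # skip identical-view cases
            p_mask = probe_views == pv
            g_mask = gallery_views == gv
            # Pairwise Euclidean distances, shape (num_probe, num_gallery).
            diffs = probe_feats[p_mask][:, None, :] - gallery_feats[g_mask][None, :, :]
            dists = np.linalg.norm(diffs, axis=2)
            # Identity of the nearest gallery sample for each probe.
            nearest_ids = gallery_ids[g_mask][dists.argmin(axis=1)]
            accuracies.append(np.mean(nearest_ids == probe_ids[p_mask]))
    return float(np.mean(accuracies))

# Toy data: two subjects, two views (0 and 90 degrees), 2-D features.
gallery_feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
gallery_ids = np.array([1, 1, 2, 2])
gallery_views = np.array([0, 90, 0, 90])

probe_feats = np.array([[0.1, 0.0], [0.1, 1.0], [5.1, 5.0], [1.0, 1.0]])
probe_ids = np.array([1, 1, 2, 2])
probe_views = np.array([0, 90, 0, 90])

acc = rank1_excluding_identical_view(probe_feats, probe_ids, probe_views,
                                     gallery_feats, gallery_ids, gallery_views)
print(acc)  # 0.75: one of the four cross-view matches is wrong
```

Excluding identical-view pairs keeps the metric focused on cross-view matching, which is the harder and more relevant case when the probe and gallery cameras differ.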