Cross-view Gait Recognition: A Review
Xu Wenzheng, Huang Tianhuan, Ben Xianye, Zeng Yi, Zhang Junping
DOI: https://doi.org/10.11834/jig.220458
2023-01-01
Journal of Image and Graphics
Abstract: Gait recognition identifies pedestrians by the way they walk. Unlike recognition based on the face, fingerprint, iris, or other biometrics, it works at a distance and does not require special acquisition equipment, high image resolution, or explicit cooperation from the subject. Moreover, gait is difficult to hide or disguise. Gait recognition has a wide range of applications in public surveillance, forensic evidence collection, and daily attendance. In these applications, performance is easily degraded by covariates such as viewpoint variation, occlusion, and segmentation errors, among which viewpoint variation is one of the main factors. The intra-class differences across viewpoints are often greater than the inter-class differences within the same viewpoint. Improving the robustness of cross-view gait recognition has therefore become a hot research topic. This paper critically reviews existing cross-view gait recognition methods. First, the current state of the field is introduced in terms of basic concepts, data acquisition methods, application scenarios, and development history. Then, video-based cross-view gait recognition methods are reviewed. Cross-view gait databases are analyzed in detail in terms of 1) data type, 2) sample size, 3) number of viewpoints, 4) acquisition environment, 5) other covariates, and 6) the characteristics of each database. Next, cross-view gait recognition methods are presented in detail. Unlike most existing reviews, which classify gait recognition methods by basic steps such as data acquisition, feature representation, and classification, we focus on the cross-view recognition problem. Specifically, four categories of cross-view gait recognition methods are analyzed on the basis of feature representation and classification (i.e., 3D gait information construction, view transformation model (VTM), view-invariant feature extraction, and deep learning-based methods). For 3D gait information methods, gait information is extracted from multi-view gait videos and used to construct 3D gait models. These methods are robust to large view changes, but they often require complex configurations, expensive high-resolution multi-camera systems, and frame synchronization, all of which limit their application in real surveillance scenarios. For VTM methods, view transformation models derived from singular value decomposition (SVD) and from regression are introduced for both local and global features. Although a VTM can minimize the error between the transformed gait features and the original gait features, it does not take discriminative analysis into account. For view-invariant feature extraction methods, 1) hand-crafted feature extraction, 2) discriminative subspace learning, and 3) metric learning are compared. Among the discriminative subspace learning methods, those based on canonical correlation analysis (CCA) are highlighted. Despite the advantages of these methods, finding a robust view-invariant subspace or metric for the features remains challenging. Deep learning-based methods for cross-view recognition mainly build on the convolutional neural network (CNN), recurrent neural network (RNN), autoencoder (AE), generative adversarial network (GAN), 3D convolutional neural network (3D CNN), and graph convolutional network (GCN). To summarize the potential of the various cross-view gait recognition methods, representative state-of-the-art methods are further compared and analyzed on the CASIA-B (CASIA gait database, dataset B), OU-ISIR LP (OU-ISIR gait database, large population dataset), and OU-MVLP (OU-ISIR gait database, multi-view large population dataset) databases.
The comparison shows that methods using 3D CNNs or multiple neural network architectures, which represent gait with a sequence of silhouettes, achieve good performance. In addition, deep neural network methods based on body model representations also perform very well when view variation is the only covariate. Finally, future research directions for cross-view gait recognition are outlined, including 1) the establishment of large-scale gait databases containing complex covariates, 2) cross-database gait recognition, 3) self-supervised learning methods for gait features, 4) disentangled representation learning methods for gait features, 5) further development of model-based gait representation methods, 6) exploration of new methods for temporal feature extraction, 7) multimodal fusion for gait recognition, and 8) improving the security of gait recognition systems.
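As an illustration of the SVD-based view transformation model mentioned in the abstract, the following is a minimal NumPy sketch, not the implementation of any specific method covered by the review. It assumes each training subject has one fixed-length gait feature vector per view. The function names `fit_vtm` and `transform_view`, the array layout, and the `rank` parameter are hypothetical choices made for this example.

```python
import numpy as np


def fit_vtm(gait, rank):
    """Fit an SVD-based view transformation model (illustrative sketch).

    gait: array of shape (K, N, D) holding, for each of K views and
          N training subjects, a D-dimensional gait feature vector.
    rank: number of singular components to keep (assumed hyperparameter).

    Returns P of shape (K, D, rank): one projection block per view.
    """
    K, N, D = gait.shape
    # Stack the views vertically: rows are (view, feature), columns are subjects.
    G = gait.transpose(0, 2, 1).reshape(K * D, N)
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    # G is approximated by P @ V, where each column of V is a
    # view-independent subject vector.
    P = U[:, :rank] * s[:rank]
    return P.reshape(K, D, rank)


def transform_view(P, feature, src, dst):
    """Map a feature observed in view `src` to its estimate in view `dst`."""
    # Least-squares estimate of the view-independent subject vector.
    v, *_ = np.linalg.lstsq(P[src], feature, rcond=None)
    # Re-project the subject vector into the destination view.
    return P[dst] @ v
```

With a fitted model `P`, a probe feature from view 0 can be mapped to view 2 with `transform_view(P, probe, 0, 2)` and then compared against gallery features recorded in view 2. The sketch minimizes reconstruction error only and uses no identity labels, which reflects the limitation the abstract attributes to VTM methods.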