Staged encoder training for cross-camera person re-identification

Zhi Xu, Jiawei Yang, Yuxuan Liu, Longyang Zhao, Jiajia Liu
DOI: https://doi.org/10.1007/s11760-023-02909-0
IF: 1.583
2024-02-18
Signal Image and Video Processing
Abstract: As a cross-camera retrieval problem, person re-identification (ReID) suffers from image style variations caused by camera parameters, lighting, and other factors, which seriously degrade recognition accuracy. To address this problem, this paper proposes a two-stage contrastive learning method that gradually reduces the impact of camera variations. In the first stage, we train one encoder per camera, using only images from that camera. This ensures that each encoder performs well on images from its own camera while being unaffected by camera variations. In the second stage, we encode the same image with all trained encoders to generate a new combined code that is robust to camera variations. We also use the Cross-Camera Encouragement distance (Lin et al., in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020), which complements the combined encoding, to further mitigate the impact of camera variations. Our method achieves high accuracy on several commonly used person ReID datasets; for example, on Market-1501 it achieves 90.8% rank-1 accuracy and 85.2% mAP, outperforming recent unsupervised works by more than 12% in mAP. Code is available at https://github.com/yjwyuanwu/SET.
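The second stage can be illustrated with a minimal sketch. It is not taken from the authors' repository, and several details are assumptions made for illustration only: the per-camera encoders are plain callables, the combined code is formed by concatenating L2-normalized per-encoder features (the abstract does not say how the codes are combined), and the Cross-Camera Encouragement distance is modelled as a Euclidean distance plus a fixed penalty `lam` for same-camera pairs. The names `combined_code`, `cce_distance`, and `lam` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for a trained per-camera encoder.
Encoder = Callable[[Sequence[float]], Vector]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combined_code(image: Sequence[float], encoders: Sequence[Encoder]) -> Vector:
    """Encode one image with every per-camera encoder and merge the results.

    Assumption: merging is done by concatenating normalized features.
    """
    code: Vector = []
    for encoder in encoders:
        code.extend(l2_normalize(encoder(image)))
    return l2_normalize(code)


def cce_distance(code_a: Vector, code_b: Vector,
                 cam_a: int, cam_b: int, lam: float = 0.5) -> float:
    """Euclidean distance with an added penalty for same-camera pairs.

    Assumption: the penalty is a constant `lam`, so cross-camera pairs
    look relatively closer and are favoured when matching.
    """
    dist = math.dist(code_a, code_b)
    return dist + lam if cam_a == cam_b else dist


# Toy usage: two fake "encoders", each reading one component of the input.
encoders = [lambda img: [img[0], 0.0], lambda img: [0.0, img[1]]]
code = combined_code([3.0, 4.0], encoders)
```

In this sketch, two images from the same camera must be closer in feature space by at least `lam` to rank ahead of a cross-camera pair, which is one simple way to discourage matches that rely on camera style alone.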
engineering, electrical & electronic; imaging science & photographic technology
What problem does this paper attempt to address?