Learning Convolutional Multi-Level Transformers for Image-Based Person Re-Identification

Peilei Yan, Xuehu Liu, Pingping Zhang, Huchuan Lu
DOI: https://doi.org/10.1007/s44267-023-00025-8
2023-01-01
Abstract: As a vital vision task, person re-identification (Re-ID) aims to retrieve the same person under non-overlapping cameras. It is a very challenging task due to the presence of complex backgrounds, diverse illuminations and different perspectives. In this work, we integrate the advantages of convolutional neural networks (CNNs) and transformers, and propose a novel learning framework named convolutional multi-level transformer (CMT) for image-based person Re-ID. More specifically, we first propose a scale-aware feature enhancement (SFE) module to extract multi-scale local features from a pre-trained CNN backbone. Then, we introduce a part-aware transformer encoder (PTE) to further mine discriminative local information guided by global semantics. Finally, a deeply-supervised learning (DSL) technique is adopted to optimize the proposed CMT and improve its training efficiency. Extensive experiments on four large-scale Re-ID benchmarks demonstrate that our method performs favorably against several state-of-the-art methods.
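To make the retrieval setting concrete, the following is a minimal sketch of how a Re-ID system might rank gallery images from other cameras against a query image once features have been extracted. It does not reproduce CMT, SFE, PTE or DSL. The cosine-similarity metric, the function names `cosine_similarity` and `rank_gallery`, the image identifiers, and the three-dimensional toy feature vectors are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as matching nothing
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery):
    """Return gallery image ids sorted from most to least similar to the query.

    gallery is a list of (image_id, feature) pairs. In a real system the
    features would come from the trained network; here they are toy values.
    """
    scored = [
        (cosine_similarity(query_feature, feature), image_id)
        for image_id, feature in gallery
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


# Hypothetical features for one query image and three gallery images
# captured by other, non-overlapping cameras.
query = [1.0, 0.0, 0.5]
gallery = [
    ("cam2_person_a", [0.9, 0.1, 0.6]),
    ("cam3_person_b", [0.0, 1.0, 0.0]),
    ("cam4_person_c", [0.5, 0.5, 0.0]),
]

ranking = rank_gallery(query, gallery)
print(ranking)
```

In this toy setup the gallery image whose feature points in nearly the same direction as the query is ranked first; a more discriminative feature extractor, which is what the proposed framework targets, would aim to make that hold for real images of the same person.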