Cross-Modality Transformer With Modality Mining for Visible-Infrared Person Re-Identification
Tengfei Liang, Yi Jin, Wu Liu, Yidong Li
DOI: https://doi.org/10.1109/tmm.2023.3237155
IF: 7.3
2023-01-01
IEEE Transactions on Multimedia
Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging ReID task that aims to retrieve and match images of the same identity across the heterogeneous visible and infrared modalities. The core of the task is therefore to bridge the large gap between these two modalities. Existing methods mainly suffer from insufficient perception of modality information and cannot learn sufficiently discriminative, modality-invariant embeddings for identities, which limits their performance. To solve these problems, we propose a new cross-modality transformer-based method (CMTR) for VI-ReID, which explicitly mines the information of each modality and generates more discriminative features based on it. Specifically, to capture the inherent characteristics of the modalities, we design novel modality embeddings, which are fused with token embeddings to encode modality information directly. Moreover, to enhance the representation of the modality embeddings and adjust the distribution of embeddings, we further propose a modality-aware enhancement loss based on the learned modality information, which contains two components that simultaneously reduce intra-class distance and enlarge inter-class distance. To our knowledge, this is the first exploration of applying a pure transformer network to the cross-modality re-identification task. We conduct extensive experiments on the public SYSU-MM01 and RegDB datasets; compared with previous methods, our method achieves good performance with more compact and powerful embeddings for cross-modality retrieval.
computer science, information systems, telecommunications, software engineering
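
Illustrative sketch: the abstract states that modality embeddings are fused with token embeddings to encode modality information directly, but it does not say how the fusion is performed. The minimal Python example below assumes element-wise addition, analogous to how position embeddings are commonly added in transformers. The names `fuse_modality`, `modality_table`, `VISIBLE`, and `INFRARED` are hypothetical and do not come from the paper, and the random values stand in for parameters that would be learned during training.

```python
import random

# Hypothetical modality indices (not taken from the paper).
VISIBLE, INFRARED = 0, 1


def fuse_modality(tokens, modality_table, modality_id):
    """Add one modality vector to every token embedding.

    Assumption: fusion is element-wise addition; the abstract only says
    the two kinds of embeddings are "fused".
    """
    modality_vec = modality_table[modality_id]
    return [[t + m for t, m in zip(token, modality_vec)] for token in tokens]


dim = 4
rng = random.Random(0)

# One vector per modality; in a real model these would be learnable.
modality_table = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(2)]

# Three dummy token embeddings (all zeros, to make the effect visible).
tokens = [[0.0] * dim for _ in range(3)]

vis = fuse_modality(tokens, modality_table, VISIBLE)
ir = fuse_modality(tokens, modality_table, INFRARED)
```

Under this assumption, identical token embeddings from a visible image and an infrared image become distinguishable after fusion, because each receives a different modality vector.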