Learning Fine-Grained Modality-Invariant Feature Via Wasserstein Distance for Visible-Thermal Person Re-Identification

Zehua Chai, Yongguo Ling, Shaozi Li
DOI: https://doi.org/10.1109/itme56794.2022.00133
2022-01-01
Abstract: Visible-Thermal Person Re-identification (VT-ReID) matches query pedestrian images against a gallery set, where the query and gallery images are captured in different modalities. Compared with traditional person re-identification (Re-ID), VT-ReID must handle the cross-modality discrepancy caused by cameras operating in different spectra, in addition to intra-class variations. Existing VT-ReID methods commonly design losses for global feature representations, which limits discriminability and ignores the relationships between semantic parts. In this paper, we propose a novel Cross-modality Wasserstein Triplet Loss (CM-WTL) that emphasizes semantic parts and alleviates the modality discrepancy. The loss is based on the Wasserstein distance, which is obtained by solving an optimal transport problem via linear programming. It measures how much effort is needed to move part-level embeddings so that two distributions align, so the distance between local features is computed from an optimal transport matrix. This matrix assigns different weights to relevant and irrelevant parts, resulting in the alignment of the local feature distributions. Therefore, the proposed CM-WTL can effectively mitigate the cross-modality discrepancy and learn more fine-grained discriminative information. Extensive experimental results and comparisons demonstrate that the proposed method achieves performance comparable with recent VT-ReID models.
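As a rough illustration of the mechanism the abstract describes, the following Python sketch computes a transport-based distance between two sets of part-level embeddings and plugs it into a triplet margin. It is not the authors' implementation. Several choices are assumptions made only for this example: the cosine cost, the uniform part weights, the margin of 0.3, and all function names. The abstract states that the transport problem is solved by linear programming; to stay dependency-free, the sketch swaps in Sinkhorn iterations (entropic regularization), which only approximate the exact plan.

```python
import math

def cosine_cost(parts_a, parts_b):
    """Pairwise cost (1 - cosine similarity) between two lists of part embeddings."""
    def norm(vec):
        return math.sqrt(sum(x * x for x in vec))
    return [[1.0 - sum(x * y for x, y in zip(u, v)) / (norm(u) * norm(v))
             for v in parts_b] for u in parts_a]

def sinkhorn_plan(cost, eps=0.1, n_iters=200):
    """Approximate transport plan between uniformly weighted parts.

    Assumption: Sinkhorn scaling stands in for the exact linear program.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # assumed uniform mass on each part of the first image
    b = [1.0 / m] * m  # assumed uniform mass on each part of the second image
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_distance(parts_a, parts_b):
    """Return (distance, plan): total cost of moving parts_a onto parts_b."""
    cost = cosine_cost(parts_a, parts_b)
    plan = sinkhorn_plan(cost)
    dist = sum(plan[i][j] * cost[i][j]
               for i in range(len(cost)) for j in range(len(cost[0])))
    return dist, plan

def wasserstein_triplet_loss(anchor, positive, negative, margin=0.3):
    """Hypothetical triplet loss using the transport distance between part sets."""
    d_pos, _ = transport_distance(anchor, positive)
    d_neg, _ = transport_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy data: three part embeddings per image (values invented for illustration).
visible_anchor = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Same identity in the thermal modality, with the parts in a different order.
thermal_positive = [[0.0, 0.9, 0.1], [0.1, 0.0, 0.9], [0.9, 0.1, 0.0]]
# A different identity whose parts match none of the anchor parts well.
thermal_negative = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

loss = wasserstein_triplet_loss(visible_anchor, thermal_positive, thermal_negative)
```

In this toy case the plan for the anchor and the positive concentrates almost all of its mass on the three well-matched part pairs, even though the parts are listed in a different order. This is the sense in which the transport matrix weights relevant parts more heavily than irrelevant ones.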