Vision transformer with multiple granularities for person re-identification

Bingcai Chen, Fansheng Zhang, Xin Yang, Qian Ning, Victor C. M. Leung
DOI: https://doi.org/10.1007/s00521-023-08913-2
2023-09-02
Neural Computing and Applications
Abstract: Extracting discriminative features with vision transformers is a popular research direction in person re-identification. However, feature extraction in existing vision transformers is relatively simple. To address this, we design a vision transformer with multiple granularities for person re-identification. We propose three stages of multi-granularity feature extraction: stage1 (shuffle), stage2 (split and concat) and stage3 (refine and enhance), which help the model extract fine-grained local and global features in strips. A highlight block is added in stage3 to enhance features through a mathematical transformation that highlights and strengthens the core features beneficial to classification. In addition, the loss function is optimized by introducing Circle Loss alongside ID Loss and Triplet Loss, with the weighted sum of the three used as the final loss function. Finally, we evaluate our method on three standard benchmark datasets, Market-1501, DukeMTMC-reID and MSMT17, and the experimental results show that it outperforms state-of-the-art methods.
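The weighted-sum loss described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weights or the exact Circle Loss variant, so everything specific here is an assumption: the function names `circle_loss` and `total_loss` are hypothetical, the weights default to 1.0 as placeholders, and the Circle Loss term follows the commonly published pairwise form with margin 0.25 and scale 64, which may differ from what the paper uses. The ID Loss and Triplet Loss values are taken as already-computed numbers.

```python
import math


def circle_loss(sp, sn, margin=0.25, gamma=64.0):
    """Pairwise Circle Loss for a single anchor (assumed standard form).

    sp: cosine similarities to samples of the same identity
    sn: cosine similarities to samples of other identities
    """
    # Positive pairs: weight grows as the similarity falls below 1 + margin.
    pos_sum = 0.0
    for s in sp:
        alpha_p = max(0.0, 1.0 + margin - s)
        pos_sum += math.exp(-gamma * alpha_p * (s - (1.0 - margin)))

    # Negative pairs: weight grows as the similarity rises above -margin.
    neg_sum = 0.0
    for s in sn:
        alpha_n = max(0.0, s + margin)
        neg_sum += math.exp(gamma * alpha_n * (s - margin))

    return math.log1p(pos_sum * neg_sum)


def total_loss(id_loss, triplet_loss, circle, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss terms; the weights are placeholders."""
    w_id, w_tri, w_cir = weights
    return w_id * id_loss + w_tri * triplet_loss + w_cir * circle


# Example: a well-separated anchor contributes far less than a confused one.
well_separated = circle_loss(sp=[0.9], sn=[0.1])
confused = circle_loss(sp=[0.1], sn=[0.9])
combined = total_loss(id_loss=1.2, triplet_loss=0.3, circle=well_separated)
```

In this sketch the Circle Loss term penalizes low same-identity similarity and high cross-identity similarity with per-pair weights, while the weighted sum lets each of the three terms be balanced independently.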
computer science, artificial intelligence
What problem does this paper attempt to address?