Abstract: Person re-identification (ReID) depends critically on accurate descriptions of image features. Attention mechanisms, particularly Transformer-like self-attention (TLSA), have gained favor among researchers because of their outstanding feature-description performance. However, their intricate structures mean that TLSA models typically require more computational resources. At the same time, contrastive learning has significantly improved the performance of unsupervised person ReID. Because contrastive learning relies on exploring the relationships among multiple samples, batch size is a crucial factor for deep learning methods built on the contrastive learning paradigm. As a result, when computational resources are limited, traditional TLSA models often struggle to adapt to unsupervised person ReID methods based on contrastive learning. To address these issues, we propose a novel, lightweight Multi-Level Attention (MLA) method, which effectively mitigates the computational resource conflicts that TLSA models face when trained under the contrastive learning paradigm. MLA comprises a lightweight multi-head attention module, assisted by a spatial feature weighting module and an inter-feature cross-attention module. By fully leveraging the complementary strengths of these attention mechanisms, our approach achieves significant performance improvements on the ReID task. We evaluated the proposed approach on three large-scale real-world person ReID datasets, Market-1501, DukeMTMC-reID, and MSMT17, and on the virtual person ReID dataset PersonX. The experimental results demonstrate that our method outperforms state-of-the-art approaches without relying on supplemental pre-training procedures or additional training data.
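The resource conflict described in the abstract can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not the MLA method; it only assumes a standard self-attention layer, in which each sample and head stores a score matrix of size tokens × tokens, and estimates how many samples fit in a fixed memory budget. The function names, the 1 GiB budget, the head count, and the token counts are all hypothetical values chosen for illustration.

```python
def attention_score_bytes(batch_size, num_heads, num_tokens, bytes_per_value=4):
    """Memory for the score matrices of one standard self-attention layer.

    Each sample and each head holds a num_tokens x num_tokens matrix of scores.
    This ignores activations, weights, and gradients (illustrative only).
    """
    return batch_size * num_heads * num_tokens * num_tokens * bytes_per_value


def max_batch_size(budget_bytes, num_heads, num_tokens, bytes_per_value=4):
    """Largest batch whose score matrices fit in budget_bytes (one layer only)."""
    per_sample = attention_score_bytes(1, num_heads, num_tokens, bytes_per_value)
    return budget_bytes // per_sample


if __name__ == "__main__":
    budget = 1024 ** 3  # hypothetical 1 GiB reserved for score matrices
    for tokens in (128, 256, 512):
        # Doubling the token count cuts the feasible batch size by four.
        print(tokens, max_batch_size(budget, num_heads=8, num_tokens=tokens))
```

Under these assumptions, doubling the number of tokens quarters the batch size that fits in the budget, which is why a heavy attention module competes directly with the large batches that contrastive learning benefits from.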

Multi-level self attention for unsupervised learning person re-identification

Deep Siamese Network with Multi-level Similarity Perception for Person Re-identification

Multi-Level Attention for Unsupervised Person Re-Identification

Learning transformer-based attention region with multiple scales for occluded person re-identification

Research on person re-identification based on multi-level attention model

Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification

Adaptive multi-task learning for cross domain and modal person re-identification

MSTN: A Multi-granular Spatial–Temporal Network for video-based person re-identification

Information complementary attention-based multidimension feature learning for person re-identification

Unsupervised Person Re-Identification with Attention-Guided Fine-Grained Features and Symmetric Contrast Learning

A Multi-Scale Spatial-Temporal Attention Model for Person Re-Identification in Videos

Self-Critical Attention Learning for Person Re-Identification

Domain Adaptive Attention Learning for Unsupervised Person Re-Identification

Leader-Based Multi-Scale Attention Deep Architecture for Person Re-Identification

Mask-guided contrastive attention and two-stream metric co-learning for person Re-identification

Multi-scale local-global architecture for person re-identification

Attention: A Big Surprise for Cross-Domain Person Re-Identification

Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification

Person Re-identification via Attention Pyramid

Transformer-based Contrastive Learning for Unsupervised Person Re-Identification