Multi-level self attention for unsupervised learning person re-identification

El Saddik, Abdulmotaleb
DOI: https://doi.org/10.1007/s11042-024-19007-z
IF: 2.577
2024-04-25
Multimedia Tools and Applications
Abstract: Person re-identification (ReID) depends critically on accurate image feature description. Attention mechanisms, particularly Transformer-like self-attention (TLSA), have gained favor among researchers for their strong feature description performance. However, because of their intricate structure, TLSA models typically require substantial computational resources. Meanwhile, contrastive learning has significantly improved the performance of unsupervised person ReID. Because contrastive learning works by exploring relationships among multiple samples, batch size is a crucial factor for methods built on it. Consequently, under limited computational resources, traditional TLSA models are difficult to adapt to unsupervised person ReID methods based on the contrastive learning paradigm. To address this, we propose a novel, lightweight Multi-Level Attention (MLA) method that effectively mitigates the computational resource conflict that TLSA models face when trained under the contrastive learning paradigm. MLA comprises a lightweight multi-head attention module, assisted by a spatial feature weighting module and an inter-feature cross-attention module. By fully leveraging the complementary strengths of these attention mechanisms, our approach achieves significant performance improvements on the ReID task. We evaluated the proposed approach on three large-scale real-world person ReID datasets (Market-1501, DukeMTMC-reID, and MSMT17) and on the virtual person ReID dataset PersonX. The experimental results show that our method outperforms state-of-the-art approaches without relying on supplemental pre-training procedures or additional training data.
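The resource conflict described in the abstract can be illustrated with a rough count. The sketch below is not the paper's MLA method; it assumes a standard multi-head self-attention layer, which stores one tokens-by-tokens score matrix per head per sample. The function name and all numeric settings are hypothetical and chosen only for illustration.

```python
def attention_score_elements(batch_size: int, num_heads: int, num_tokens: int) -> int:
    """Count the attention score entries held by one standard multi-head
    self-attention layer: one (num_tokens x num_tokens) matrix per head
    per sample. This is an illustrative estimate, not the paper's model."""
    return batch_size * num_heads * num_tokens * num_tokens


# Hypothetical settings, not taken from the paper.
small_batch = attention_score_elements(batch_size=64, num_heads=8, num_tokens=128)
large_batch = attention_score_elements(batch_size=256, num_heads=8, num_tokens=128)

print(small_batch)                 # score entries at batch size 64
print(large_batch)                 # score entries at batch size 256
print(large_batch // small_batch)  # growth factor from the larger batch
```

Under these assumptions the score storage grows linearly with batch size and quadratically with the number of tokens, so the larger batches that contrastive learning favors compete directly with the attention layer for memory.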
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
What problem does this paper attempt to address?