Bottom-up color-independent alignment learning for text–image person re-identification

Guodong Du, Hanyue Zhu, Liyan Zhang
DOI: https://doi.org/10.1016/j.engappai.2024.109421
IF: 8
2024-10-14
Engineering Applications of Artificial Intelligence
Abstract: Text-to-image person re-identification (TIReID) refers to identifying images of a person of interest from a large-scale person image database based on natural language descriptions. Most existing methods rely heavily on color information when matching cross-modal data, which is a form of overfitting that can be termed the color over-reliance problem. This problem distracts the model from other small but discriminative clues (e.g., clothing details and structural information), which are essential for both semantic alignment and fine-grained matching, and thus leads to sub-optimal retrieval performance. To this end, we propose a novel Bottom-up Color-independent Alignment Learning Framework (BCALF) for text-based person retrieval that tackles this problem in two ways: decoupling color-independent discrete local features and aggregating multiple key discrete features. We employ color-confused images as an auxiliary modality and perform discrete fine-grained semantic alignment, in which the minimal semantic units interact within the joint feature space to focus solely on content information. Furthermore, the multiple discrete local features are aggregated into more discriminative non-local decisive features. BCALF achieves semantic alignment from minimal semantic units to non-local aggregation units, which can be understood as a bottom-up process. Experimental results demonstrate that BCALF consistently outperforms previous methods and achieves state-of-the-art performance on the CUHK-PEDES, ICFG-PEDES and RSTPReid datasets.
automation & control systems; computer science, artificial intelligence; engineering, electrical & electronic; engineering, multidisciplinary