A Cross-Modal Multi-granularity Attention Network for RGB-IR Person Re-identification

Jianguo Jiang, Kaiyuan Jin, Meibin Qi, Qian Wang, Jingjing Wu, Cuiqun Chen
DOI: https://doi.org/10.1016/j.neucom.2020.03.109
IF: 6
2020-09-01
Neurocomputing
Abstract: Cross-modal person re-identification (Re-id) between infrared and visible-light (RGB-IR) images is of great significance for modern video surveillance, especially nighttime surveillance. Research on single-modal person re-identification has already reached a high level. Cross-modal person re-identification, however, remains challenging because of the large cross-modality and intra-modality differences, in addition to common issues such as lighting conditions, human posture, and camera angle. The Cross-modal Multi-granularity Attention Network (CMGN) proposed in this paper enables the network to learn the features common to different modalities and map them into the same feature space. The major contributions of this paper include: 1) A new "butterfly" attention module for cross-modal tasks, designed to constrain the network's attention to the areas common to different modalities; the method achieves state-of-the-art (SOA) results on the RegDB and SYSU-MM01 datasets. 2) An end-to-end multi-granularity feature fusion network dedicated to cross-modal problems.
computer science, artificial intelligence
What problem does this paper attempt to address?
The problem that this paper attempts to solve is person re-identification under the cross-modal condition of infrared light and visible light (RGB-IR). Specifically, the paper focuses on how to effectively identify images of the same person taken by different cameras under different lighting conditions in video surveillance, especially in nighttime surveillance. Existing single-modal person re-identification research has reached a relatively high level, but cross-modal person re-identification is still quite challenging due to significant cross-modal differences (such as the heterogeneity between RGB images and infrared images) and common problems (such as lighting conditions, human postures, and camera angles). To address these challenges, the authors propose a cross-modal multi-granularity attention network (CMGN), aiming to enable the network to learn the common features of different modalities and map them into the same feature space. The main contributions of CMGN include: 1. Designing a new "butterfly" attention module to constrain the network's attention to the common areas of different modalities, achieving state-of-the-art (SOA) results on the RegDB and SYSU-MM01 datasets. 2. Proposing an end-to-end multi-granularity feature fusion network specifically for dealing with cross-modal problems. Through these innovations, CMGN can achieve more accurate person re-identification under cross-modal conditions.