Person Re-Identification with IBN Layer and Channel Attention Module for Indoor Scenarios

Hao Wang, Guoan Cheng, Yongdong Li, Guiyan Cai, Lu Sun, Shengke Wang
DOI: https://doi.org/10.1117/12.2680593
2022-01-01
Abstract: Owing to recent advances in deep learning, person re-identification is increasingly used for the automated processing and analysis of surveillance video, particularly in security and smart transportation. At present, most related research on person re-identification concentrates on outdoor scenes, with little attention paid to indoor scenarios. Inadequate illumination and reflections in indoor settings make person re-identification in complex indoor scenarios very difficult. This paper investigates an indoor person re-identification algorithm in order to improve the accuracy of person recognition in indoor settings. To address the pronounced differences in light and shade in person images captured by indoor surveillance, an IBN layer is added to the ResNet50 backbone network; it combines instance normalization (IN) and batch normalization (BN) to eliminate appearance differences of an individual while retaining the feature differences between individuals. To enhance the expressiveness of individual features, a channel attention module is added to the residual network. Specifically, the importance of each channel of a person's features is learned automatically, so that important features are amplified and unnecessary ones are suppressed. In addition, to address the difficulty of distinguishing between similar people caused by interference factors such as occlusion and reflection in indoor scenes, we introduce triplet loss in model training, which helps the model better learn the details of persons. The three primary validation data sets used in this study are Market1501, OUC365, and DukeMTMC-reID. The OUC365 data set has a more pronounced indoor style and higher definition, the Market1501 data set is noisier, and in the DukeMTMC-reID data set the number of photos differs significantly from person to person. The proposed method is tested on these data sets, and successful results are obtained.
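The three components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The half-and-half channel split in `ibn` follows the common IBN-Net convention, the squeeze-and-excitation form of `channel_attention` is one plausible reading of "channel attention", and the margin of 0.3 in `triplet_loss` is a typical re-identification default. All three choices, and all function and parameter names, are assumptions not stated in the abstract. Learnable scale and shift parameters are omitted for brevity.

```python
import numpy as np

def ibn(x, eps=1e-5):
    """Apply instance norm to the first half of the channels and batch norm
    to the rest. x has shape (N, C, H, W). The 50/50 split is an assumption."""
    half = x.shape[1] // 2
    x_in, x_bn = x[:, :half], x[:, half:]
    # Instance norm: statistics per sample and per channel, over H and W.
    mu = x_in.mean(axis=(2, 3), keepdims=True)
    var = x_in.var(axis=(2, 3), keepdims=True)
    out_in = (x_in - mu) / np.sqrt(var + eps)
    # Batch norm (training-mode statistics): per channel, over N, H and W.
    mu = x_bn.mean(axis=(0, 2, 3), keepdims=True)
    var = x_bn.var(axis=(0, 2, 3), keepdims=True)
    out_bn = (x_bn - mu) / np.sqrt(var + eps)
    return np.concatenate([out_in, out_bn], axis=1)

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style channel reweighting (assumed form).
    w1 has shape (C, C // r) and w2 has shape (C // r, C)."""
    s = x.mean(axis=(2, 3))               # squeeze: one value per channel, (N, C)
    h = np.maximum(s @ w1, 0.0)           # bottleneck layer with ReLU
    a = 1.0 / (1.0 + np.exp(-(h @ w2)))   # sigmoid: per-channel weights in (0, 1)
    return x * a[:, :, None, None]        # rescale each channel by its weight

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge loss on the gap between anchor-positive and anchor-negative
    Euclidean distances. Inputs are (N, D) embeddings; margin is assumed."""
    d_ap = np.linalg.norm(anchor - positive, axis=1)
    d_an = np.linalg.norm(anchor - negative, axis=1)
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

# Usage on random data with hypothetical shapes: 4 images, 8 channels, 6x3 maps.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 6, 3))
normalized = ibn(features)
attended = channel_attention(normalized,
                             rng.normal(size=(8, 2)),
                             rng.normal(size=(2, 8)))
```

In this sketch the instance-normalized half removes per-image brightness and contrast statistics, which is the intuition behind using IN against indoor lighting variation, while the batch-normalized half keeps statistics that help tell identities apart. The triplet loss is zero once the negative is farther from the anchor than the positive by at least the margin.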