A Multi-label Image Recognition Algorithm Based on Spatial and Semantic Correlation Interaction

Jing Cheng, Genlin Ji, Qinkai Yang, Junzhao Hao
DOI: https://doi.org/10.1007/978-981-99-8549-4_2
2024-01-01
Abstract: Multi-Label Image Recognition (MLIR) approaches usually exploit label correlations to achieve good performance. Two types of label correlations are principally studied: spatial and semantic correlations. However, most existing MLIR algorithms consider semantic and spatial correlations separately and often require additional supporting information. Although some algorithms capture the semantic and spatial correlations of labels simultaneously, they ignore the intrinsic relationship between the two. Specifically, considering only spatial correlations leads to misidentifying some difficult objects in the image: for example, objects of different categories that look similar and lie close together are mistaken for the same category. Semantic correlations can constrain the errors caused by spatial correlations. In this work, we propose a transformer-based multi-label image recognition algorithm named Spatial and Semantic Correlation Interaction (SSCI). A transformer is used to model the internal relationship between spatial and semantic correlations, improving the model's ability to recognize difficult objects. Experiments on the public datasets MS-COCO, VOC2007 and VOC2012 show that the mAP values reach 84.1%, 95.0% and 95.4%, respectively. Compared with other MLIR algorithms, the proposed algorithm can significantly improve the recognition performance of multi-label images.
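The mAP figures quoted in the abstract are mean average precision scores over the label set. As a rough illustration of how such a number is typically obtained, the sketch below computes a per-label average precision and averages it across labels. It assumes the common non-interpolated definition (mean of the precision at the rank of each positive image); the abstract does not state the exact evaluation protocol, and VOC2007 results are often reported with an 11-point interpolated variant instead. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], targets: Sequence[int]) -> float:
    """Non-interpolated AP for one label (assumed definition, see lead-in).

    scores  -- predicted confidence for this label, one per image
    targets -- 1 if the image truly carries the label, else 0
    """
    # Rank images from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, idx in enumerate(order, start=1):
        if targets[idx] == 1:
            hits += 1
            # Precision among the top `rank` images at each positive.
            precisions.append(hits / rank)
    # A label with no positive images contributes 0 in this sketch.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    score_matrix: Sequence[Sequence[float]],
    target_matrix: Sequence[Sequence[int]],
) -> float:
    """Average the per-label AP over all labels (rows = images, columns = labels)."""
    num_labels = len(score_matrix[0])
    aps = []
    for c in range(num_labels):
        col_scores = [row[c] for row in score_matrix]
        col_targets = [row[c] for row in target_matrix]
        aps.append(average_precision(col_scores, col_targets))
    return sum(aps) / num_labels


# Toy example: 3 images, 2 labels (made-up values for illustration only).
toy_scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
toy_targets = [[1, 0], [0, 1], [1, 1]]
toy_map = mean_average_precision(toy_scores, toy_targets)
```

In the toy data, the first label ranks one negative image above a positive one, which lowers its AP, while the second label ranks both positives first; the mAP is the plain average of the two per-label values.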