Infrared and Visible Cross-Modal Image Retrieval Through Shared Features

Fangcen Liu, Chenqiang Gao, Yongqing Sun, Yue Zhao, Feng Yang, Anyong Qin, Deyu Meng
DOI: https://doi.org/10.1109/tcsvt.2020.3048945
IF: 5.859
2021-11-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Image retrieval is one of the key techniques of computer vision and has been studied for a long time. Nevertheless, little attention has been paid to infrared and visible cross-modal retrieval, which can be widely used in various applications, e.g., infrared and visible surveillance systems. In this paper, we propose a shared-features-based infrared-visible cross-modal image retrieval method. Similar visual features are extracted from infrared and visible images as the shared features, and the Euclidean distance is used to measure the similarity between these features. The core of the proposed method comprises three aspects: 1) a feature separation network separates image features into shared features and exclusive features; 2) a Maximum Mean Discrepancy (MMD) loss is employed to constrain the distribution of the shared features, which reduces the retrieval error caused by different imaging angles and by the similarity of infrared images; 3) a cross-layer fusion encoder compensates for the context loss in the convolution of infrared images. Experimental results on the Infrared-Visible dataset demonstrate that the proposed method is effective and outperforms the state-of-the-art approaches.
Subject category: Engineering, Electrical & Electronic
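The abstract states that retrieval compares shared features by Euclidean distance and that an MMD loss constrains the shared features of the two modalities toward a common distribution. The sketch below illustrates both ideas on plain Python lists. It is not the authors' implementation: the Gaussian (RBF) kernel, its bandwidth `sigma`, the biased MMD estimator, and the names `euclidean`, `rbf_kernel`, `mmd2`, and `retrieve` are assumptions chosen for illustration, and in the real method the shared features would come from the trained network rather than hand-written vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel; the bandwidth sigma is an assumed hyperparameter."""
    return math.exp(-euclidean(a, b) ** 2 / (2.0 * sigma ** 2))


def mmd2(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between two sets of shared features."""
    def mean_k(us: List[Vector], vs: List[Vector]) -> float:
        total = sum(rbf_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


def retrieve(query: Vector, gallery: List[Vector]) -> List[int]:
    """Rank gallery indices from most to least similar (smallest distance first)."""
    return sorted(range(len(gallery)), key=lambda i: euclidean(query, gallery[i]))


if __name__ == "__main__":
    # Hypothetical shared features for two visible and two infrared images.
    visible = [[0.1, 0.9], [0.8, 0.2]]
    infrared = [[0.82, 0.25], [0.12, 0.88]]
    print(retrieve(visible[0], infrared))  # [1, 0]
    print(mmd2(visible, infrared))  # small value: the two sets are nearly aligned
```

In training, a term like `mmd2` would be added to the loss so that infrared and visible shared features are pulled toward the same distribution; at query time only the distance ranking in `retrieve` is needed.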
What problem does this paper attempt to address?
This paper addresses infrared-visible cross-modal image retrieval: given a visible-light image, retrieve the most similar infrared image, or, conversely, given an infrared image, retrieve the most similar visible-light image. The problem matters in practice. In a surveillance system, for example, an image of a suspect captured by an infrared camera at night can be matched against visible-light video recorded during the day, and vice versa.

### Main Challenges

1. **Different Imaging Characteristics**: Infrared images usually have better imaging quality than visible-light images in low-light conditions, but visible-light images carry richer texture and, importantly, color information. As a result, the context information of infrared images is lost quickly during convolution.
2. **Different Imaging Angles**: Even when infrared and visible-light cameras capture the same object, their different viewing angles can leave the image pixels misaligned.
3. **Similarity among Infrared Images**: Current infrared cameras have limited ability to distinguish differences in thermal radiation, so infrared images tend to look very similar to one another, which easily leads to false matches.

### Solutions

To address these problems, the authors propose an infrared-visible cross-modal image retrieval method based on shared features. Its core consists of three components:

1. **Feature Separation Network**: Splits image features into shared features and exclusive (modality-specific) features. The shared features are used for cross-modal matching, while the exclusive features retain the information unique to each modality.
2. **Maximum Mean Discrepancy (MMD) Loss**: Constrains the distribution of the shared features, reducing retrieval errors caused by different imaging angles and by the similarity among infrared images.
3. **Cross-layer Fusion Encoder**: Compensates for the context information that infrared images lose during convolution.

Together, these components extract the latent features that infrared and visible-light images have in common, and matching these shared features by Euclidean distance enables effective cross-modal retrieval.

### Summary

The main contributions of the paper are:

- A novel infrared-visible cross-modal image retrieval method based on shared feature extraction.
- A cross-layer fusion encoder and an MMD loss, which respectively reduce the loss of context information during convolution and make the shared features of the two modalities follow the same distribution.
- Experiments on the Infrared-Visible dataset showing that the method is effective and outperforms existing baseline methods.
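The feature separation network and the cross-layer fusion encoder are described here only at a high level, so the following is a toy sketch of the two ideas rather than the paper's architecture. It assumes feature maps are nested lists indexed `[channel][height][width]`, fuses layers by global average pooling each one and concatenating the results (a stand-in chosen for illustration; this summary does not specify the actual fusion operator), and models the separation network as two hypothetical linear heads, `shared_head` and `exclusive_head`, that project the fused vector into shared and exclusive parts. All function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Feature map laid out as [channel][height][width] (assumed layout).
FeatureMap = List[List[List[float]]]
Matrix = Sequence[Sequence[float]]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions (one value per channel)."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def cross_layer_fuse(layers: Sequence[FeatureMap]) -> List[float]:
    """Hypothetical cross-layer fusion: pool every layer and concatenate, so
    context from shallow layers is kept next to deep-layer features."""
    fused: List[float] = []
    for fmap in layers:
        fused.extend(global_avg_pool(fmap))
    return fused


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def separate(features: Sequence[float], shared_head: Matrix,
             exclusive_head: Matrix) -> Tuple[List[float], List[float]]:
    """Hypothetical separation network: two linear heads yield the
    (shared, exclusive) parts of the same fused feature vector."""
    return linear(shared_head, features), linear(exclusive_head, features)


if __name__ == "__main__":
    shallow = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]  # 2 channels, 2x2
    deep = [[[9.0]]]  # 1 channel, 1x1
    fused = cross_layer_fuse([shallow, deep])
    print(fused)  # [4.0, 1.0, 9.0]
    shared, exclusive = separate(fused, [[0.5, 0.5, 0.0]], [[0.0, 0.0, 1.0]])
    print(shared, exclusive)  # [2.5] [9.0]
```

In this toy version only `shared` would be compared across modalities by Euclidean distance, while `exclusive` stands for the modality-specific information the summary says is retained; in the real network both heads and the encoder would be learned jointly.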