Automatic Captioning Based on Visible and Infrared Images

Wang Yan, Lou Shuli, Wang Kai, Yuan Xiaohu, Liu Huaping
DOI: https://doi.org/10.1109/icra57147.2024.10610654
2024-01-01
Abstract: In this paper, we tackle image captioning by exploiting the complementarity of visible-light and infrared images. We propose an RGB-IR image fusion captioning model that can take full advantage of visible-light and infrared images under different conditions. We also develop a wearable environment-assisted system. In addition, we collect and annotate a new dataset containing 3510 pairs of RGB-IR images to support model training. Finally, we conduct extensive experiments to evaluate the model and the system. Experimental results show that our method and system significantly outperform baselines on multiple metrics and have potential practical value.