RegSeg: An End-to-End Network for Multimodal RGB-Thermal Registration and Semantic Segmentation
Wenjie Lai, Fanyu Zeng, Xiao Hu, Shaowei He, Ziji Liu, Yadong Jiang
DOI: https://doi.org/10.1109/tip.2024.3501077
IF: 10.6
2024-12-07
IEEE Transactions on Image Processing
Abstract: Misalignment between RGB and thermal images significantly impairs RGB-Thermal semantic segmentation accuracy. Current non-end-to-end methods treat RGB-Thermal registration independently of semantic segmentation, resulting in fusion errors, redundant computation, and poor real-time performance. Semantic segmentation accuracy correlates directly with registration precision: better registration yields more accurate segmentation. Moreover, regions with identical semantic labels, which indicate the same object, tend to share similar registration offsets. Based on these correlations, we propose an end-to-end multimodal registration and segmentation method using flexible deformation fields. Our method uses a shared encoder for registration and semantic segmentation to reduce redundancy. Unlike traditional non-end-to-end approaches, it directly registers high-level perceptual features, improving computational efficiency and real-time performance. Additionally, we employ a flexible deformation field to register RGB-Thermal data, addressing the limitations of traditional affine transformations in handling non-coplanar and non-rigid registration. However, deformation fields are more flexible than affine transformations and do not preserve geometric features, which makes training more difficult. To overcome this, we introduce a semantic alignment loss function to train the alignment module. This function computes the semantic segmentation loss between the predictions made from registered thermal features and the RGB semantic labels. It shortens the gradient backpropagation path and aligns the objectives of registration and segmentation. We validate our end-to-end approach through extensive experiments, achieving significant performance improvements. On the IR SEG dataset, our end-to-end method achieves state-of-the-art results with a mean Intersection over Union (mIoU) of 61.1% and a mean accuracy (mAcc) of 76.0%.
computer science, artificial intelligence; engineering, electrical & electronic
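The two mechanisms named in the abstract, warping thermal features with a dense deformation field and scoring the result against RGB labels with a segmentation loss, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `warp_features` and `semantic_alignment_loss`, the toy 8x8 scene, the hand-set deformation field, and the use of the features themselves as class logits (that is, an identity segmentation head) are all assumptions made for illustration. In the paper the field would be predicted by a trained alignment module.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly resample feat (C, H, W) at positions shifted by flow (2, H, W).

    flow[0] holds per-pixel x (column) offsets and flow[1] per-pixel y (row)
    offsets, in pixels. Sample positions are clamped to the image border.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def semantic_alignment_loss(logits, labels):
    """Mean pixel-wise cross-entropy between logits (K, H, W) and labels (H, W)."""
    z = logits - logits.max(axis=0, keepdims=True)  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=0, keepdims=True))
    h, w = labels.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return float(-log_probs[labels, ys, xs].mean())

# Toy scene (assumed): RGB labels contain one 3x3 object of class 1.
rgb_labels = np.zeros((8, 8), dtype=int)
rgb_labels[2:5, 2:5] = 1

# Thermal "features": class logits that would match the labels if aligned,
# but shifted 2 pixels to the right to mimic RGB-thermal misalignment.
aligned_logits = 5.0 * np.stack([rgb_labels == 0, rgb_labels == 1]).astype(float)
thermal_logits = np.roll(aligned_logits, shift=2, axis=2)

# Hand-set deformation field (assumed): sample every pixel 2 columns to the right.
flow = np.zeros((2, 8, 8))
flow[0] = 2.0

loss_misaligned = semantic_alignment_loss(thermal_logits, rgb_labels)
registered_logits = warp_features(thermal_logits, flow)
loss_aligned = semantic_alignment_loss(registered_logits, rgb_labels)
print(loss_misaligned, loss_aligned)
```

In this toy setup the loss computed on the registered features is lower than the loss on the unregistered ones, which is the signal the abstract describes using to train the alignment module. Because the field stores an independent offset per pixel, it can represent shifts that a single affine transform cannot, at the cost of having no built-in geometric constraint.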