IIMT-net: Poly-1 Weights Balanced Multi-Task Network for Semantic Segmentation and Depth Estimation Using Interactive Information
Mengfei He, Zhiyou Yang, Guangben Zhang, Yan Long, Huaibo Song
DOI: https://doi.org/10.1016/j.imavis.2024.105109
IF: 3.86
2024-01-01
Image and Vision Computing
Abstract: Semantic segmentation and depth estimation are two fundamental problems in computer vision. The two tasks are usually studied separately. However, in some scenarios, such as autonomous driving, they need to be performed at the same time. Moreover, the two tasks share interconnected information that can improve the performance of both. We therefore address them with multi-task learning, training the tasks jointly and producing both predictions together. In this paper, we build the Interactive Information Multi-Task Network (IIMT-Net), which incorporates information interaction modules and is trained with a proposed task-balancing strategy. Specifically, we construct the main part of the encoder and decoder based on the Transformer to capture global information. To make better use of the interaction between the two tasks, we also add information fusion modules to the two sub-decoders. In addition, the task-balancing strategy, Poly-1 weights, is designed to balance samples of different degrees of difficulty so that the network is not severely biased towards either task. Experiments on the NYU Depth V2, Cityscapes, and SUN RGB-D datasets demonstrate the performance of the proposed approach. Our model performs semantic segmentation and depth estimation together, obtaining mIoU values of 46.66% on NYU Depth V2, 66.37% on Cityscapes, and 49.89% on SUN RGB-D, with RMSE values of 0.648, 6.630, and 0.401, respectively, for depth estimation. These results outperform most existing multi-task learning methods.
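The abstract reports mIoU for semantic segmentation and RMSE for depth estimation. As a reading aid, the following is a minimal sketch of how these two metrics are commonly computed, using flat lists of per-pixel values. The function names `mean_iou` and `rmse`, and the choice to skip classes absent from both prediction and ground truth, are assumptions made for illustration; the paper's exact evaluation protocol (ignored labels, depth caps, per-image versus dataset-level averaging) is not stated in the abstract.

```python
import math


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes.

    pred, target: equal-length sequences of integer class labels (one per pixel).
    Classes that appear in neither sequence are skipped (an assumption here).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    # Guard against the degenerate case where no class is present at all.
    return sum(ious) / len(ious) if ious else 0.0


def rmse(pred_depth, true_depth):
    """Root-mean-square error between predicted and ground-truth depths."""
    squared_errors = [(p - t) ** 2 for p, t in zip(pred_depth, true_depth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Toy 4-pixel example with two classes.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    # Toy 2-pixel depth example (units are whatever the dataset uses).
    print(rmse([1.0, 2.0], [1.0, 4.0]))
```

In the toy segmentation example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12; the depth example has squared errors 0 and 4, giving an RMSE of the square root of 2.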