CI-Net: a joint depth estimation and semantic segmentation network using contextual information

Tianxiao Gao,Wu Wei,Zhongbin Cai,Zhun Fan,Sheng Quan Xie,Xinmei Wang,Qiuda Yu
DOI: https://doi.org/10.1007/s10489-022-03401-x
IF: 5.3
2022-04-09
Applied Intelligence
Abstract:Monocular depth estimation and semantic segmentation are two fundamental goals of scene understanding. Due to the advantages of task interaction, many works have studied the joint-task learning algorithm. However, most existing methods fail to fully leverage the semantic labels, ignoring the provided context structures and only using them to supervise the prediction of segmentation split, which limits the performance of both tasks. In this paper, we propose a network injected with contextual information (CI-Net) to solve this problem. Specifically, we introduce a self-attention block in the encoder to generate an attention map. With supervision from the ideal attention map created by semantic label, the network is embedded with contextual information so that it could understand the scene better and utilize correlated features to make accurate prediction. Besides, a feature-sharing module (FSM) is constructed to make the task-specific features deeply fused, and a consistency loss is devised to ensure that the features mutually guided. We extensively evaluate the proposed CI-Net on NYU-Depth-v2, SUN-RGBD, and Cityscapes datasets. The experimental results validate that our proposed CI-Net could effectively improve the accuracy of semantic segmentation and depth estimation.
computer science, artificial intelligence
What problem does this paper attempt to address?