Abstract:Semantic segmentation and depth estimation are two important tasks in the area of image processing. Traditionally, these two tasks are addressed in an independent manner. However, for those applications where geometric and semantic information is required, such as robotics or autonomous navigation,depth or semantic segmentation alone are not sufficient. In this paper, depth estimation and semantic segmentation are addressed together from a single input image through a hybrid convolutional network. Different from the state of the art methods where features are extracted by a sole feature extraction network for both tasks, the proposed HybridNet improves the features extraction by separating the relevant features for one task from those which are relevant for both. Experimental results demonstrate that HybridNet results are comparable with the state of the art methods, as well as the single task methods that HybridNet is based on.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to estimate depth and perform semantic segmentation simultaneously from a single input image. Traditionally, these two tasks are usually processed independently. However, in this paper, the authors propose a new Hybrid Convolutional Network (HybridNet), which can handle these two tasks simultaneously through a unified framework. The main motivation of this method lies in that for applications requiring geometric and semantic information (such as robotics and autonomous navigation), relying on either depth or semantic segmentation alone is not sufficient. HybridNet improves the feature extraction process by separating the features relevant to one task from those relevant to both tasks, thus achieving an improvement in the accuracy of the estimated information. Specifically, HybridNet consists of two major parts: a depth estimation network (the blue part) and a semantic segmentation network (the green part). These two networks are connected through a feature network block (the part where blue and green intersect). The feature network is based on VGG - net and is responsible for extracting robust feature information from the input image. The depth estimation network first estimates the global depth map of the scene from the input image through a global depth network, and then refines this global depth map locally through a refinement depth network to obtain the final depth map. The semantic segmentation network uses the feature information extracted by the feature network to estimate the class score map through an up - sampling network, where the number of channels is equal to the number of labels. The training parameters of the entire network are learned in a way in the feature network so that the extracted features contain both depth information and semantic information, thus achieving the simultaneous solution of the two tasks and being able to provide performance gains for each other. Experimental results show that HybridNet outperforms existing multi - task methods and individual task - solving methods on multiple evaluation metrics, demonstrating the effectiveness and superiority of this method.

Hybridnet for depth estimation and semantic segmentation

A Depth Estimation Framework Based on Unsupervised Learning and Cross-Modal Translation

Binocular Depth Estimation Using Convolutional Neural Network With Siamese Branches.

HDNet: Hybrid Distance Network for semantic segmentation

Depth Estimation from Single Monocular Images Using Deep Hybrid Network

SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images

Semantic Segmentation and Depth Estimation of Urban Road Scene Images Using Multi-Task Networks

SemSegDepth: A Combined Model for Semantic Segmentation and Depth Completion

Hybrid Dilated Convolution Network Using Attentive Kernels for Real-Time Semantic Segmentation

EHANet: Efficient Hybrid Attention Network Towards Real-time Semantic Segmentation

JSH-Net: joint semantic segmentation and height estimation using deep convolutional networks from single high-resolution remote sensing imagery

CI-Net: a joint depth estimation and semantic segmentation network using contextual information

Novel Hybrid Neural Network for Dense Depth Estimation Using On-Board Monocular Images

Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation

Panoptic-Depth Color Map for Combination of Depth and Image Segmentation

Semantics Recalibration and Detail Enhancement Network for Real‐time Semantic Segmentation

Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation

Dual-Path Feature Fusion Network for Semantic Segmentation of Remote Sensing Images

HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation

Transformer and CNN Hybrid Deep Neural Network for Semantic Segmentation of Very-High-Resolution Remote Sensing Imagery

A hybrid approach consisting of 3D depthwise separable convolution and depthwise squeeze-and-excitation network for hyperspectral image classification