HybridNet for depth estimation and semantic segmentation

Dalila Sánchez-Escobedo, Xiao Lin, Josep R. Casas, Montse Pardàs
DOI: https://doi.org/10.1109/ICASSP.2018.8462433
2024-02-10
Abstract: Semantic segmentation and depth estimation are two important tasks in the area of image processing. Traditionally, these two tasks are addressed independently. However, for applications where both geometric and semantic information is required, such as robotics or autonomous navigation, depth estimation or semantic segmentation alone is not sufficient. In this paper, depth estimation and semantic segmentation are addressed together from a single input image through a hybrid convolutional network. Unlike state-of-the-art methods, where a single feature extraction network extracts features for both tasks, the proposed HybridNet improves feature extraction by separating the features relevant to one task from those relevant to both. Experimental results demonstrate that HybridNet results are comparable with the state-of-the-art methods, as well as with the single-task methods that HybridNet is based on.
Computer Science
What problem does this paper attempt to address?
This paper addresses the problem of estimating depth and performing semantic segmentation simultaneously from a single input image. Traditionally, these two tasks are handled independently. The authors instead propose a hybrid convolutional network, HybridNet, that handles both within a unified framework. The motivation is that applications requiring both geometric and semantic information, such as robotics and autonomous navigation, cannot rely on depth or semantic segmentation alone. HybridNet improves feature extraction by separating the features relevant to one task from those relevant to both, with the aim of improving the accuracy of the estimated information.

HybridNet consists of two major parts: a depth estimation network (shown in blue in the paper's architecture figure) and a semantic segmentation network (shown in green). The two are connected through a shared feature network block (where blue and green intersect). The feature network is based on VGG-net and extracts robust features from the input image. The depth estimation network first estimates a global depth map of the scene from the input image with a global depth network, then refines that map locally with a refinement depth network to obtain the final depth map. The semantic segmentation network takes the features extracted by the feature network and estimates a class score map through an up-sampling network; the number of channels in the score map equals the number of labels.

The network is trained so that the features extracted by the feature network contain both depth and semantic information. This allows the two tasks to be solved together and to provide performance gains for each other.
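The data flow described above can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: per-pixel linear maps with random, untrained weights stand in for the VGG-based convolutional layers, strided sampling stands in for pooling, and nearest-neighbour repetition stands in for the up-sampling network. The names `hybridnet_forward`, `feat_ch` and `stride` are hypothetical, and feeding the input image together with the global depth map into the refinement step is an assumption that the summary does not confirm.

```python
import random

random.seed(0)  # fixed seed so the untrained stand-in weights are reproducible


def linear_layer(n_in, n_out):
    """Random weights for a per-pixel linear map (stand-in for a conv layer)."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def apply_per_pixel(grid, weights, relu=True):
    """Apply the same linear map to the channel vector at every pixel."""
    out = []
    for row in grid:
        out_row = []
        for pixel in row:
            vals = [sum(w * v for w, v in zip(w_row, pixel)) for w_row in weights]
            out_row.append([max(0.0, v) for v in vals] if relu else vals)
        out.append(out_row)
    return out


def downsample(grid, factor):
    """Keep every `factor`-th pixel (crude stand-in for pooling)."""
    return [row[::factor] for row in grid[::factor]]


def upsample(grid, factor):
    """Nearest-neighbour up-sampling by an integer factor."""
    out = []
    for row in grid:
        wide = [pixel for pixel in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def concat(grid_a, grid_b):
    """Concatenate the channel vectors of two grids of equal spatial size."""
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(grid_a, grid_b)]


def hybridnet_forward(image, num_labels, feat_ch=8, stride=4):
    """Hypothetical forward pass: shared features feeding two task branches."""
    in_ch = len(image[0][0])

    # Shared feature block: its output is used by both tasks.
    w_feat = linear_layer(in_ch, feat_ch)
    features = apply_per_pixel(downsample(image, stride), w_feat)

    # Depth branch, step 1: coarse global depth map.
    w_global = linear_layer(feat_ch, 1)
    global_depth = upsample(apply_per_pixel(features, w_global, relu=False), stride)

    # Depth branch, step 2: local refinement (assumed inputs: image + global depth).
    w_refine = linear_layer(in_ch + 1, 1)
    refined = apply_per_pixel(concat(image, global_depth), w_refine, relu=False)
    depth_map = [[pixel[0] for pixel in row] for row in refined]

    # Segmentation branch: class score map with one channel per label.
    w_seg = linear_layer(feat_ch, num_labels)
    scores = upsample(apply_per_pixel(features, w_seg, relu=False), stride)
    labels = [[max(range(num_labels), key=p.__getitem__) for p in row] for row in scores]
    return depth_map, scores, labels


# Usage: a random 8x8 RGB "image" and 5 hypothetical semantic labels.
image = [[[random.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
depth_map, scores, labels = hybridnet_forward(image, num_labels=5)
print(len(depth_map), len(depth_map[0]))  # spatial size of the depth map
print(len(scores[0][0]))                  # score channels per pixel
```

The point of the sketch is the wiring rather than the layers: both the global depth estimate and the class score map are computed from the same `features`, so any training signal from either task would update the shared block, while the refinement step belongs to the depth branch only.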
Experimental results show that HybridNet achieves results comparable with existing multi-task methods and with the single-task methods it is based on, across multiple evaluation metrics.