Survey on Semantic Stereo Matching / Semantic Depth Estimation

Viny Saajan Victor, Peter Neigel
DOI: https://doi.org/10.48550/arXiv.2109.10123
2021-09-21
Abstract: Stereo matching is one of the most widely used techniques for inferring depth from stereo images, owing to its robustness and speed. It has become a major research topic because of its applications in autonomous driving, robotic navigation, 3D reconstruction, and many other fields. Finding pixel correspondences in non-textured, occluded, and reflective areas is the major challenge in stereo matching. Recent developments have shown that semantic cues from image segmentation can be used to improve the results of stereo matching. Many deep neural network architectures have been proposed to leverage the advantages of semantic segmentation in stereo matching. This paper aims to compare the state-of-the-art networks in terms of both accuracy and speed, which are of high importance in real-time applications.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty of finding pixel correspondences in non-textured, occluded, and reflective regions during stereo matching. Specifically, it examines how semantic cues from image segmentation can improve the accuracy of stereo matching, and it compares current state-of-the-art network architectures in terms of accuracy and speed, both of which matter in real-time applications.

The paper describes several components through which semantic segmentation and depth estimation are integrated. These include, but are not limited to:

1. **Joint feature extraction**: Extract features common to stereo matching and semantic segmentation from the input stereo images.
2. **Disparity estimation**: Use deep convolutional layers to extract features specific to disparity estimation, build a cost volume from them, and regress an initial disparity map from that volume.
3. **Semantic segmentation**: Extract semantic labels for the image, which help improve disparity estimation in non-textured, occluded, and reflective regions.
4. **Disparity refinement**: Use semantic cues to refine the initial disparity, especially in hard-to-handle regions, to improve the accuracy of the final disparity map.

The paper also discusses the loss functions used to optimize the models during training: smooth $L_1$ loss, softmax cross-entropy loss, photometric loss, regularization loss, consistency loss, smoothness loss, and cross-domain discontinuity loss.

In summary, the main objective of the paper is to explore and compare methods for semantic stereo matching, with particular attention to the trade-off between accuracy and real-time performance, and to provide guidance for practical applications.
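The disparity estimation step and the smooth $L_1$ loss mentioned above can be illustrated with a minimal NumPy sketch. It assumes a correlation-based cost volume and soft-argmin regression, which is one common design in learned stereo networks and not necessarily what every surveyed network uses. The function names `build_cost_volume`, `soft_argmin`, and `smooth_l1` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_cost_volume(left_feat, right_feat, max_disp):
    """Correlation-style cost volume (an assumed design, for illustration).

    left_feat, right_feat: arrays of shape (C, H, W).
    Returns cost of shape (max_disp, H, W), where cost[d, y, x] is the
    negative dot product between left pixel x and right pixel x - d.
    Positions where x - d falls outside the image get infinite cost.
    """
    _, h, w = left_feat.shape
    cost = np.full((max_disp, h, w), np.inf, dtype=np.float64)
    for d in range(max_disp):
        # Left column x is compared with right column x - d.
        sim = (left_feat[:, :, d:] * right_feat[:, :, : w - d]).sum(axis=0)
        cost[d, :, d:] = -sim
    return cost


def soft_argmin(cost):
    """Regress a sub-pixel disparity map from a cost volume.

    A softmax over negated costs gives a probability per candidate
    disparity; the output is the expected disparity under that distribution.
    """
    logits = -cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disps = np.arange(cost.shape[0], dtype=np.float64).reshape(-1, 1, 1)
    return (prob * disps).sum(axis=0)


def smooth_l1(pred, target):
    """Smooth L1 loss: quadratic for errors below 1, linear otherwise."""
    diff = np.abs(pred - target)
    return np.where(diff < 1.0, 0.5 * diff**2, diff - 0.5).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.normal(size=(8, 4, 16))
    right = np.roll(left, -2, axis=2)  # toy right view: left shifted by 2 px
    disparity = soft_argmin(build_cost_volume(left, right, max_disp=6))
    print("initial disparity map shape:", disparity.shape)
    print("loss against constant disparity 2:",
          smooth_l1(disparity, np.full_like(disparity, 2.0)))
```

In a full semantic stereo network, the features would come from the shared (joint) encoder, and the initial disparity produced here would then be passed to a refinement stage that also receives the semantic segmentation output.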