S3Net: Innovating Stereo Matching and Semantic Segmentation with a Single-Branch Semantic Stereo Network in Satellite Epipolar Imagery

Qingyuan Yang, Guanzhou Chen, Xiaoliang Tan, Tong Wang, Jiaqi Wang, Xiaodong Zhang
DOI: https://doi.org/10.1109/IGARSS53475.2024.10640492
2024-10-01
Abstract: Stereo matching and semantic segmentation are significant tasks in binocular satellite 3D reconstruction. However, previous studies primarily treat these as independent parallel tasks, lacking an integrated multitask learning framework. This work introduces a solution, the Single-branch Semantic Stereo Network (S3Net), which innovatively combines semantic segmentation and stereo matching using Self-Fuse and Mutual-Fuse modules. Unlike preceding methods that utilize semantic or disparity information independently, our method identifies and leverages the intrinsic link between these two tasks, leading to a more accurate understanding of semantic information and disparity estimation. Comparative testing on the US3D dataset demonstrates the effectiveness of S3Net. Our model improves the mIoU in semantic segmentation from 61.38 to 67.39, and reduces the D1-Error and average endpoint error (EPE) in disparity estimation from 10.051 to 9.579 and from 1.439 to 1.403, respectively, surpassing existing competitive methods. Our code is available at https://github.com/CVEO/S3Net.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper addresses the separation between stereo matching and semantic segmentation in satellite imagery. Previous research usually treats these two tasks as independent and lacks an integrated multitask learning framework, so neither task can exploit the information the other provides. As a result, semantic information and disparity estimates are underused, which limits the accuracy and robustness of both tasks.

To address this, the paper proposes the Single-branch Semantic Stereo Network (S3Net), which combines semantic segmentation and stereo matching through Self-Fuse and Mutual-Fuse modules. By exploiting the intrinsic link between the two tasks, the approach improves both the understanding of semantic information and the accuracy of disparity estimation. On the US3D dataset, S3Net outperforms existing competitive methods: it raises the semantic segmentation mIoU from 61.38 to 67.39 and lowers the D1-Error from 10.051 to 9.579 and the average endpoint error (EPE) from 1.439 to 1.403.
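To make the three reported metrics concrete, here is a minimal NumPy sketch of how they are commonly computed. It is not the S3Net evaluation code, and several details are assumptions: `mean_iou` and `disparity_errors` are hypothetical helper names, mIoU averages per-class IoU over classes that appear in either map, invalid ground-truth disparities are assumed to be stored as NaN, and D1-Error is taken as the percentage of valid pixels whose absolute disparity error exceeds a fixed 3 px threshold (the KITTI variant additionally requires the error to exceed 5% of the true disparity).

```python
from typing import Optional, Tuple

import numpy as np


def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
             ignore_index: Optional[int] = None) -> float:
    """Mean IoU (as a fraction) over classes that occur in pred or gt."""
    pred = pred.ravel().astype(np.int64)
    gt = gt.ravel().astype(np.int64)
    if ignore_index is not None:
        keep = gt != ignore_index          # drop unlabeled pixels
        pred, gt = pred[keep], gt[keep]
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = np.bincount(gt * num_classes + pred,
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0                    # skip classes absent from both maps
    return float(np.mean(tp[present] / union[present]))


def disparity_errors(pred: np.ndarray, gt: np.ndarray,
                     valid: Optional[np.ndarray] = None,
                     threshold: float = 3.0) -> Tuple[float, float]:
    """Return (EPE in pixels, D1-Error in percent) over valid pixels."""
    if valid is None:
        valid = np.isfinite(gt)            # assumption: missing ground truth is NaN
    err = np.abs(pred[valid] - gt[valid])
    epe = float(err.mean())                # average endpoint error
    d1 = float((err > threshold).mean() * 100.0)  # share of pixels off by > threshold
    return epe, d1


if __name__ == "__main__":
    seg_pred = np.array([[0, 1], [1, 2]])
    seg_gt = np.array([[0, 1], [2, 2]])
    print(round(mean_iou(seg_pred, seg_gt, num_classes=3), 4))  # 0.6667

    disp_pred = np.array([1.0, 2.0, 10.0, 4.0])
    disp_gt = np.array([1.5, 2.0, 5.0, np.nan])  # last pixel has no ground truth
    epe, d1 = disparity_errors(disp_pred, disp_gt)
    print(round(epe, 4), round(d1, 2))  # 1.8333 33.33
```

`mean_iou` returns a fraction, so it must be multiplied by 100 to compare with figures such as 67.39. `disparity_errors` already returns D1 as a percentage, on the assumption that the reported D1-Error values are percentages. Building the confusion matrix with `np.bincount` keeps the computation vectorized, which matters for full-size satellite tiles.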