SSNet: a joint learning network for semantic segmentation and disparity estimation

Dayu Jia, Yanwei Pang, Jiale Cao, Pan Jing
DOI: https://doi.org/10.1007/s00371-024-03336-z
IF: 2.835
2024-04-04
The Visual Computer
Abstract: Joint learning of semantic segmentation and disparity estimation is applied to scene parsing so that the two tasks benefit each other. However, existing joint learning approaches simply unify the two tasks, which may result in negative feature mixing. To solve this problem, a win-win approach, the Stereo Semantic Network (SSNet), is proposed for pixel-wise scene parsing. SSNet is the first Transformer-based end-to-end joint learning model for semantic segmentation and disparity estimation. The main novelty lies in the proposed Transformer Feature Separation Module (TFSM), which is designed to separate features for segmentation prediction and disparity regression according to the characteristics of the two tasks. The segmentation and disparity results are supervised jointly with a weighted summation loss function to improve the performance of both tasks. Experimental results on the Cityscapes dataset and the KITTI 2015 dataset demonstrate that SSNet outperforms state-of-the-art joint learning approaches.
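The weighted summation loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the individual loss terms or the weights, so everything below is an assumption made for illustration: per-pixel cross-entropy for segmentation, smooth L1 for disparity, and the hypothetical names `cross_entropy`, `smooth_l1`, `joint_loss`, `w_seg`, and `w_disp`. It is not the authors' implementation.

```python
import math

def cross_entropy(probs, labels):
    """Mean per-pixel cross-entropy (assumed segmentation loss).
    probs[i] is a list of class probabilities for pixel i."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss over per-pixel disparities (assumed disparity loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for larger errors.
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred)

def joint_loss(seg_probs, seg_labels, disp_pred, disp_gt, w_seg=1.0, w_disp=1.0):
    """Weighted sum of the two task losses; the weights are hypothetical."""
    seg_term = cross_entropy(seg_probs, seg_labels)
    disp_term = smooth_l1(disp_pred, disp_gt)
    return w_seg * seg_term + w_disp * disp_term
```

With both tasks contributing to a single scalar, one backward pass would update shared and task-specific parameters together; how the weights are chosen is not stated in the abstract.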
computer science, software engineering