SG-RoadSeg: End-to-End Collision-Free Space Detection Sharing Encoder Representations Jointly Learned Via Unsupervised Deep Stereo

Wu Zhiyuan,Li Jiaqi,Feng Yi,Liu Chengju,Ye Wei,Chen Qijun,Fan Rui
DOI: https://doi.org/10.1109/icra57147.2024.10611191
2024-01-01
Abstract: Collision-free space detection is of utmost importance for autonomous robot perception and navigation. State-of-the-art (SoTA) approaches generally extract features from RGB images and an additional source or modality of 3-D information, such as depth or disparity images, using a pair of independent encoders. The extracted features are subsequently fused and decoded to yield semantic predictions of collision-free spaces. Such feature-fusion approaches become infeasible in scenarios where the sensor for 3-D information acquisition is unavailable, or when multi-sensor calibration falls short of the necessary precision. To overcome these limitations, this paper introduces a novel end-to-end collision-free space detection network, referred to as SG-RoadSeg, built upon our previous work SNE-RoadSeg. A key contribution of this paper is a strategy for sharing encoder representations that are co-learned through both semantic segmentation and unsupervised stereo matching tasks, enabling the features extracted from RGB images to contain both semantic and spatial geometric information. The unsupervised deep stereo serves as an auxiliary functionality, capable of generating accurate disparity maps that can be used by other perception tasks that require depth-related data. Comprehensive experimental results on the KITTI road and semantics datasets validate the effectiveness of our proposed architecture and encoder representation sharing strategy. SG-RoadSeg also demonstrates superior performance compared with other SoTA collision-free space detection approaches. Our source code, demo video, and supplement are publicly available at mias.group/SG-RoadSeg.
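The abstract does not state the training objective, so the following is only an illustrative sketch of how a segmentation loss and an unsupervised stereo signal might be combined. It assumes a photometric reconstruction loss, a common choice for unsupervised stereo matching in which the left image is reconstructed by sampling the right image at x - d. The function names (`warp_row`, `photometric_l1`, `joint_loss`) and the `weight` parameter are hypothetical and are not taken from the paper or its code.

```python
def warp_row(right_row, disparity_row):
    """Reconstruct one left-image row by sampling the right row at x - d.

    Uses linear interpolation and clamps sample positions to the image bounds.
    """
    width = len(right_row)
    recon = []
    for x, d in enumerate(disparity_row):
        src = min(max(x - d, 0.0), width - 1)  # clamp to valid columns
        x0 = int(src)                          # left neighbour
        x1 = min(x0 + 1, width - 1)            # right neighbour
        w = src - x0                           # interpolation weight
        recon.append((1 - w) * right_row[x0] + w * right_row[x1])
    return recon


def photometric_l1(left_row, right_row, disparity_row):
    """Mean absolute error between the left row and its reconstruction."""
    recon = warp_row(right_row, disparity_row)
    return sum(abs(a - b) for a, b in zip(left_row, recon)) / len(left_row)


def joint_loss(seg_loss, stereo_loss, weight=1.0):
    """Assumed joint objective: segmentation loss plus weighted stereo loss."""
    return seg_loss + weight * stereo_loss


# Toy example: the left row is the right row shifted by a disparity of 2.
right = [0, 1, 2, 3, 4, 5]
left = [0, 0, 0, 1, 2, 3]
good = photometric_l1(left, right, [2] * 6)  # correct disparity
bad = photometric_l1(left, right, [0] * 6)   # wrong disparity
print(good, bad)
```

In this toy case the correct disparity yields a lower photometric error than the wrong one, which is the signal that allows disparity to be learned without ground-truth depth. In a shared-encoder setup, gradients from both terms of such a joint loss would flow into the same encoder.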