Built-in Depth-Semantic Coupled Encoding for Scene Parsing, Vehicle Detection, and Road Segmentation

Shiyu Liu,Haofeng Zhang,Ling Shao,Jingyu Yang
DOI: https://doi.org/10.1109/tits.2020.2987819
IF: 8.5
2021-09-01
IEEE Transactions on Intelligent Transportation Systems
Abstract:Recent representative scene parsing methods based on Convolutional Neural Networks (CNN) have greatly improved spatial resolution of pixel-wise labelling by exploiting multi-scale features and refined boundaries. However, the vast majority of previous works only utilize the color or textural information of images, without considering the depth information, which is beneficial for semantic reasoning. In this paper, we take advantages of the mutual benefit and strong correlation between depth information and semantic information in scene parsing by introducing the Built-in Depth-Semantic Coupled Encoding (BDSCE) module, which adaptively fuses RGB and depth features, and selectively highlights the depth-discriminative features. The proposed BDSCE module is compatible with existing CNN based methods, and can greatly improve scene parsing performance, particularly in those categories that have clear depth distinction and might be misclassified with RGB-only features. Furthermore, we also extend our proposed module to other urban scene semantic reasoning tasks such as vehicle detection and road segmentation, which are implemented by effectively learning and exploiting the encoded depth-semantics and transferring the learned representations with fine-tuning. The extensive experiments on the popular datasets Cityscapes and KITTI demonstrate that our method performs quite well and can significantly improve the state-of-the-art methods.
engineering, electrical & electronic,transportation science & technology, civil
What problem does this paper attempt to address?