Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots

Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen
DOI: https://doi.org/10.1109/LRA.2024.3373235
2024-03-11
Abstract: Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method's generalizability. Code and additional results are publicly available at https://whu-usi3dv.github.io/Mobile-Seed/.
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper addresses how to design a lightweight framework for simultaneous semantic segmentation and boundary detection in mobile robot applications. Although the two tasks are complementary, most existing research either focuses on lightweight semantic segmentation models and neglects boundary detection, or designs complex architectures that improve performance while overlooking the computational burden. The paper therefore proposes Mobile-Seed, a dual-task framework that aims to improve semantic segmentation and accurately locate object boundaries while maintaining real-time processing capability.

The Mobile-Seed framework includes the following key components:

1. **Dual-Stream Encoder**: One stream captures category-aware semantic information, and the other discerns boundaries from multi-scale features.
2. **Active Fusion Decoder (AFD)**: Dynamically adjusts the fusion of semantic and boundary information by learning channel-wise relationships, so that each channel receives its own weight.
3. **Dual-Task Regularization Loss**: Introduces a regularization loss to alleviate conflicts arising in dual-task learning and deep diverse supervision (DDS), so that semantic segmentation and boundary detection reinforce each other.

Experimental results on the Cityscapes dataset show that Mobile-Seed improves mIoU (mean Intersection over Union) by 2.2 percentage points and mF-score (mean F-score) by 4.2 percentage points over the state-of-the-art baseline, while maintaining an online inference speed of 23.9 frames per second on 1024x2048 input with an RTX 2080 Ti GPU. Experiments on the CamVid and PASCAL Context datasets further validate its generalization capability.
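To make the idea of channel-wise weighting in the fusion step more concrete, the following is a minimal NumPy sketch of a squeeze-and-excitation-style gate that assigns one weight per channel to a semantic and a boundary feature map before summing them. It is an illustration only and is not taken from the Mobile-Seed code: the function name `channel_weighted_fusion`, the two-layer gate, and the random weight matrices `w1` and `w2` are all assumptions standing in for whatever the actual AFD module learns.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_weighted_fusion(semantic, boundary, w1, w2):
    """Fuse two (C, H, W) feature maps using one learned weight per channel.

    Hypothetical gate for illustration; NOT the paper's AFD implementation.
    w1 has shape (hidden, 2C) and w2 has shape (2C, hidden).
    """
    c = semantic.shape[0]
    # Stack both streams along the channel axis: (2C, H, W).
    stacked = np.concatenate([semantic, boundary], axis=0)
    # Squeeze: global average pooling gives one descriptor per channel.
    descriptor = stacked.mean(axis=(1, 2))
    # Excite: a small two-layer network maps descriptors to weights in (0, 1).
    hidden = np.maximum(w1 @ descriptor, 0.0)  # ReLU
    weights = sigmoid(w2 @ hidden)             # shape (2C,)
    # Scale every channel by its own weight, then sum the two streams.
    fused = (weights[:c, None, None] * semantic
             + weights[c:, None, None] * boundary)
    return fused, weights


# Toy usage with random features and random (untrained) gate parameters.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
semantic = rng.standard_normal((C, H, W))
boundary = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((2, 2 * C))
w2 = rng.standard_normal((2 * C, 2))
fused, weights = channel_weighted_fusion(semantic, boundary, w1, w2)
```

In a trained network the gate parameters would be learned jointly with the rest of the model, so the weights would depend on the input content rather than on random matrices as in this toy example.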
In summary, the main contribution of this paper is the proposal of a lightweight joint semantic segmentation and boundary detection framework that not only improves segmentation accuracy but also accurately detects object boundaries in complex scenes, making it suitable for real-time applications such as mobile robots.
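For readers unfamiliar with the mIoU metric cited in the results, the sketch below shows one common way to compute it from a confusion matrix. It is a generic illustration under stated assumptions (integer class labels, classes absent from both prediction and ground truth are skipped) and is not the evaluation code used in the paper; the function name `mean_iou` and the toy arrays are hypothetical.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union across classes that occur at least once."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both pred and target
    return float((intersection[valid] / union[valid]).mean())


# Toy 2x2 label maps with two classes.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=2)
# Class 0 IoU = 1/2, class 1 IoU = 2/3, so the mean is 7/12.
```

A reported gain of 2.2 percentage points in mIoU is an absolute difference between two such scores expressed in percent, not a relative improvement.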