Abstract: Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method's generalizability. Code and additional results are publicly available at https://whu-usi3dv.github.io/Mobile-Seed/.

What problem does this paper attempt to address?

The paper addresses how to design a lightweight framework for simultaneous semantic segmentation and boundary detection in mobile robot applications. Although the two tasks are complementary, most existing research either focuses on lightweight semantic segmentation models and neglects boundary detection, or designs complex architectures that improve performance while overlooking the computational burden. The paper therefore proposes Mobile-Seed, a dual-task framework that aims to improve semantic segmentation and accurately locate object boundaries while maintaining real-time processing.

The Mobile-Seed framework includes the following key components:

1. **Dual-Stream Encoder**: One stream captures category-aware semantic information, and the other discerns boundaries from multi-scale features.
2. **Active Fusion Decoder (AFD)**: Dynamically adjusts the fusion of semantic and boundary information by learning channel-wise relationships, so that each channel receives its own weight.
3. **Dual-Task Regularization Loss**: Introduces a regularization loss to mitigate conflicts in dual-task learning and deep diversity supervision (DDS), so that semantic segmentation and boundary detection reinforce each other.

On the Cityscapes dataset, Mobile-Seed improves mIoU (mean Intersection over Union) by 2.2 percentage points and mF-score (mean F-score) by 4.2 percentage points over the state-of-the-art baseline, while maintaining an online inference speed of 23.9 frames per second with 1024x2048 resolution input on an RTX 2080 Ti GPU. Experiments on the CamVid and PASCAL Context datasets also support its generalization capability.

In summary, the main contribution of this paper is a lightweight framework for joint semantic segmentation and boundary detection that improves segmentation accuracy and accurately locates object boundaries, making it suitable for real-time applications such as mobile robots.
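
To make the idea of channel-wise fusion more concrete, the following is a minimal sketch of a generic channel-attention gate in plain Python. It is not the authors' AFD implementation: the function names (`fuse`, `channel_means`), the single linear layer, and the `weight` and `bias` parameters are all assumptions introduced for illustration. Each channel of the two streams is summarized by its mean, a hypothetical learned layer turns that summary into one gate per channel, and the gate blends the semantic and boundary features.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat):
    """Global average pooling: one scalar per channel."""
    return [sum(channel) / len(channel) for channel in feat]


def fuse(semantic, boundary, weight, bias):
    """Blend two feature maps channel by channel (illustrative only).

    semantic, boundary: lists of C channels, each a flat list of N values.
    weight: C x 2C matrix; bias: length-C vector. Both stand in for
    learned parameters and are hypothetical.
    """
    # Squeeze: describe every channel of both streams by its mean.
    descriptor = channel_means(semantic) + channel_means(boundary)
    # Excite: one linear layer plus sigmoid yields a gate per channel.
    gates = [
        sigmoid(sum(w * d for w, d in zip(row, descriptor)) + b)
        for row, b in zip(weight, bias)
    ]
    # Fuse: per-channel convex combination of the two streams.
    fused = [
        [g * s + (1.0 - g) * e for s, e in zip(s_ch, e_ch)]
        for g, s_ch, e_ch in zip(gates, semantic, boundary)
    ]
    return fused, gates


# Two channels, two spatial positions each.
semantic = [[1.0, 1.0], [2.0, 2.0]]
boundary = [[0.0, 0.0], [4.0, 4.0]]
# Zero weights and biases give every channel a gate of 0.5,
# so the output is the plain average of the two streams.
fused, gates = fuse(semantic, boundary, [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
print(gates)  # [0.5, 0.5]
print(fused)  # [[0.5, 0.5], [3.0, 3.0]]
```

In a trained network the gates would differ per channel and per input, which is what lets the fusion favor boundary cues in some channels and semantic cues in others; a real implementation would operate on tensors in a deep learning framework rather than nested lists.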

Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

A new real-time image semantic segmentation framework based on a lightweight deep convolutional encoder-decoder architecture for robotic environment sensing

An RGB-D Fusion Based Semantic Segmentation Algorithm Based on Neighborhood Metric Relations

Efficient Dual-Branch Bottleneck Networks of Semantic Segmentation Based on CCD Camera

A Boundary Guided Cross Fusion Approach for Remote Sensing Image Segmentation

AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation

Multi-view Incremental Segmentation of 3D Point Clouds for Mobile Robots

An Onboard Point Cloud Semantic Segmentation System for Robotic Platforms

A Mobile Robot Visual SLAM System With Enhanced Semantics Segmentation

BENet: boundary-enhanced network for real-time semantic segmentation

Multi-Scale Depthwise Separable Convolution for Semantic Segmentation in Street–Road Scenes

Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

Multi-modal LiDAR Point Cloud Semantic Segmentation with Salience Refinement and Boundary Perception

MAE-BG: dual-stream boundary optimization for remote sensing image semantic segmentation

MobileSAM-Track: Lightweight One-Shot Tracking and Segmentation of Small Objects on Edge Devices

Boundary-Guided Lightweight Semantic Segmentation With Multi-Scale Semantic Context

Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds

SeaFormer++: Squeeze-enhanced Axial Transformer for Mobile Visual Recognition

Robust 3D Semantic Segmentation Method Based on Multi-Modal Collaborative Learning