Abstract: Computational resources on real-time embedded devices are limited, so the computational cost that the target platform can afford must be considered when deploying a model. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel size. Together, these parameters control the number of channels in the network and thus govern the model's capacity and computational requirements. For semantic segmentation, we add an extension module that includes 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input image size. Combining the feature extraction module with this extension module yields a complete semantic segmentation architecture: the feature extraction module provides the initial features, and the extension module performs pixel-wise classification and upsampling. Compared with hardware-aware Neural Architecture Search (NAS), pruning, runtime pruning, and knowledge distillation, our model offers a modular design, structural controllability, ease of implementation, and cost-effectiveness, and its computational efficiency, measured in FLOPs, is highly competitive. Our method addresses MobileNet's inability to adjust the size and number of convolution kernels through adaptable tuning of three parameters: MobileNet's depth multiplier, the kernel size of the separable convolution layer in the FCN head, and the depth of the first pointwise convolution layer. These parameters are tuned to match the target hardware's maximum number of multiply-accumulate operations (MACs), which optimizes network capacity and maximizes resource utilization.
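The upsampling step of the extension module can be checked with the standard output-size formula for a transposed convolution with dilation 1, which matches the one documented for PyTorch's `ConvTranspose2d`. The sketch below rests on hypothetical assumptions that the abstract does not state. It assumes the backbone downsamples by 32 and that a single transposed convolution with kernel size 2 x stride and padding stride / 2 restores the input resolution. The helper names `conv_transpose_out` and `exact_upsampling` are illustrative only.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0):
    """Output length along one spatial axis of a transposed convolution
    with dilation 1: (size - 1) * stride - 2 * padding + kernel + output_padding."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def exact_upsampling(stride):
    """Hypothetical (kernel, padding) choice that multiplies the size by an
    even `stride` exactly: kernel = 2 * stride, padding = stride // 2."""
    if stride % 2:
        raise ValueError("this choice assumes an even stride")
    return 2 * stride, stride // 2


# Assumed example: the 1/32-resolution score map of a 512 x 1024 input is 16 x 32.
k, p = exact_upsampling(32)
h_out = conv_transpose_out(16, k, 32, p)
w_out = conv_transpose_out(32, k, 32, p)
print(h_out, w_out)  # 512 1024
```

With kernel = 2 x stride and padding = stride / 2, the formula reduces to size x stride. Any input whose sides are multiples of 32 is therefore restored to its original size by this one layer.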
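The MAC-matching step can be illustrated with a minimal plain-Python sketch. It counts multiply-accumulates layer by layer and picks the configuration with the most MACs that still fits a hardware budget. It rests on several assumptions that do not come from the paper:

- a MobileNetV1-style backbone (the abstract does not say which MobileNet version is used);
- an FCN head made of one separable convolution followed by a 1x1 classifier;
- a 512 x 1024 input and 19 classes;
- hypothetical search grids for the three parameters.

Names such as `network_macs` and `best_config` are illustrative.

```python
from itertools import product

# (output channels, stride) of MobileNetV1's 13 depthwise-separable blocks.
MOBILENET_V1_BLOCKS = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1), (1024, 2), (1024, 1),
]


def scaled(channels, alpha):
    """Apply the depth multiplier to a channel count (at least 8 channels)."""
    return max(8, int(round(channels * alpha)))


def separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise conv plus a 1x1 pointwise conv;
    (h, w) is the output resolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out


def backbone_macs(alpha, size):
    """Return (MACs, h, w, channels) of the MobileNetV1-style backbone."""
    h, w = size[0] // 2, size[1] // 2          # 3x3 stem conv, stride 2
    c = scaled(32, alpha)
    macs = h * w * 3 * c * 3 * 3
    for out_ch, stride in MOBILENET_V1_BLOCKS:
        h, w = h // stride, w // stride
        c_out = scaled(out_ch, alpha)
        macs += separable_macs(h, w, c, c_out, 3)
        c = c_out
    return macs, h, w, c


def network_macs(alpha, head_kernel, head_depth, num_classes=19, size=(512, 1024)):
    """Backbone plus a hypothetical FCN head: a separable conv with kernel
    `head_kernel` whose pointwise part outputs `head_depth` channels, then a
    1x1 pixel-wise classifier. The transposed-conv upsampling is not counted."""
    macs, h, w, c = backbone_macs(alpha, size)
    macs += separable_macs(h, w, c, head_depth, head_kernel)
    macs += h * w * head_depth * num_classes
    return macs


def best_config(mac_budget, alphas=(0.25, 0.5, 0.75, 1.0),
                kernels=(3, 5, 7), depths=(64, 128, 256, 512)):
    """Return (macs, alpha, kernel, depth) with the most MACs within budget,
    or None if no configuration fits."""
    candidates = ((network_macs(a, k, d), a, k, d)
                  for a, k, d in product(alphas, kernels, depths))
    feasible = [cand for cand in candidates if cand[0] <= mac_budget]
    return max(feasible) if feasible else None


print(backbone_macs(1.0, (224, 224))[0])  # 567716352
print(best_config(2e9))                   # largest config under 2 GMACs
```

For a depth multiplier of 1 at 224 x 224, the backbone count is about 568 million MACs. This is close to the 569 million Mult-Adds reported for MobileNetV1-224, a figure that also includes that model's fully connected classifier. The sketch selects the largest feasible configuration rather than any feasible one, which mirrors the stated goal of maximizing resource utilization.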
