Abstract: Computational resources are limited on real-time embedded devices, so the computational budget available on the target platform must be considered at deployment. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel depth. These parameters control the number of channels within the network and therefore the model's capacity and computational requirements. To perform semantic segmentation, we add an extension module consisting of 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input image size. Combining the two modules yields a complete semantic segmentation architecture: the feature extraction module produces the initial features, and the extension module adds the components needed for accurate pixel-wise classification and upsampling. Compared with hardware-aware Neural Architecture Search (NAS), pruning, runtime pruning, and knowledge distillation, our model offers advantages in modular design, structural controllability, ease of implementation, and cost-effectiveness. Its computational cost, measured in FLOPs, is highly competitive. Our method is distinguished by addressing MobileNet's inability to adjust the size and number of its convolution kernels. It does so through adaptable parameter tuning of MobileNet's depth multiplier, the kernel size of the separable convolution layer in the FCN head, and the depth of the first pointwise convolution layer. These adjustments are matched to the maximum number of multiply-accumulate operations (MACs) the hardware supports, optimizing network capacity and maximizing resource utilization.
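To make the idea of matching the three parameters to a MAC budget concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a MobileNetV1-style block layout, a 512x1024 input, 19 classes, a head made of one separable convolution followed by a 1x1 classifier, and a stride-32 transposed convolution with a 64x64 kernel. The function names (`total_macs`, `fit_to_budget`) and the candidate grids are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumed MobileNetV1-style layout: (stride, base output channels) per separable block.
BLOCKS = [(1, 64), (2, 128), (1, 128), (2, 256), (1, 256), (2, 512),
          (1, 512), (1, 512), (1, 512), (1, 512), (1, 512), (2, 1024), (1, 1024)]


def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out


def scaled(channels, alpha):
    """Scale a channel count by the depth multiplier, with an assumed floor of 8."""
    return max(8, int(round(channels * alpha)))


def total_macs(alpha, head_kernel, head_depth, height=512, width=1024, num_classes=19):
    """Estimate MACs for the backbone plus the (assumed) segmentation head."""
    h, w = height // 2, width // 2            # stride-2 stem convolution
    c = scaled(32, alpha)
    macs = conv_macs(h, w, 3, c, 3)
    for stride, base in BLOCKS:
        h, w = h // stride, w // stride
        c_out = scaled(base, alpha)
        macs += separable_macs(h, w, c, c_out, 3)
        c = c_out
    # Head: separable convolution with adjustable kernel size and output depth.
    macs += separable_macs(h, w, c, head_depth, head_kernel)
    # 1x1 pointwise classifier.
    macs += conv_macs(h, w, head_depth, num_classes, 1)
    # Transposed convolution (assumed 64x64 kernel, stride 32):
    # each input pixel contributes k * k * c_in * c_out MACs.
    macs += h * w * num_classes * num_classes * 64 * 64
    return macs


def fit_to_budget(budget, alphas=(0.25, 0.5, 0.75, 1.0),
                  kernels=(3, 5, 7), depths=(32, 64, 128)):
    """Return ((alpha, kernel, depth), macs) using the most MACs within budget, or None."""
    best = None
    for cfg in product(alphas, kernels, depths):
        macs = total_macs(*cfg)
        if macs <= budget and (best is None or macs > best[1]):
            best = (cfg, macs)
    return best


if __name__ == "__main__":
    print(fit_to_budget(3_000_000_000))
```

An exhaustive search is used here only because the assumed grid is tiny. The design point it illustrates is that all three knobs change the MAC count in a predictable, closed-form way, so a configuration can be chosen for a given hardware limit without training-time search.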

S3-Net: A Fast Scene Understanding Network by Single-Shot Segmentation for Autonomous Driving

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation.

Up-to-Down Network: Fusing Multi-Scale Context for 3D Semantic Scene Completion

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

S3Net: 3D LiDAR Sparse Semantic Segmentation Network

NDNet: Spacewise Multiscale Representation Learning via Neighbor Decoupling for Real-Time Driving Scene Parsing

Driving Scene Perception Network: Real-time Joint Detection, Depth Estimation and Semantic Segmentation

S$^3$M-Net: Joint Learning of Semantic Segmentation and Stereo Matching for Autonomous Driving

ABSSNet: Attention-Based Spatial Segmentation Network for Traffic Scene Understanding

(AF)2-S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network

DSNet for Real-Time Driving Scene Semantic Segmentation

A Scene Understanding Network Based on Driving Scene

Progressive Scene Segmentation Based on Self-Attention Mechanism.

Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

SRNet: A 3D Scene Recognition Network Using Static Graph and Dense Semantic Fusion.

Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes

Real-Time Segmentation of Unstructured Environments by Combining Domain Generalization and Attention Mechanisms

FBSNet: A Fast Bilateral Symmetrical Network for Real-Time Semantic Segmentation

RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving

BiSeNet V3: Bilateral Segmentation Network with Coordinate Attention for Real-time Semantic Segmentation