Abstract: Computational resources are limited on real-time embedded devices, so the computational budget available on the target platform must be considered at deployment. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel depth. These parameters control the number of channels within the network and thereby the model's capacity and computational requirements. To perform semantic segmentation, we add an extension module, which typically includes 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input image size. Together, the two modules form a complete semantic segmentation architecture: the feature extraction module extracts the initial features, and the extension module performs pixel-wise classification and upsampling. Compared with hardware-aware neural architecture search (NAS), pruning, runtime pruning, and knowledge distillation, our model offers advantages in modular design, structural controllability, ease of implementation, and cost-effectiveness. Its computational efficiency, measured in FLOPs, is highly competitive. Our method addresses MobileNet's inability to adjust the size and number of convolution kernels by tuning three parameters: MobileNet's depth multiplier, the kernel size of the separable convolution layer in the FCN head, and the depth of the first pointwise convolution layer. These adjustments are matched to the hardware's maximum number of multiply-accumulate operations (MACs), optimizing network capacity and maximizing resource utilization.
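
The idea of matching network capacity to a hardware MAC budget can be illustrated with a minimal sketch. It uses the standard MAC count of a depthwise-separable convolution (K x K depthwise plus 1x1 pointwise) and picks the largest depth multiplier whose total stays within the budget. The block list, the candidate multipliers, the "same" padding assumption, and all function names are hypothetical and not taken from the paper; the sketch covers only the depth multiplier, not the two head parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SepConv:
    """One depthwise-separable block: KxK depthwise conv + 1x1 pointwise conv."""
    in_ch: int    # base input channels (before scaling)
    out_ch: int   # base output channels (before scaling)
    kernel: int   # depthwise kernel size K
    stride: int   # depthwise stride


def scale(channels: int, alpha: float) -> int:
    """Scale a channel count by the depth multiplier, keeping at least one channel."""
    return max(1, int(channels * alpha))


def block_macs(block: SepConv, alpha: float, h: int, w: int):
    """Return (MACs, out_h, out_w) for one block on an h x w input.

    Assumes 'same' padding, so the output size is ceil(input / stride).
    """
    c_in, c_out = scale(block.in_ch, alpha), scale(block.out_ch, alpha)
    out_h = -(-h // block.stride)  # ceiling division
    out_w = -(-w // block.stride)
    depthwise = out_h * out_w * c_in * block.kernel ** 2
    pointwise = out_h * out_w * c_in * c_out
    return depthwise + pointwise, out_h, out_w


def total_macs(blocks, alpha: float, h: int, w: int) -> int:
    """Sum MACs over a stack of blocks, tracking the shrinking feature map."""
    total = 0
    for block in blocks:
        macs, h, w = block_macs(block, alpha, h, w)
        total += macs
    return total


def largest_alpha_within(blocks, budget: int, h: int, w: int,
                         candidates=(0.25, 0.5, 0.75, 1.0)):
    """Return the largest candidate multiplier that fits the budget, or None."""
    fitting = [a for a in candidates if total_macs(blocks, a, h, w) <= budget]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    # Hypothetical three-block stack on a 112 x 112 feature map.
    toy_blocks = [SepConv(32, 64, 3, 1), SepConv(64, 128, 3, 2), SepConv(128, 128, 3, 1)]
    for budget in (10_000_000, 50_000_000, 200_000_000):
        print(budget, largest_alpha_within(toy_blocks, budget, 112, 112))
```

Because the pointwise term scales with the product of input and output channels, the MAC count grows roughly quadratically in the multiplier, which is why a coarse set of candidate values is usually enough to land near a given budget.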
