Abstract: Computational resources on real-time embedded devices are limited, so the computational budget of the target platform must be considered at deployment. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel depth. Together, these parameters control the number of channels in the network and thereby its capacity and computational requirements. To perform semantic segmentation, we add an extension module, which typically includes 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input image size. Combining the two modules yields a complete semantic segmentation architecture: the feature extraction module extracts the initial features, and the extension module performs pixel-wise classification and upsampling. Compared with hardware-aware Neural Architecture Search (NAS), pruning, runtime pruning, and knowledge distillation, our model offers advantages in modular design, structural controllability, ease of implementation, and cost-effectiveness, and its computational efficiency, measured in FLOPs, is highly competitive. Our method addresses MobileNet's inability to adjust the size and number of convolution kernels through adaptable parameter tuning of MobileNet's depth multiplier, the kernel size of the separable convolution layer in the FCN head, and the depth of the first pointwise convolution layer. These parameters are set to match the maximum number of multiply-accumulate operations (MACs) that the hardware supports, optimizing network capacity and maximizing resource utilization.
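As a rough illustration of matching the tunable parameters to a MAC budget, the sketch below estimates the MACs of a small MobileNet-style network and picks the largest depth multiplier that fits. It is only a sketch under stated assumptions, not the authors' implementation: the block list `TOY_BLOCKS`, the minimum channel count of 8, the default input size and class count, and the names `estimate_macs`, `pick_depth_multiplier`, `head_kernel`, and `head_depth` are all hypothetical, and the transposed convolution used for upsampling is left out for brevity.

```python
# Hypothetical toy backbone: (base output channels, stride) per separable block.
# This is NOT the layer configuration used in the paper.
TOY_BLOCKS = [(64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2)]


def scale_channels(channels, alpha, minimum=8):
    """Scale a channel count by the depth multiplier alpha (assumed floor of 8)."""
    return max(minimum, int(channels * alpha))


def separable_conv_macs(h, w, c_in, c_out, kernel=3, stride=1):
    """MACs of a depthwise conv followed by a 1x1 pointwise conv ('same' padding)."""
    h_out = -(-h // stride)  # ceiling division
    w_out = -(-w // stride)
    depthwise = h_out * w_out * c_in * kernel * kernel
    pointwise = h_out * w_out * c_in * c_out
    return depthwise + pointwise, (h_out, w_out)


def estimate_macs(alpha, head_kernel=3, head_depth=64,
                  input_hw=(224, 224), num_classes=19):
    """Estimate total MACs for the toy backbone plus a small segmentation head."""
    h, w = input_hw
    # Stem: standard 3x3 convolution with stride 2 on an RGB input.
    c = scale_channels(32, alpha)
    h, w = -(-h // 2), -(-w // 2)
    total = h * w * 3 * c * 9
    # Backbone: depthwise separable blocks with channels scaled by alpha.
    for base_out, stride in TOY_BLOCKS:
        c_out = scale_channels(base_out, alpha)
        macs, (h, w) = separable_conv_macs(h, w, c, c_out, 3, stride)
        total += macs
        c = c_out
    # Head: separable conv with adjustable kernel size and output depth,
    # then a 1x1 pointwise classifier producing one map per class.
    macs, (h, w) = separable_conv_macs(h, w, c, head_depth, head_kernel, 1)
    total += macs
    total += h * w * head_depth * num_classes
    return total


def pick_depth_multiplier(mac_budget, candidates=(0.25, 0.5, 0.75, 1.0), **kwargs):
    """Return the largest candidate alpha whose estimated MACs fit the budget."""
    fitting = [a for a in candidates if estimate_macs(a, **kwargs) <= mac_budget]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    budget = 100_000_000  # hypothetical hardware limit in MACs
    print("chosen depth multiplier:", pick_depth_multiplier(budget))
    print("estimated MACs at alpha=1.0:", estimate_macs(1.0))
```

The same search could be extended to `head_kernel` and `head_depth`, so that all three parameters are chosen jointly against the budget rather than the depth multiplier alone.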
