Abstract: Computational resources on real-time embedded devices are limited, so the computational budget of the target platform must be considered during deployment. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel size. Together, these parameters control the number of channels and the kernel sizes within the network, and thereby its capacity and computational requirements. To perform semantic segmentation, we append an extension module that includes 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input resolution. Combined, the two modules form a complete semantic segmentation architecture: the feature extraction module provides the initial features, and the extension module adds the components needed for accurate pixel-wise classification and upsampling. Compared with hardware-aware Neural Architecture Search (NAS), pruning, runtime pruning, and knowledge distillation methods, our model offers advantages in modular design, structural controllability, ease of implementation, and cost-effectiveness, and its computational efficiency, measured in FLOPs, is highly competitive. Our method is distinguished by addressing MobileNet's inability to adjust the size and number of its convolution kernels. It does so through adaptable tuning of MobileNet's depth multiplier, the kernel size of the separable convolution layer in the FCN head, and the depth of the first pointwise convolution layer.
These adjustments are tailored to the maximum number of multiply-accumulate operations (MACs) the target hardware supports, optimizing network capacity and maximizing resource utilization.
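As a rough illustration of how the three parameters trade capacity for cost, the Python sketch below counts MACs for a MobileNetV1-style backbone plus an FCN-style head and picks the configuration with the most MACs that still fits a given budget. The head layout (a k×k depthwise separable convolution that keeps the channel count, a pointwise convolution of depth d, a 1x1 classifier, and one transposed convolution with kernel 64 and stride 32), the candidate grids, and the function names `backbone_macs`, `head_macs`, and `pick_config` are hypothetical assumptions, not the authors' implementation. Treating "most MACs within budget" as "highest capacity" is likewise only a proxy. FLOPs are commonly approximated as 2 × MACs.

```python
from itertools import product

# Assumption: a MobileNetV1 body. Each entry is (output channels, stride) of one
# depthwise separable block; the blocks follow an initial 3x3, stride-2
# standard convolution with 32 output channels.
MOBILENET_V1_BLOCKS = [
    (64, 1), (128, 2), (128, 1), (256, 2), (256, 1), (512, 2),
    (512, 1), (512, 1), (512, 1), (512, 1), (512, 1), (1024, 2), (1024, 1),
]


def conv_macs(h, w, c_in, c_out, k):
    """MACs of a standard k x k convolution with an h x w output."""
    return h * w * c_in * c_out * k * k


def dw_sep_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise conv followed by a 1x1 pointwise conv."""
    return h * w * c_in * k * k + h * w * c_in * c_out


def backbone_macs(height, width, alpha):
    """Return (MACs, channels, h, w) of the backbone's final feature map.

    alpha is the depth multiplier; height and width are assumed to be
    divisible by 32 (the backbone's output stride).
    """
    def ch(c):
        return max(8, int(c * alpha))  # alpha scales every layer's width

    h, w, c = height // 2, width // 2, ch(32)
    macs = conv_macs(h, w, 3, c, 3)  # initial 3x3 stride-2 conv
    for c_out, stride in MOBILENET_V1_BLOCKS:
        h, w = h // stride, w // stride
        macs += dw_sep_macs(h, w, c, ch(c_out), 3)
        c = ch(c_out)
    return macs, c, h, w


def head_macs(h, w, c_in, k, d, num_classes, up=32):
    """MACs of a hypothetical FCN-style head on an h x w x c_in feature map."""
    macs = dw_sep_macs(h, w, c_in, c_in, k)     # k x k separable conv
    macs += conv_macs(h, w, c_in, d, 1)         # first pointwise conv, depth d
    macs += conv_macs(h, w, d, num_classes, 1)  # 1x1 pixel-level classifier
    # Transposed conv with kernel 2*up and stride up (FCN-32s style): every
    # input position multiplies a full kernel for each (in, out) class pair.
    macs += conv_macs(h, w, num_classes, num_classes, 2 * up)
    return macs


def pick_config(budget_macs, height, width, num_classes,
                alphas=(0.25, 0.5, 0.75, 1.0), kernels=(3, 5, 7),
                depths=(64, 128, 256, 512)):
    """Return (alpha, k, d, macs) with the most MACs not exceeding the budget.

    Returns None if no candidate fits. The candidate grids are assumptions.
    """
    best = None
    for alpha, k, d in product(alphas, kernels, depths):
        macs, c, h, w = backbone_macs(height, width, alpha)
        macs += head_macs(h, w, c, k, d, num_classes)
        if macs <= budget_macs and (best is None or macs > best[3]):
            best = (alpha, k, d, macs)
    return best


if __name__ == "__main__":
    # Example: a 512 x 1024 input, 19 classes, and a 2 GMAC budget.
    print(pick_config(2e9, 512, 1024, num_classes=19))
```

As a sanity check, `backbone_macs(224, 224, 1.0)` counts 567,716,352 MACs, close to the roughly 569 million mult-adds usually quoted for MobileNetV1 (that figure also includes the final fully connected layer). In this sketch the transposed convolution's cost scales with the square of both the class count and the upsampling factor, so it can dominate the head; counting MACs per layer makes such hot spots visible before deployment.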

ESDAR-net: Towards High-Accuracy and Real-Time Driver Action Recognition for Embedded Systems

Unifying Terrain Awareness Through Real-Time Semantic Segmentation

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

E-DNet: An End-to-End Dual-Branch Network for Driver Steering Intention Detection

Depth Video-Based Secondary Action Recognition in Vehicles via Convolutional Neural Network and Bidirectional Long Short-Term Memory with Spatial Enhanced Attention Mechanism

A Multi-Semantic Driver Behavior Recognition Model of Autonomous Vehicles Using Confidence Fusion Mechanism

Embedded and Real-Time Vehicle Detection System for Challenging On-Road Scenes

TEINet: Towards an Efficient Architecture for Video Recognition

Driver Behavior Recognition via Interwoven Deep Convolutional Neural Nets With Multi-Stream Inputs

TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration

Driver behavior recognition based on deep convolutional neural networks

STDA: Spatio-Temporal Dual-Encoder Network Incorporating Driver Attention to Predict Driver Behaviors Under Safety-Critical Scenarios

Deep learning approach for accurate and stable recognition of driver's lateral intentions using naturalistic driving data

Real-Time Human Action Recognition on Embedded Platforms

Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition

Driver Activity Recognition for Intelligent Vehicles: A Deep Learning Approach

Real-Time Activity Recognition and Intention Recognition Using a Vision-based Embedded System

A Hybrid Deep Learning Model for Recognizing Actions of Distracted Drivers

NDNet: Spacewise Multiscale Representation Learning via Neighbor Decoupling for Real-Time Driving Scene Parsing

DSDFormer: An Innovative Transformer-Mamba Framework for Robust High-Precision Driver Distraction Identification