Abstract: Computational resources on real-time embedded devices are limited, so the computational cost a model can afford on the target platform must be considered at design time. We develop a feature extraction module based on the MobileNet backbone whose computational complexity and capacity can be adjusted through three parameters: the depth multiplier, the classifier depth, and the kernel depth. These parameters control the number of channels within the network, and thereby manage the model's capacity and computational requirements. To perform semantic segmentation, we add an extension module on top of the feature extractor. This extension module typically consists of 1x1 pointwise convolutional layers for pixel-level classification and a transposed convolutional layer that upsamples the output to the original input image size. Combining the two yields a complete semantic segmentation architecture: the feature extraction module provides the initial features, and the extension module adds pixel-wise classification and upsampling. Compared with hardware-aware Neural Architecture Search (NAS), pruning, runtime pruning, and knowledge distillation, our model offers advantages in modular design, structural controllability, ease of implementation, and cost-effectiveness, and its computational efficiency, measured in FLOPs, is highly competitive. Our method addresses MobileNet's inability to adjust the size and number of convolution kernels. It does so through tunable parameters: MobileNet's depth multiplier, the kernel size in the Separable Convolution layer of the FCN head, and the depth of the first pointwise convolution layer.
These parameters are tuned to match the maximum number of multiply-accumulate operations (MACs) supported by the target hardware, optimizing network capacity and maximizing resource utilization.
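The extension module described above (a 1x1 pointwise classifier followed by a transposed convolution that restores the input resolution) can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions, not the authors' implementation: feature maps are nested lists indexed [channel][row][column], the names `pointwise_conv`, `transposed_conv_out`, and `fcn_upsample_params` are hypothetical, and the output-size formula is the one documented for PyTorch's `ConvTranspose2d`. The kernel = 2 × stride, padding = stride / 2 setting is a common FCN-style choice, not necessarily the one used in this work.

```python
from typing import List, Tuple

# Toy layout (assumption): a feature map is a list of channels, each a list of rows.
FeatureMap = List[List[List[float]]]  # [channel][row][column]


def pointwise_conv(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel (class score) is a weighted sum of the
    input channels at the same pixel, so classification happens independently per pixel."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for k, w_k in enumerate(weights):  # one weight row per output class
        assert len(w_k) == c_in, "weight row must match the number of input channels"
        out.append([[bias[k] + sum(w_k[c] * x[c][i][j] for c in range(c_in))
                     for j in range(w)]
                    for i in range(h)])
    return out


def transposed_conv_out(size: int, kernel: int, stride: int, padding: int = 0,
                        output_padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a transposed convolution (formula documented for PyTorch's ConvTranspose2d)."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def fcn_upsample_params(stride: int) -> Tuple[int, int, int]:
    """Hypothetical helper returning (kernel, stride, padding) with kernel = 2 * stride
    and padding = stride // 2; for an even stride the output is exactly size * stride."""
    return 2 * stride, stride, stride // 2
```

For example, a 224 × 224 input reduced by a backbone with output stride 32 yields a 7 × 7 map, and `transposed_conv_out(7, *fcn_upsample_params(32))` evaluates to 224, so a single transposed convolution returns the class scores to the input resolution.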
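Matching a network configuration to a hardware MAC budget can be illustrated with a small exhaustive search. The sketch below is a hypothetical example, not the authors' procedure: it assumes a toy three-block MobileNet-style backbone (the widths in `BASE_CHANNELS` and feature-map sizes in `FEATURE_HW` are invented for illustration), counts MACs of depthwise-separable layers analytically, and enumerates depth multipliers, head kernel sizes, and first-pointwise depths, keeping the configuration with the most MACs that still fits the budget. Using MAC count as a proxy for capacity is itself an assumption. When converting to FLOP figures, one MAC is commonly counted as two FLOPs.

```python
from itertools import product
from typing import Optional, Tuple

# Hypothetical toy backbone: widths at depth multiplier 1.0 and output size of each block.
BASE_CHANNELS = [32, 64, 128]
FEATURE_HW = [(56, 56), (28, 28), (14, 14)]


def scaled(channels: int, alpha: float) -> int:
    """Width scaling by the depth multiplier, keeping at least 8 channels."""
    return max(8, int(round(channels * alpha)))


def dw_separable_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a k x k depthwise conv followed by a 1x1 pointwise conv, at output size h x w."""
    return h * w * c_in * k * k + h * w * c_in * c_out


def model_macs(alpha: float, k_head: int, pw_depth: int,
               num_classes: int = 21, in_ch: int = 3) -> int:
    """Total MACs of the toy backbone plus a separable-conv head and a 1x1 classifier."""
    macs, c_prev = 0, in_ch
    for c, (h, w) in zip(BASE_CHANNELS, FEATURE_HW):
        c_out = scaled(c, alpha)
        macs += dw_separable_macs(h, w, c_prev, c_out, 3)  # 3x3 depthwise in the backbone
        c_prev = c_out
    h, w = FEATURE_HW[-1]
    macs += dw_separable_macs(h, w, c_prev, pw_depth, k_head)  # head: k_head depthwise + pointwise
    macs += h * w * pw_depth * num_classes                      # 1x1 pixel-wise classifier
    return macs


Config = Tuple[int, float, int, int]  # (macs, alpha, k_head, pw_depth)


def best_config(mac_budget: int,
                alphas=(0.25, 0.5, 0.75, 1.0),
                kernels=(3, 5, 7),
                depths=(64, 128, 256)) -> Optional[Config]:
    """Return the feasible configuration with the most MACs, or None if nothing fits."""
    candidates = [(model_macs(a, k, d), a, k, d) for a, k, d in product(alphas, kernels, depths)]
    feasible = [cfg for cfg in candidates if cfg[0] <= mac_budget]
    return max(feasible) if feasible else None
```

Here `best_config(5_000_000)` returns a `(macs, alpha, k_head, pw_depth)` tuple for the largest candidate under five million MACs. A real deployment would replace the toy layer table with the actual network's layers and the target hardware's MAC limit; exhaustive enumeration is practical because the three parameters span only a small grid.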

A Low-Power Graph Convolutional Network Processor With Sparse Grouping for 3D Point Cloud Semantic Segmentation in Mobile Devices

A Scalable Real-time Semantic Segmentation Network for Autonomous Driving

Edge Segmentation: Empowering Mobile Telemedicine with Compressed Cellular Neural Networks

A 3D Tiled Low Power Accelerator for Convolutional Neural Network

A 28-nm Energy-Efficient Sparse Neural Network Processor for Point Cloud Applications Using Block-Wise Online Neighbor Searching

Accelerating DNN-based 3D point cloud processing for mobile computing

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

An Energy-Efficient, Unified CNN Accelerator for Real-Time Multi-Object Semantic Segmentation for Autonomous Vehicle

GSECnet: Ground Segmentation of Point Clouds for Edge Computing

MLGCN: an ultra efficient graph convolutional neural model for 3D point cloud analysis

A Demonstration Platform for Large-Scaled Point Cloud Network Based on 28nm 2D/3D Unified Sparse Convolution Accelerator

Three-Dimensional Point Cloud Semantic Segmentation Network Based on Spatial Graph Convolution Network

A 28nm 2D/3D Unified Sparse Convolution Accelerator with Block-Wise Neighbor Searcher for Large-Scaled Voxel-Based Point Cloud Network

An Energy-Efficient Convolutional Neural Network Processor Architecture Based on a Systolic Array

DyGA: A Hardware-Efficient Accelerator with Traffic-Aware Dynamic Scheduling for Graph Convolutional Networks

PCSCNet: Fast 3D Semantic Segmentation of LiDAR Point Cloud for Autonomous Car using Point Convolution and Sparse Convolution Network

Semantic Segmentation Optimized for Low Compute Embedded Devices

An Efficient FPGA Accelerator for Point Cloud

An energy-efficient deep convolutional neural networks coprocessor for multi-object detection

DPC-Net: Distributed Point Convolution Network for large-scale point clouds semantic segmentation

EDGCNet: Joint Dynamic Hyperbolic Graph Convolution and Dual Squeeze-and-attention for 3D Point Cloud Segmentation