Abstract: Multi-modal DNNs with multi-task heads are increasingly used in edge computing scenarios. These networks typically process inputs of different modalities first, then extract features for unified fusion, and finally feed the fused features into multi-task heads. Such networks are often used to determine pose and navigate movement direction from multi-modal data obtained from diverse sensors, and they therefore require low inference latency. An edge device cluster with high-speed interconnection can be employed to support such DNN workloads for scaled-out performance.

To accelerate model inference on edge devices, previous researchers have proposed methods including model pruning and quantization. However, these methods do not take advantage of the structural features of multi-modal DNNs with multi-task heads and may impair the model's prediction accuracy.

Based on the intrinsic structure of multi-modal DNNs with multi-task heads, we propose Sub-model Parallelism to achieve scalable execution speedup. Sub-model Parallelism is a scale-out deployment method that first assigns the preprocessing tasks of different modalities to different edge devices, then delivers their outputs to a single device for modality feature fusion, and finally distributes the fused features to other devices responsible for the different task head computations. We run experiments on the BEVFusion network and achieve an approximately 30% reduction in latency using two Jetson Orin devices connected by Remote Direct Memory Access (RDMA). Furthermore, we conduct a series of simulation experiments covering scale-out scenarios and also observe latency reductions. We hope that the proposed method can provide valuable experience for the optimized scale-out deployment of large multi-modal DNNs with multi-task heads on multiple edge devices.
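The three-stage placement described above (per-modality encoders on separate devices, fusion on one device, task heads spread over devices) can be illustrated with a simple latency model. The following is a minimal sketch, not the authors' implementation: the `Workload` class, the function names, the fixed per-hop transfer cost, and all timings are hypothetical and do not come from the reported experiments. It also assumes there are enough devices to give every encoder and every head its own device.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Workload:
    """Hypothetical per-stage compute times, in milliseconds."""
    encoders_ms: Dict[str, float]  # one entry per input modality
    fusion_ms: float               # modality feature fusion
    heads_ms: Dict[str, float]     # one entry per task head
    transfer_ms: float             # assumed fixed cost of one inter-device hop


def single_device_latency(w: Workload) -> float:
    """All sub-models run back to back on one device."""
    return sum(w.encoders_ms.values()) + w.fusion_ms + sum(w.heads_ms.values())


def sub_model_parallel_latency(w: Workload) -> float:
    """Encoders run concurrently, then fusion, then heads run concurrently.

    Each parallel stage costs as much as its slowest member, and one
    transfer is charged before fusion and one after it.
    """
    encode = max(w.encoders_ms.values())
    heads = max(w.heads_ms.values())
    return encode + w.transfer_ms + w.fusion_ms + w.transfer_ms + heads


# Illustrative numbers only (assumptions, not measurements).
example = Workload(
    encoders_ms={"camera": 50.0, "lidar": 20.0},
    fusion_ms=10.0,
    heads_ms={"detection": 12.0, "segmentation": 8.0},
    transfer_ms=3.0,
)

baseline = single_device_latency(example)
parallel = sub_model_parallel_latency(example)
print(f"single device: {baseline} ms, sub-model parallel: {parallel} ms")
```

In this model the benefit depends on how balanced the parallel stages are and on how small the transfer cost is relative to compute, which is consistent with the abstract's emphasis on a high-speed interconnect.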

Sub-model Parallelism: A Scale-out Deployment Method for Large Multi-modal DNNs
