Abstract: Deep Learning approaches based on Convolutional Neural Networks (CNNs) are widely used and highly successful in many application areas, including image classification and speech recognition. The execution of trained CNNs, i.e., model inference, is currently shifting from the Cloud to the Edge. Unfortunately, deploying and running large, compute- and memory-intensive CNNs on edge devices is challenging because these devices typically have limited power budgets and limited compute and memory resources. One approach to address this challenge is to leverage the resources available across multiple edge devices by partitioning a large CNN and running each partition on a separate edge device. Although distributing, deploying, and executing large CNNs on multiple edge devices in this way is desirable and beneficial, no design and programming framework currently exists that takes a trained CNN model, together with a CNN partitioning specification, and fully automates the model splitting and deployment on multiple edge devices to enable distributed CNN inference at the Edge. Therefore, in this paper, we propose a novel framework, called AutoDiCE, for automated splitting of a CNN model into a set of sub-models and automated code generation for distributed and collaborative execution of these sub-models on multiple, possibly heterogeneous, edge devices, while supporting the exploitation of parallelism both among and within the edge devices. Our experimental results show that AutoDiCE can deliver distributed CNN inference with reduced energy consumption and memory usage per edge device while also improving overall system throughput.
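To make the idea of splitting a model according to a partitioning specification concrete, the following is a minimal sketch and not AutoDiCE's actual implementation. It assumes a toy model given as a topologically ordered list of layers and a partitioning specification given as a mapping from layer name to device name. The names `Layer`, `SubModel`, and `split_model` are hypothetical and introduced only for illustration. For each device, the sketch collects the layers it runs and the tensors it would have to receive from or send to other devices.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    # Hypothetical toy layer: a name plus the names of its producer layers.
    name: str
    inputs: list[str]  # empty list means the layer reads the network input


@dataclass
class SubModel:
    # Hypothetical container for the part of the model mapped to one device.
    device: str
    layers: list[str] = field(default_factory=list)
    recv: set[tuple[str, str]] = field(default_factory=set)  # (tensor, source device)
    send: set[tuple[str, str]] = field(default_factory=set)  # (tensor, destination device)


def split_model(layers: list[Layer], mapping: dict[str, str]) -> dict[str, SubModel]:
    """Split a layer list into per-device sub-models.

    Assumes `layers` is topologically ordered and `mapping` assigns every
    layer name to exactly one device name.
    """
    missing = [layer.name for layer in layers if layer.name not in mapping]
    if missing:
        raise ValueError(f"layers without a device assignment: {missing}")

    subs: dict[str, SubModel] = {}
    for layer in layers:
        dev = mapping[layer.name]
        sub = subs.setdefault(dev, SubModel(dev))
        sub.layers.append(layer.name)
        for src in layer.inputs:
            src_dev = mapping[src]
            if src_dev != dev:
                # The producer runs elsewhere: record a cross-device transfer
                # on both the receiving and the sending sub-model.
                sub.recv.add((src, src_dev))
                subs.setdefault(src_dev, SubModel(src_dev)).send.add((src, dev))
    return subs


# Toy four-layer chain split across two hypothetical devices.
layers = [
    Layer("conv1", []),
    Layer("conv2", ["conv1"]),
    Layer("pool", ["conv2"]),
    Layer("fc", ["pool"]),
]
mapping = {"conv1": "edge0", "conv2": "edge0", "pool": "edge1", "fc": "edge1"}
subs = split_model(layers, mapping)
```

In this toy example, `subs["edge0"]` holds `conv1` and `conv2` and must send the output of `conv2` to `edge1`, while `subs["edge1"]` holds `pool` and `fc`. A real framework would additionally have to extract the weights of each sub-model and generate the communication code for each transfer, which this sketch leaves out.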

PICO: Pipeline Inference Framework for Versatile CNNs on Diverse Mobile Devices

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

Extendable Multi-Device Collaborative Pipeline Parallel Inference in the Edge-Cloud Scenario

Automated Exploration and Implementation of Distributed CNN Inference at the Edge

High-Throughput CNN Inference on Embedded ARM big.LITTLE Multi-Core Processors

DeepSlicing: Collaborative and Adaptive CNN Inference with Low Latency

Cooperative Inference with Interleaved Operator Partitioning for CNNs

Enhancing Distributed In-Situ CNN Inference in the Internet of Things

AutoDiCE: Fully Automated Distributed CNN Inference at the Edge

Cappuccino: Efficient Inference Software Synthesis for Mobile System-on-Chips

Researching the CNN Collaborative Inference Mechanism for Heterogeneous Edge Devices

CoCoPIE: Making Mobile AI Sweet As PIE --Compression-Compilation Co-Design Goes a Long Way

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

DeeperThings: Fully Distributed CNN Inference on Resource-Constrained Edge Devices

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices

IOS: Inter-Operator Scheduler for CNN Acceleration

Accelerating DNN Inference with Heterogeneous Multi-DPU Engines

EdgeCI: Distributed Workload Assignment and Model Partitioning for CNN Inference on Edge Clusters

INCA: INterruptible CNN Accelerator for Multi-tasking in Embedded Robots

Distributed Convolutional Neural Network Training on Mobile and Edge Clusters