HTVM: Efficient Neural Network Deployment On Heterogeneous TinyML Platforms

Josse Van Delm, Maarten Vandersteegen, Alessio Burrello, Giuseppe Maria Sarda, Francesco Conti, Daniele Jahier Pagliari, Luca Benini, Marian Verhelst
DOI: https://doi.org/10.1109/DAC56929.2023.10247664
2024-06-12
Abstract: Optimal deployment of deep neural networks (DNNs) on state-of-the-art Systems-on-Chips (SoCs) is crucial for tiny machine learning (TinyML) at the edge. The complexity of these SoCs makes deployment non-trivial, as they typically contain multiple heterogeneous compute cores with limited, programmer-managed memory to optimize latency and energy efficiency. We propose HTVM, a compiler that merges TVM with DORY to maximize the utilization of heterogeneous accelerators and minimize data movements. HTVM allows deploying the MLPerf™ Tiny suite on DIANA, an SoC with a RISC-V CPU and digital and analog compute-in-memory AI accelerators, at 120x improved performance over plain TVM deployment.
Programming Languages; Distributed, Parallel, and Cluster Computing
### What problem does this paper attempt to address?

This paper addresses the problem of efficiently deploying deep neural networks (DNNs) on heterogeneous TinyML platforms. Specifically, it focuses on optimizing the latency and energy efficiency of DNNs on resource-constrained edge devices, especially modern Systems-on-Chips (SoCs) that contain multiple heterogeneous compute cores and limited, programmer-managed memory.

#### Main problems

1. **Complexity**: Modern SoCs usually contain multiple heterogeneous compute cores, which makes DNN deployment non-trivial.
2. **Memory limitations**: The compute cores in these SoCs usually have limited, programmer-managed memory, so a deployment must minimize data movement to achieve good latency and energy efficiency.
3. **Hardware-specific optimization**: Existing toolchains are either too general to fully exploit the hardware features of dedicated accelerators or too specific to adapt to different SoC architectures.

#### Solutions

To solve these problems, the authors propose HTVM, a compiler toolchain that merges TVM with DORY. The main contributions of HTVM include:

1. **Extending the TVM compilation flow**: By introducing a memory-planning backend based on DORY, HTVM generates code that optimizes data movement and maximizes the utilization of the dedicated accelerator hardware.
2. **Hardware-aware tiling**: HTVM splits large layers into tiles that fit the on-chip memory, so that they execute efficiently on memory-constrained devices.
3. **Multi-accelerator support**: HTVM can dispatch work to multiple heterogeneous accelerators, reducing the number of kernel calls on the CPU and thus the total latency.
4. **Performance verification**: HTVM is benchmarked on the DIANA platform, an SoC with a RISC-V CPU and digital and analog compute-in-memory accelerators, where it deploys the MLPerf Tiny suite at 120x improved performance over plain TVM deployment.

Through these improvements, HTVM can achieve efficient DNN deployment on TinyML platforms, significantly improving performance and reducing memory footprint.
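
To make the idea of hardware-aware tiling more concrete, the following is a minimal Python sketch of choosing a tile size for one convolution layer under an on-chip memory budget. It is an illustration only: the `ConvLayer` class, the `tile_bytes` cost model, the `pick_tile` function, the int8 data assumption, and the brute-force search are all hypothetical and are not taken from HTVM or DORY, which may use a different cost model and search strategy.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of a convolution layer (int8 tensors assumed)."""
    in_ch: int       # input channels
    out_ch: int      # output channels
    height: int      # output height
    width: int       # output width
    kernel: int      # square kernel size
    stride: int = 1


def tile_bytes(layer: ConvLayer, tile_h: int, tile_c: int) -> int:
    """Bytes needed on-chip for one tile: input patch + weights + output.

    The tile covers `tile_h` output rows and `tile_c` output channels.
    The input width is assumed to already include any padding.
    """
    in_h = (tile_h - 1) * layer.stride + layer.kernel
    in_w = (layer.width - 1) * layer.stride + layer.kernel
    inputs = layer.in_ch * in_h * in_w
    weights = tile_c * layer.in_ch * layer.kernel * layer.kernel
    outputs = tile_c * tile_h * layer.width
    return inputs + weights + outputs


def pick_tile(layer: ConvLayer, budget_bytes: int) -> tuple[int, int]:
    """Brute-force search for the largest (tile_h, tile_c) that fits the budget."""
    best = None
    for tile_h in range(1, layer.height + 1):
        for tile_c in range(1, layer.out_ch + 1):
            if tile_bytes(layer, tile_h, tile_c) <= budget_bytes:
                size = tile_h * tile_c
                if best is None or size > best[0]:
                    best = (size, tile_h, tile_c)
    if best is None:
        raise ValueError("no tile fits in the given memory budget")
    return best[1], best[2]


if __name__ == "__main__":
    layer = ConvLayer(in_ch=16, out_ch=32, height=32, width=32, kernel=3)
    tile_h, tile_c = pick_tile(layer, budget_bytes=16 * 1024)
    print(f"tile: {tile_h} rows x {tile_c} output channels, "
          f"{tile_bytes(layer, tile_h, tile_c)} bytes")
```

Larger tiles mean fewer transfers between off-chip and on-chip memory, which is why the sketch maximizes tile size subject to the budget. A production compiler would typically also account for double buffering and accelerator-specific alignment constraints, which this sketch ignores.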