Abstract: With the advancement of Deep Neural Networks (DNNs) and the large amounts of sensor data produced by Internet of Things (IoT) systems, the research community has worked to reduce the computational and resource demands of DNNs so that they can run on low-resourced microcontrollers (MCUs). However, most current work in embedded deep learning focuses on solving a single task efficiently, while the multi-tasking nature and applications of IoT devices demand systems that can handle a diverse range of tasks (activity, voice, and context recognition) with input from a variety of sensors simultaneously. In this paper, we propose YONO, a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching for dissimilar multi-task learning on MCUs. We first adopt PQ to learn codebooks that store the weights of different models. We also propose a novel network optimization and heuristics to maximize the compression rate and minimize the accuracy loss. We then develop an online component of YONO for efficient model execution and switching between multiple tasks on an MCU at run time without relying on an external storage device. YONO can compress multiple heterogeneous models by up to 12.37$\times$ with negligible or no loss of accuracy. In addition, YONO's online component enables efficient execution (latency of 16-159 ms per operation) and reduces model loading/switching latency and energy consumption by 93.3-94.5% and 93.9-95.0%, respectively, compared to external storage access. Interestingly, YONO can compress various architectures trained with datasets that were not seen during YONO's offline codebook learning phase, showing the generalizability of our method. In summary, YONO shows great potential and opens further doors to enabling multi-task learning systems on extremely resource-constrained devices.
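
The idea of storing the weights of several models in a shared PQ codebook can be illustrated with a minimal sketch. It assumes plain k-means (Lloyd's algorithm) for codebook learning and a single codebook shared by all models; the function names (`learn_shared_codebook`, `encode`, `decode`, `compression_ratio`) and the bit-accounting in `compression_ratio` are hypothetical choices for illustration, not YONO's implementation, and the network optimization and heuristics mentioned in the abstract are not reproduced here.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # requires k <= len(points)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        # Keep the previous centroid if a cluster received no points.
        centroids = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else c
            for b, c in zip(buckets, centroids)
        ]
    return centroids


def split(weights, sub_dim):
    """Cut a flat weight list into consecutive sub-vectors of length sub_dim."""
    assert len(weights) % sub_dim == 0
    return [tuple(weights[i:i + sub_dim]) for i in range(0, len(weights), sub_dim)]


def learn_shared_codebook(models, sub_dim, k):
    """Assumption: one codebook is learned over the sub-vectors of all models."""
    subvecs = [v for w in models for v in split(w, sub_dim)]
    return kmeans(subvecs, k)


def encode(weights, codebook, sub_dim):
    """Replace every sub-vector by the index of its nearest codeword."""
    return [nearest(v, codebook) for v in split(weights, sub_dim)]


def decode(codebook, codes):
    """Rebuild approximate weights by looking each index up in the codebook."""
    return [x for c in codes for x in codebook[c]]


def compression_ratio(n_weights, sub_dim, k, bits_per_weight=32):
    """Original size divided by (codebook + indices), under a simple bit count."""
    original = n_weights * bits_per_weight
    codebook_bits = k * sub_dim * bits_per_weight
    index_bits = (n_weights // sub_dim) * math.ceil(math.log2(k))
    return original / (codebook_bits + index_bits)
```

In this sketch, each model is stored only as a list of small integer indices, and switching models amounts to switching index lists while the shared codebook stays in memory. Under the simple accounting above, 512 weights with `sub_dim=2` and `k=16` give a ratio of 8.0; this toy figure is unrelated to the 12.37$\times$ reported in the abstract.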

Enabling Large Neural Networks on Tiny Microcontrollers with Swapping

Neural networks on microcontrollers: saving memory at inference via operator reordering

SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget

SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

$μ$NAS: Constrained Neural Architecture Search for Microcontrollers

Efficient Neural Network Deployment for Microcontroller

MCUNet: Tiny Deep Learning on IoT Devices

A 3.89-GOPS/mW Scalable Recurrent Neural Network Processor with Improved Efficiency on Memory and Computation

A Scatter-and-Gather Spiking Convolutional Neural Network on a Reconfigurable Neuromorphic Hardware

Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution

RNC: Efficient RRAM-aware NAS and Compilation for DNNs on Resource-Constrained Edge Devices

ML-MCU: A Framework to Train ML Classifiers on MCU-based IoT Edge Devices

tinySNN: Towards Memory- and Energy-Efficient Spiking Neural Networks

Towards Efficient Neural Networks On-a-chip: Joint Hardware-Algorithm Approaches