Abstract: Microcontroller Units (MCUs) are ideal platforms for edge applications due to their low cost and energy consumption, and are widely used in various applications, including personalized machine learning tasks, where customized models can improve task adaptation. However, existing approaches for local on-device personalization mostly support simple ML architectures or require complex local pre-training/training, leading to high energy consumption and negating the low-energy advantage of MCUs. In this paper, we introduce MicroT, an efficient and low-energy MCU personalization approach. MicroT includes a robust, general, but tiny feature extractor, developed through self-supervised knowledge distillation, on top of which a task-specific head is trained, enabling independent on-device personalization with minimal energy and computational requirements. MicroT implements an MCU-optimized early-exit inference mechanism called stage-decision to further reduce energy costs. This mechanism allows for user-configurable exit criteria (stage-decision ratio) to adaptively balance energy cost with model performance. We evaluated MicroT using two models, three datasets, and two MCU boards. MicroT outperforms traditional transfer learning (TTL) and two SOTA approaches by 2.12-11.60% across two models and three datasets. Targeting widely used energy-aware edge devices, MicroT's on-device training requires no additional complex operations, cutting the energy cost by up to 2.28× compared to SOTA approaches while keeping SRAM usage below 1MB. During local inference, MicroT reduces energy cost by 14.17% compared to TTL across two boards and two datasets, highlighting its suitability for long-term use on energy-aware, resource-constrained MCUs.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

The paper addresses the high energy cost of personalizing models locally on microcontroller units (MCUs) at the edge. Existing on-device personalization methods either support only simple machine learning architectures or require complex local pre-training or training, which drives up energy consumption and undermines the low-energy advantage of MCUs.

### Main Contributions

1. **Low-Energy Model Personalization Framework**: The paper proposes **MicroT**, a low-energy model personalization framework designed specifically for resource-constrained MCUs. The framework aims to reduce energy consumption while maintaining personalization performance.
2. **Combination of Self-Supervised Learning and Knowledge Distillation**: MicroT uses self-supervised learning (SSL) and knowledge distillation (KD) to obtain a robust yet compact feature extractor. Large models are trained on cloud devices and used to generate a synthetic dataset, which is then used to train a small feature extractor that performs well within limited MCU resources.
3. **MCU-Optimized Early Exit Mechanism**: MicroT introduces an MCU-optimized early-exit mechanism (stage-decision) that divides the feature extractor into a partial model and a full model. Using classification confidence as the exit criterion, it dynamically balances energy consumption and model performance.
4. **Flexible Stage-Decision Ratio Configuration**: Users can adjust the stage-decision ratio to choose the trade-off between performance and energy consumption that suits their needs.

### Key Technologies of the Solution

1. **Separation of Feature Extractor and Classifier**: MicroT adopts the traditional transfer learning (TTL) paradigm, separating the model into a feature extractor and a classifier. The feature extractor is trained on cloud devices, while the classifier is trained on the MCU, reducing the local computational burden and energy consumption.
2. **Self-Supervised Learning**: Self-supervised learning improves the generalization ability of the feature extractor, enabling it to adapt to diverse local data.
3. **Knowledge Distillation**: Knowledge distillation transfers the knowledge of large models to the small feature extractor, achieving high performance on resource-constrained MCUs.
4. **Model Partitioning and Joint Training**: Partitioning the model into partial and full models and training them jointly further improves performance and reduces energy consumption during inference.

### Experimental Results

- **Performance Improvement**: MicroT improves accuracy by 2.12% to 11.60% over traditional transfer learning (TTL) and two other state-of-the-art methods across two models and three datasets.
- **Energy Consumption Reduction**: During on-MCU training, MicroT reduces energy cost by a factor of 2.03 to 2.28 compared to state-of-the-art methods; during on-MCU inference, MicroT reduces energy consumption by 14.17% compared to TTL.

### Conclusion

With the **MicroT** framework, the paper addresses the high energy cost of local personalization on MCUs at the edge. By combining self-supervised learning, knowledge distillation, model partitioning, and joint training, it achieves low-energy, high-performance model personalization suitable for long-term operation on resource-constrained MCUs.
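The stage-decision mechanism summarized above (run a partial model first, exit if its classification confidence is high enough, otherwise continue to the full model) can be illustrated with a minimal Python sketch. This is not the paper's implementation: all function names are hypothetical, the two "stages" are toy placeholders rather than real networks, confidence is assumed to be the top softmax probability, and the stage-decision ratio is assumed here to mean the fraction of calibration samples allowed to exit at the partial model. The summary does not confirm that definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits):
    """Classification confidence, taken here as the top softmax probability."""
    return max(softmax(logits))


def threshold_from_ratio(calib_confidences, exit_ratio):
    """Hypothetical helper: choose a threshold so that about `exit_ratio`
    of the calibration samples would exit at the partial model.
    (Assumed reading of the stage-decision ratio, not the paper's definition.)"""
    if not 0.0 <= exit_ratio <= 1.0:
        raise ValueError("exit_ratio must be in [0, 1]")
    ranked = sorted(calib_confidences, reverse=True)
    k = round(exit_ratio * len(ranked))
    if k == 0:
        return math.inf  # no sample is allowed to exit early
    return ranked[k - 1]


def stage_decision(x, partial_stage, remaining_stage, threshold):
    """Run the partial model first; continue to the full model only when
    the partial model's confidence falls below the threshold."""
    hidden, early_logits = partial_stage(x)
    if confidence(early_logits) >= threshold:
        return early_logits.index(max(early_logits)), "partial"
    final_logits = remaining_stage(hidden)
    return final_logits.index(max(final_logits)), "full"


# Toy stand-ins for the two stages (placeholders, not real networks).
def toy_partial(x):
    return x, [x, 0.0]  # (intermediate features, early-head logits)


def toy_remaining(hidden):
    return [0.0, 1.0]  # logits from the full model's head


threshold = threshold_from_ratio([0.9, 0.6, 0.8, 0.5], exit_ratio=0.5)
for sample in (5.0, 0.1):
    print(sample, stage_decision(sample, toy_partial, toy_remaining, threshold))
```

Under these assumptions, lowering `exit_ratio` raises the threshold and sends more inputs through the full model, trading extra energy for the full model's prediction; raising it does the opposite.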

Low-Energy On-Device Personalization for MCUs

Explore Training of Deep Convolutional Neural Networks on Battery-powered Mobile Devices: Design and Application

Towards Machine Learning and Inference for Resource-constrained MCUs

MOC: Multi-Objective Mobile CPU-GPU Co-Optimization for Power-Efficient DNN Inference

Decoupled Access-Execute enabled DVFS for tinyML deployments on STM32 microcontrollers

ML-MCU: A Framework to Train ML Classifiers on MCU-based IoT Edge Devices

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs

TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

iMCU: A 28-nm Digital In-Memory Computing-Based Microcontroller Unit for TinyML

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

Optimizing TinyML: The Impact of Reduced Data Acquisition Rates for Time Series Classification on Microcontrollers

Custom Hardware Inference Accelerator for TensorFlow Lite for Microcontrollers

Automated deep‐learning model optimization framework for microcontrollers

An Ultra-low Power TinyML System for Real-time Visual Processing at Edge

Efficient Neural Network Deployment for Microcontroller

Optimizing the Deployment of Tiny Transformers on Low-Power MCUs

DSORT-MCU: Detecting Small Objects in Real-Time on Microcontroller Units

Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers

MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory

MCUNet: Tiny Deep Learning on IoT Devices