Low-Energy On-Device Personalization for MCUs

Yushan Huang, Ranya Aloufi, Xavier Cadet, Yuchen Zhao, Payam Barnaghi, Hamed Haddadi
2024-10-01
Abstract: Microcontroller Units (MCUs) are ideal platforms for edge applications due to their low cost and energy consumption, and are widely used in various applications, including personalized machine learning tasks, where customized models can enhance task adaptation. However, existing approaches for local on-device personalization mostly support simple ML architectures or require complex local pre-training/training, leading to high energy consumption and negating the low-energy advantage of MCUs. In this paper, we introduce MicroT, an efficient and low-energy MCU personalization approach. MicroT includes a robust, general, but tiny feature extractor, developed through self-supervised knowledge distillation, which trains a task-specific head to enable independent on-device personalization with minimal energy and computational requirements. MicroT implements an MCU-optimized early-exit inference mechanism called stage-decision to further reduce energy costs. This mechanism allows for user-configurable exit criteria (stage-decision ratio) to adaptively balance energy cost with model performance. We evaluated MicroT using two models, three datasets, and two MCU boards. MicroT outperforms traditional transfer learning (TTL) and two SOTA approaches by 2.12 - 11.60% across two models and three datasets. Targeting widely used energy-aware edge devices, MicroT's on-device training requires no additional complex operations, reducing the energy cost by up to 2.28X compared to SOTA approaches while keeping SRAM usage below 1MB. During local inference, MicroT reduces energy cost by 14.17% compared to TTL across two boards and two datasets, highlighting its suitability for long-term use on energy-aware resource-constrained MCUs.
Machine Learning, Hardware Architecture
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper addresses the high energy cost of personalizing models locally on microcontroller units (MCUs). Existing on-device personalization methods either support only simple machine learning architectures or require complex local pre-training or training, which drives up energy consumption and undermines the low-energy advantage of MCUs.

### Main Contributions

1. **Low-Energy Model Personalization Framework**: The paper proposes a low-energy model personalization framework named **MicroT**, designed specifically for resource-constrained MCUs. The framework aims to reduce energy consumption while maintaining personalization performance.
2. **Combination of Self-Supervised Learning and Knowledge Distillation**: MicroT leverages self-supervised learning (SSL) and knowledge distillation (KD) to obtain a robust yet compact feature extractor. Large models are trained on cloud devices and a synthetic dataset is generated; this dataset is then used to train a small feature extractor that achieves high performance within limited MCU resources.
3. **MCU-Optimized Early Exit Mechanism**: MicroT introduces an MCU-optimized early-exit mechanism (stage-decision) that divides the feature extractor into a partial model and a full model. Using classification confidence as the exit criterion, it dynamically balances energy consumption and model performance.
4. **Flexible Stage-Decision Ratio Configuration**: Users can adjust the stage-decision ratio to suit their needs, choosing the balance point between performance and energy consumption.

### Key Technologies of the Solution

1. **Separation of Feature Extractor and Classifier**: MicroT follows the traditional transfer learning (TTL) paradigm, separating the model into a feature extractor and a classifier. The feature extractor is trained on cloud devices, while the classifier is trained on the MCU, reducing the local computational burden and energy consumption.
2. **Self-Supervised Learning**: Self-supervised learning improves the generalization ability of the feature extractor, enabling it to adapt to diverse local data.
3. **Knowledge Distillation**: Knowledge distillation transfers the knowledge of large models to the small feature extractor, achieving high performance on resource-constrained MCUs.
4. **Model Partitioning and Joint Training**: Partitioning the model into partial and full models and training them jointly further improves performance and reduces energy consumption during inference.

### Experimental Results

- **Performance Improvement**: MicroT improves accuracy by 2.12% to 11.60% over traditional transfer learning (TTL) and two other state-of-the-art methods across two models and three datasets.
- **Energy Consumption Reduction**: During on-MCU training, MicroT reduces energy cost by a factor of 2.03 to 2.28 compared with state-of-the-art methods; during on-MCU inference, MicroT reduces energy consumption by 14.17% compared to TTL.

### Conclusion

The **MicroT** framework targets the high energy cost of local personalization on MCU-based edge devices. By combining self-supervised learning, knowledge distillation, model partitioning, and joint training, it achieves low-energy, high-performance model personalization suitable for long-term operation on resource-constrained MCUs.
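
### Illustrative Sketch of Stage-Decision

The confidence-based early exit described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that confidence is the largest softmax probability and that the stage-decision ratio is the fraction of calibration samples allowed to exit at the partial model; the summary does not define either precisely. All function names (`calibrate_threshold`, `stage_decision`, `toy_partial`, `toy_full`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], Sequence[float]]  # maps an input to class logits


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(logits: Sequence[float]) -> float:
    """Assumption: confidence is the largest softmax probability."""
    return max(softmax(logits))


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


def calibrate_threshold(confidences: Sequence[float], exit_ratio: float) -> float:
    """Choose a threshold so that about `exit_ratio` of the calibration
    samples have confidence >= threshold (assumed meaning of the ratio)."""
    if not 0.0 <= exit_ratio <= 1.0:
        raise ValueError("exit_ratio must be in [0, 1]")
    ranked = sorted(confidences, reverse=True)
    n_exit = round(exit_ratio * len(ranked))
    if n_exit == 0:
        return math.inf  # no sample may exit early
    return ranked[n_exit - 1]


def stage_decision(x: float, partial_model: Model, full_model: Model,
                   threshold: float) -> Tuple[int, str]:
    """Return (predicted class, stage used).

    For simplicity the two models are independent callables; an MCU
    implementation would more likely reuse the partial model's
    intermediate activations when continuing to the full model.
    """
    partial_logits = partial_model(x)
    if confidence(partial_logits) >= threshold:
        return argmax(partial_logits), "partial"  # cheap early exit
    return argmax(full_model(x)), "full"  # fall back to the full model


# Toy stand-ins for the partial and full models (hypothetical).
def toy_partial(x: float) -> List[float]:
    return [2.0 * x, 0.0]


def toy_full(x: float) -> List[float]:
    return [0.0, 1.0]
```

Under these assumptions, raising `exit_ratio` lowers the threshold, so more inputs stop at the partial model and less energy is spent per inference, at the risk of accepting less confident predictions.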