ElasticAI: Creating and Deploying Energy-Efficient Deep Learning Accelerator for Pervasive Computing

Chao Qian, Tianheng Ling, Gregor Schiele
DOI: https://doi.org/10.1109/PerComWorkshops56833.2023.10150398
2024-08-29
Abstract: Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field Programmable Gate Arrays (FPGAs) are suitable for deploying DL accelerators for embedded devices, but developing an energy-efficient DL accelerator on an FPGA is not easy. Therefore, we propose the ElasticAI-Workflow that aims to help DL developers to create and deploy DL models as hardware accelerators on embedded FPGAs. This workflow consists of two key components: the ElasticAI-Creator and the Elastic Node. The former is a toolchain for automatically generating DL accelerators on FPGAs. The latter is a hardware platform for verifying the performance of the generated accelerators. With this combination, the performance of the accelerator can be sufficiently guaranteed. We will demonstrate the potential of our approach through a case study.
Hardware Architecture, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper attempts to solve is the lack of energy efficiency and computing power when deploying deep learning (DL) models on embedded devices. Specifically, because most embedded microcontrollers (MCUs) have limited computing power, it is difficult to run complex deep-learning models directly on these devices, especially when real-time requirements must be met. To solve this problem, the authors propose the ElasticAI-Workflow, which aims to help deep-learning developers create and deploy deep-learning accelerators that run efficiently on embedded FPGAs.

### Main problems:

1. **Insufficient computing power**: Microcontrollers on embedded devices usually have limited computing power and cannot execute deep-learning models efficiently.
2. **Energy-efficiency requirements**: To prolong battery life and reduce energy consumption, the deep-learning accelerator must be highly energy-efficient.
3. **High development difficulty**: Designing a deep-learning accelerator for embedded FPGAs requires in-depth FPGA engineering knowledge, which most deep-learning developers do not have.
4. **Coarse-grained power consumption measurement**: Existing hardware platforms provide only overall power consumption measurements, not fine-grained power consumption data, which makes it difficult to guide optimization work.

### Main components of ElasticAI-Workflow:

1. **ElasticAI-Creator**: A toolchain that helps developers automatically convert deep-learning models written in PyTorch into RTL representations suitable for FPGAs. This greatly reduces the FPGA expertise required.
2. **Elastic Node**: A customized hardware platform used to verify the energy efficiency of the generated deep-learning accelerator in a real environment. It provides fine-grained power consumption measurements, which guide optimization work.

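To illustrate why fine-grained power measurements help guide optimization, here is a minimal sketch that turns sampled power traces into per-rail energy figures. Everything in it is an assumption made for illustration: the rail names, the sample values, the 1 ms sampling interval, and the helper `energy_uj` are hypothetical and do not reflect the Elastic Node's actual measurement interface.

```python
from typing import Dict, List


def energy_uj(power_mw: List[float], dt_ms: float) -> float:
    """Integrate evenly spaced power samples with the trapezoidal rule.

    Since mW * ms = uJ, the result is in microjoules.
    """
    return sum((a + b) / 2.0 * dt_ms for a, b in zip(power_mw, power_mw[1:]))


# Hypothetical trace of one inference: one sample per millisecond, per rail.
# These numbers are made up for the example, not measured data.
trace_mw: Dict[str, List[float]] = {
    "fpga_core": [10.0, 50.0, 50.0, 10.0],
    "mcu": [20.0, 20.0, 20.0, 20.0],
}

per_rail_uj = {
    rail: energy_uj(samples, dt_ms=1.0) for rail, samples in trace_mw.items()
}
total_uj = sum(per_rail_uj.values())

for rail, energy in per_rail_uj.items():
    share = 100.0 * energy / total_uj
    print(f"{rail}: {energy:.1f} uJ ({share:.0f}% of total)")
print(f"total: {total_uj:.1f} uJ")
```

A platform that reports only the total would show 170 uJ for this made-up trace. The per-rail split shows that roughly 65% of it is spent in the FPGA core, which is the kind of information that tells a developer where to optimize.
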

Through the combination of these two components, ElasticAI-Workflow helps developers create and deploy efficient deep-learning accelerators while ensuring that their performance and energy efficiency meet application requirements.

### Key points of the solution:

- **Automated conversion**: Through ElasticAI-Creator, developers can convert deep-learning models into hardware accelerators without in-depth knowledge of FPGAs.
- **Fine-grained power consumption measurement**: Elastic Node provides detailed power consumption data, allowing developers to optimize based on actual measurement results.
- **Feedback loop**: The workflow consists of multiple stages, and each stage can be optimized and adjusted according to its report until the performance and energy efficiency requirements are met.

In summary, the core problem the paper addresses is improving the energy efficiency and computing power available to deep-learning models on embedded devices, and the proposed ElasticAI-Workflow provides a systematic solution to these challenges.
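
The feedback loop described in this summary can be sketched as a search over design candidates that stops once a report satisfies the application's requirements. This is a simplified illustration, not the workflow's implementation: the `Report` fields, the functions `meets`, `first_acceptable`, and `toy_evaluate`, the use of bit width as the tunable parameter, and the linear cost model are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class Report:
    """Hypothetical per-candidate report with latency and energy per inference."""
    latency_ms: float
    energy_uj: float


def meets(report: Report, max_latency_ms: float, max_energy_uj: float) -> bool:
    """Check a report against both application requirements."""
    return (report.latency_ms <= max_latency_ms
            and report.energy_uj <= max_energy_uj)


def first_acceptable(
    candidates: Iterable[int],
    evaluate: Callable[[int], Report],
    max_latency_ms: float,
    max_energy_uj: float,
) -> Optional[Tuple[int, Report]]:
    """Evaluate candidates in order; return the first one that meets the
    requirements, or None if every candidate fails."""
    for candidate in candidates:
        report = evaluate(candidate)
        if meets(report, max_latency_ms, max_energy_uj):
            return candidate, report
    return None


def toy_evaluate(bit_width: int) -> Report:
    # Made-up linear cost model standing in for a real synthesis or
    # measurement step; the coefficients carry no real-world meaning.
    return Report(latency_ms=0.5 * bit_width, energy_uj=10.0 * bit_width)


# Try progressively smaller bit widths until the requirements are met.
result = first_acceptable([16, 8, 4], toy_evaluate,
                          max_latency_ms=5.0, max_energy_uj=100.0)
print(result)
```

With these toy numbers, the 16-bit candidate misses the latency limit and the 8-bit candidate is the first to pass. In a real flow, `toy_evaluate` would be replaced by whatever stage produces the report, such as simulation, synthesis, or on-device measurement.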