ZIP-CNN: Design Space Exploration for CNN Implementation within a MCU

Thomas Garbay, Khalil Hachicha, Petr Dobias, Andrea Pinna, Karim Hocine, Wilfried Dron, Pedro Lusich, Imane Khalis, Bertrand Granado
DOI: https://doi.org/10.1145/3691343
2024-09-27
ACM Transactions on Embedded Computing Systems
Abstract: Embedded systems based on Microcontroller Units (MCUs) often gather significant quantities of data and solve various issues. Convolutional Neural Networks (CNNs) have proven their effectiveness in solving computer vision and natural language processing tasks. However, implementing CNNs within MCUs is challenging due to their high inference costs, which vary widely depending on hardware targets and CNN topologies. Despite state-of-the-art advancements, no efficient design space exploration solution handles the wide variety of implementation options. In this article, we introduce the ZIP-CNN design space exploration methodology, which facilitates CNN implementation within MCUs. We developed a model that quantitatively estimates the latency, energy consumption, and memory space required to run a CNN within an MCU. This model accounts for algorithmic reductions such as knowledge distillation, pruning, or quantization and applies to any CNN topology. To demonstrate the efficiency of our methodology, we investigated LeNet5, ResNet8, and ResNet26 within three different MCUs. We made materials and supplementary results available in a GitHub repository: https://github.com/ThGbay/ZIP-CNN. The proposed method was empirically verified on three hardware targets running at 14 different operating frequencies. The three CNN topologies investigated were implemented in their default FP32 configuration and also reduced with INT8 quantization, pruning at five different rates, and knowledge distillation. The estimates of our model are reliable, with errors ranging from 3.29% to 15.23% for latency, 3.12% to 10.34% for energy consumption, and 1.95% to 6.31% for memory space. These results are based on on-device measurements.
computer science, software engineering, hardware & architecture