TinyissimoYOLO: A Quantized, Low-Memory Footprint, TinyML Object Detection Network for Low Power Microcontrollers

Julian Moosmann, Marco Giordano, Christian Vogt, Michele Magno
DOI: https://doi.org/10.1109/AICAS57966.2023.10168657
2023-07-12
Abstract: This paper introduces a highly flexible, quantized, memory-efficient, and ultra-lightweight object detection network, called TinyissimoYOLO. It aims to enable object detection on microcontrollers in the power domain of milliwatts, with less than 0.5 MB of memory available for storing convolutional neural network (CNN) weights. The proposed quantized network architecture, with 422k parameters, enables real-time object detection on embedded microcontrollers, and it has been evaluated to exploit CNN accelerators. In particular, the proposed network has been deployed on the MAX78000 microcontroller, achieving a high frame rate of up to 180 fps and an ultra-low energy consumption of only 196 µJ per inference, with an inference efficiency of more than 106 MAC/Cycle. TinyissimoYOLO can be trained for any multi-object detection task. However, considering the small network size, adding object detection classes will increase the size and memory consumption of the network; thus, object detection with up to 3 classes is demonstrated. Furthermore, the network is trained using quantization-aware training and deployed with 8-bit quantization on different microcontrollers, such as the STM32H7A3, STM32L4R9, and Apollo4b, and on the MAX78000's CNN accelerator. Performance evaluations are presented in this paper.
Computer Vision and Pattern Recognition, Hardware Architecture, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses the problem of efficient, low-power object detection on resource-constrained microcontrollers. Specifically, it proposes a lightweight object detection network named TinyissimoYOLO, designed to run within a milliwatt-level power budget with less than 0.5 MB of memory available for CNN weights. The network has 422k parameters, is trained with quantization-aware training, and is deployed with 8-bit quantization on several microcontrollers: the STM32H7A3, the STM32L4R9, the Apollo4b, and the MAX78000 with its built-in CNN accelerator. On the MAX78000, TinyissimoYOLO reaches a frame rate of up to 180 fps, an energy consumption of 196 µJ per inference, and an inference efficiency of more than 106 MAC/Cycle. Additionally, the paper explores the impact of different dataset constraints on model performance and demonstrates object detection with up to 3 classes; because the network is so small, each added class increases its size and memory consumption. Overall, the study offers a solution for real-time object detection on edge devices.
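
The headline figures above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not taken from the paper. It assumes that every one of the 422k parameters is stored as a single 8-bit value (ignoring any higher-precision biases and all activation memory), that "MB" means 10^6 bytes, and that the 180 fps and 196 µJ figures describe the same operating point, which the abstract does not state. The function names are hypothetical.

```python
# Back-of-envelope check of the figures quoted in the abstract.
# Assumptions (not from the paper): all parameters are 8-bit weights,
# 1 MB = 10**6 bytes, and the peak frame rate and the energy per
# inference refer to the same operating point.

PARAMS = 422_000                 # parameter count quoted in the abstract
BITS_PER_WEIGHT = 8              # 8-bit quantization
WEIGHT_BUDGET_BYTES = 500_000    # "less than 0.5MB" for CNN weights
ENERGY_PER_INFERENCE_J = 196e-6  # 196 microjoules on the MAX78000
FRAME_RATE_FPS = 180             # peak frame rate on the MAX78000


def weight_memory_bytes(params: int, bits_per_weight: int) -> int:
    """Bytes needed to store `params` weights of `bits_per_weight` bits each."""
    return params * bits_per_weight // 8


def average_power_watts(energy_per_inference_j: float, fps: float) -> float:
    """Average power drawn by inference alone when running back to back."""
    return energy_per_inference_j * fps


weight_bytes = weight_memory_bytes(PARAMS, BITS_PER_WEIGHT)
power_w = average_power_watts(ENERGY_PER_INFERENCE_J, FRAME_RATE_FPS)

print(f"weight storage: {weight_bytes / 1e6:.3f} MB "
      f"(fits budget: {weight_bytes < WEIGHT_BUDGET_BYTES})")
print(f"average inference power at peak frame rate: {power_w * 1e3:.1f} mW")
```

Under these assumptions the weights occupy about 0.42 MB, which fits the 0.5 MB budget, and continuous inference at the peak frame rate draws roughly 35 mW, consistent with the "power domain of milliwatts" claim.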