MCUBench: A Benchmark of Tiny Object Detectors on MCUs

Sudhakar Sah, Darshan C. Ganji, Matteo Grimaldi, Ravish Kumar, Alexander Hoffman, Honnesh Rohmetra, Ehsan Saboori
2024-09-28
Abstract: We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various input resolutions and YOLO-based one-stage detectors. By conducting a controlled comparison with a fixed training pipeline, we collect comprehensive performance metrics. Our Pareto-optimal analysis shows that integrating modern detection heads and training techniques allows various YOLO architectures, including legacy models like YOLOv3, to achieve a highly efficient tradeoff between mean Average Precision (mAP) and latency. MCUBench serves as a valuable tool for benchmarking the MCU performance of contemporary object detectors and aids in model selection based on specific constraints.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to address?

This paper addresses the challenges of deploying YOLO-series object detection models on resource-constrained microcontroller units (MCUs). Specifically, its goals are:

1. **Establish a benchmarking framework**:
   - Provide a benchmarking platform, MCUBench, that evaluates more than 100 YOLO-based object detection models on different MCUs.
   - Train these models on the VOC dataset and evaluate their inference performance on multiple MCU platforms.
2. **Optimize model performance**:
   - Hold experimental conditions fixed (for example, a unified training pipeline) to study how the individual components of a YOLO model (backbone, neck, head) affect MCU performance.
   - Explore how modern detection heads and advanced training techniques enable different YOLO architectures, including older models such as YOLOv3, to achieve an efficient trade-off between mAP (mean Average Precision) and latency on MCUs.
3. **Provide a model selection tool**:
   - Provide trained model weights so that application developers can select and fine-tune a model that meets specific accuracy, latency, RAM, and Flash requirements without training multiple models themselves.
4. **Fill the gaps in existing benchmarks**:
   - Existing benchmarks either target extremely simple deep-learning tasks or exceed the memory and compute limits of MCUs. MCUBench focuses on more complex datasets and models to support more advanced industrial applications.

Through these efforts, the paper aims to advance the TinyML (Tiny Machine Learning) field, particularly efficient object detection on edge devices.
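The constraint-based model selection described above can be sketched as a Pareto filter followed by a budget check. This is a minimal illustration, not the paper's tooling: the `Candidate` record, the function names, and every number in `CANDIDATES` are hypothetical placeholders rather than MCUBench results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One benchmarked model on one MCU (hypothetical record layout)."""
    name: str
    map50: float       # accuracy, higher is better
    latency_ms: float  # inference latency, lower is better
    ram_kb: float
    flash_kb: float


def pareto_front(cands):
    """Return candidates that no other candidate dominates in (mAP, latency)."""
    front = []
    for c in cands:
        dominated = any(
            o.map50 >= c.map50
            and o.latency_ms <= c.latency_ms
            and (o.map50 > c.map50 or o.latency_ms < c.latency_ms)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.latency_ms)


def select(cands, max_latency_ms, max_ram_kb, max_flash_kb):
    """Pick the most accurate Pareto-optimal model that fits the budgets."""
    feasible = [
        c for c in cands
        if c.latency_ms <= max_latency_ms
        and c.ram_kb <= max_ram_kb
        and c.flash_kb <= max_flash_kb
    ]
    front = pareto_front(feasible)
    return max(front, key=lambda c: c.map50) if front else None


# Placeholder values for illustration only; NOT measurements from the paper.
CANDIDATES = [
    Candidate("model_a", 0.40, 300.0, 250.0, 900.0),
    Candidate("model_b", 0.55, 800.0, 400.0, 1500.0),
    Candidate("model_c", 0.50, 900.0, 420.0, 1600.0),  # dominated by model_b
    Candidate("model_d", 0.65, 2000.0, 700.0, 3000.0),
]

if __name__ == "__main__":
    print([c.name for c in pareto_front(CANDIDATES)])
    print(select(CANDIDATES, max_latency_ms=1000.0, max_ram_kb=512.0, max_flash_kb=2048.0))
```

Filtering by the RAM and Flash budgets before computing the front keeps the front specific to the target device, since a model that is optimal overall may simply not fit on a smaller MCU.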
### Key formulas and concepts

- **mAP (mean Average Precision)**: A measure of detection accuracy, defined as the mean of the per-class average precision: \[ \text{mAP}=\frac{1}{N}\sum_{i = 1}^{N}\text{AP}_i \] where \(N\) is the number of classes and \(\text{AP}_i\) is the average precision of the \(i\)-th class.
- **CIoU loss (Complete IoU)**: A bounding-box regression loss that extends IoU with a center-distance penalty and an aspect-ratio consistency term: \[ \mathcal{L}_{\text{CIoU}}=1-\text{IoU}+\frac{\rho^2(\mathbf{b},\mathbf{b}^{gt})}{c^2}+\alpha v,\qquad v=\frac{4}{\pi^2}\left(\arctan\frac{w_2}{h_2}-\arctan\frac{w_1}{h_1}\right)^2,\qquad \alpha=\frac{v}{(1-\text{IoU})+v} \] where \(\text{IoU}\) is the intersection-over-union ratio, \(\rho\) is the Euclidean distance between the centers \(\mathbf{b}\) and \(\mathbf{b}^{gt}\) of the predicted and ground-truth boxes, \(c\) is the diagonal length of the smallest box enclosing both, and \(w_1, h_1\) and \(w_2, h_2\) are the widths and heights of the predicted box and the ground-truth box respectively.
- **DFL (Distribution Focal Loss)**: A bounding-box regression loss that models each box offset as a discrete distribution over bins and concentrates probability on the two bins nearest the continuous target \(y\): \[ \text{DFL}(S_i,S_{i+1})=-\left((y_{i+1}-y)\log S_i+(y-y_i)\log S_{i+1}\right) \] where \(y_i \le y \le y_{i+1}\) are the adjacent bin values and \(S_i, S_{i+1}\) are the predicted probabilities of those bins.

These metrics and loss functions underlie the evaluation and training techniques discussed in the paper, which are intended to help researchers and developers understand and deploy efficient object detection models on MCUs.
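As a rough illustration of the three quantities above, the following sketch computes them in plain Python using their standard published definitions. It is not the paper's training code: the function names, the `(x1, y1, x2, y2)` box layout, and the `eps` guard are assumptions made for this example.

```python
import math


def mean_average_precision(ap_per_class):
    """mAP: the mean of per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)


def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + center-distance penalty + aspect-ratio term."""
    iou = iou_xyxy(pred, gt)
    # Squared distance between the two box centers (rho^2).
    rho2 = ((pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2) ** 2 \
         + ((pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both (c^2).
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term v and its weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def distribution_focal_loss(probs, y, eps=1e-12):
    """DFL for one box side: probs are bin probabilities over 0..n-1, y is the target."""
    left = min(int(math.floor(y)), len(probs) - 2)  # index of the lower neighbor bin
    right = left + 1
    return -((right - y) * math.log(max(probs[left], eps))
             + (y - left) * math.log(max(probs[right], eps)))
```

A perfectly matching prediction gives a CIoU loss of zero, while disjoint boxes give a loss above one because the center-distance penalty still provides a signal when IoU is zero.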