Abstract: We introduce MCUBench, a benchmark featuring over 100 YOLO-based object detection models evaluated on the VOC dataset across seven different MCUs. This benchmark provides detailed data on average precision, latency, RAM, and Flash usage for various input resolutions and YOLO-based one-stage detectors. By conducting a controlled comparison with a fixed training pipeline, we collect comprehensive performance metrics. Our Pareto-optimal analysis shows that integrating modern detection heads and training techniques allows various YOLO architectures, including legacy models like YOLOv3, to achieve a highly efficient tradeoff between mean Average Precision (mAP) and latency. MCUBench serves as a valuable tool for benchmarking the MCU performance of contemporary object detectors and aids in model selection based on specific constraints.

What problem does this paper attempt to address?

### What problems does this paper attempt to solve?

This paper aims to solve the challenges of deploying YOLO-series object detection models on resource-constrained microcontroller units (MCUs). Specifically, the goals of the paper are:

1. **Establish a benchmarking framework**:
   - Provide a benchmarking platform named MCUBench, which evaluates the performance of more than 100 YOLO-based object detection models on different MCUs.
   - These models are trained on the VOC dataset, and their inference performance is evaluated on multiple MCU platforms.
2. **Optimize model performance**:
   - By controlling experimental conditions (such as a unified training process), study the impact of different components (backbone, neck, head) of the YOLO model on MCU performance.
   - Explore how modern detection heads and advanced training techniques enable different YOLO architectures (including older models such as YOLOv3) to achieve an efficient trade-off between mAP (mean Average Precision) and latency on MCUs.
3. **Provide a model selection tool**:
   - Provide trained model weights for application developers, so that they can select and fine-tune models according to specific accuracy, latency, RAM, and Flash requirements without independently training multiple models.
4. **Fill the gaps in existing benchmarking**:
   - Existing benchmarks either target extremely simple deep-learning tasks or exceed the memory and computational limits of MCUs. MCUBench focuses on complex datasets and models to support more advanced industrial application scenarios.

Through these efforts, the paper hopes to promote the development of the TinyML (Tiny Machine Learning) field, especially in achieving efficient object detection on edge devices.
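The selection workflow described here (filter models by latency, RAM, and Flash budgets, then choose among the Pareto-optimal ones) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's tooling: the `Candidate` class, the function names, and every number are hypothetical placeholders rather than MCUBench results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """One benchmark entry (hypothetical schema, not the MCUBench format)."""
    name: str
    map50: float       # accuracy, higher is better
    latency_ms: float  # on-device latency, lower is better
    ram_kb: float
    flash_kb: float


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep models that no other model beats on both mAP and latency."""
    front: List[Candidate] = []
    best_map = float("-inf")
    # Sweep from fastest to slowest; a model survives only if it is
    # more accurate than every faster model seen so far.
    for c in sorted(cands, key=lambda c: (c.latency_ms, -c.map50)):
        if c.map50 > best_map:
            front.append(c)
            best_map = c.map50
    return front


def select_model(cands: List[Candidate], max_latency_ms: float,
                 max_ram_kb: float, max_flash_kb: float) -> Optional[Candidate]:
    """Return the most accurate Pareto-optimal model that fits the budget."""
    feasible = [c for c in cands
                if c.latency_ms <= max_latency_ms
                and c.ram_kb <= max_ram_kb
                and c.flash_kb <= max_flash_kb]
    front = pareto_front(feasible)
    return max(front, key=lambda c: c.map50) if front else None


# Illustrative placeholder values only, NOT measured results.
CANDIDATES = [
    Candidate("A", map50=0.30, latency_ms=200, ram_kb=200, flash_kb=500),
    Candidate("B", map50=0.45, latency_ms=600, ram_kb=350, flash_kb=900),
    Candidate("C", map50=0.40, latency_ms=700, ram_kb=300, flash_kb=800),
    Candidate("D", map50=0.55, latency_ms=1500, ram_kb=600, flash_kb=1800),
]

choice = select_model(CANDIDATES, max_latency_ms=1000,
                      max_ram_kb=400, max_flash_kb=1000)
```

In this toy data, model C is slower and less accurate than B, so it drops off the front, and D is excluded by the budget. A real workflow would load the published per-MCU measurements instead of the hard-coded list.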
### Key formulas and concepts

- **mAP (mean Average Precision)**: A metric for the accuracy of object detection models, defined as the mean of the per-class average precision:
  \[ \text{mAP}=\frac{1}{N}\sum_{i = 1}^{N}\text{AP}_i \]
  where \(N\) is the number of classes and \(\text{AP}_i\) is the average precision of the \(i\)-th class.
- **CIoU Loss (Complete IoU Loss)**: A bounding box regression loss that extends IoU with a center-distance penalty and an aspect-ratio penalty:
  \[ \mathcal{L}_{\text{CIoU}}=1-\text{IoU}+\frac{\rho^2(\mathbf{b}_1,\mathbf{b}_2)}{c^2}+\alpha v, \qquad v=\frac{4}{\pi^2}\left(\arctan\frac{w_2}{h_2}-\arctan\frac{w_1}{h_1}\right)^2, \qquad \alpha=\frac{v}{(1-\text{IoU})+v} \]
  where \(\text{IoU}\) is the intersection-over-union ratio, \(w_1, h_1\) and \(w_2, h_2\) are the widths and heights of the predicted box and the ground-truth box respectively, \(\rho(\mathbf{b}_1,\mathbf{b}_2)\) is the distance between the two box centers, and \(c\) is the diagonal length of the smallest box enclosing both.
- **DFL (Distribution Focal Loss)**: A loss for bounding box regression in which each box edge is predicted as a discrete probability distribution over bins. It pushes probability mass toward the two bins nearest the continuous target:
  \[ \text{DFL}(S_i,S_{i+1})=-\big((y_{i+1}-y)\log S_i+(y-y_i)\log S_{i+1}\big) \]
  where \(y\) is the continuous regression target, \(y_i \le y \le y_{i+1}\) are the two nearest bin values, and \(S_i, S_{i+1}\) are the predicted probabilities of those bins.

Through these formulas and methods, the paper provides detailed performance evaluation and optimization strategies to help researchers and developers better understand and deploy efficient object detection models on MCUs.
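To make these quantities concrete, here is a small pure-Python sketch of mAP, the CIoU loss, and DFL. It follows the standard definitions from the literature (Complete IoU with center-distance and aspect-ratio terms, and Distribution Focal Loss over discrete bins). That these match the paper's training pipeline exactly is an assumption, and all function names are illustrative. Boxes are assumed to be `(x1, y1, x2, y2)` tuples with positive width and height.

```python
import math


def mean_average_precision(ap_per_class):
    """mAP: plain mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + rho^2 / c^2 + alpha * v."""
    i = iou(pred, gt)
    # Squared distance between the two box centers.
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2) ** 2 \
         + (((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - i) + v + eps)
    return 1 - i + rho2 / c2 + alpha * v


def dfl_loss(probs, target, eps=1e-12):
    """DFL for one box edge.

    probs:  softmax distribution over integer bins 0..n-1
    target: continuous distance in [0, n-1]
    """
    left = min(int(math.floor(target)), len(probs) - 2)
    right = left + 1
    # Linear interpolation weights toward the two neighboring bins.
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(max(probs[left], eps))
             + w_right * math.log(max(probs[right], eps)))
```

For example, `ciou_loss((0, 0, 2, 2), (0, 0, 2, 2))` is zero because the boxes coincide, while two disjoint boxes give a loss above 1 since the center-distance penalty is added on top of `1 - IoU`.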

MCUBench: A Benchmark of Tiny Object Detectors on MCUs
