Multi-Module Model Refinement for Real-Time Object Detection

Kejian Xu, Jinglong Chen, Yi Ning
DOI: https://doi.org/10.1109/aips64124.2024.00060
2024-01-01
Abstract: Real-time object detection remains a central problem in computer vision, and balancing accuracy against speed is a persistent challenge for both academic researchers and industrial practitioners. Recent transformer-based models use the attention mechanism to achieve substantial performance gains over CNNs, but their heavy computational demands often hinder their practical application in real-time detection. In this study, we take YOLOv9 as the baseline and refine it across multiple modules, yielding a new high-performing object detector, MR-YOLOv9. To better exploit the representational capacity of feature maps, we replace the baseline's original FPN structure with BiFPN. This strengthens the bidirectional flow of features, making feature fusion more efficient and effective and thereby improving detection performance. We further introduce a loss function, Inner-CIoU, which, combined with the Inner-IoU loss function, further improves the detector's performance. To validate the proposed methods, we conduct extensive ablation studies on the MS-COCO 2017 detection dataset. Trained solely on the MS-COCO dataset, without leveraging any prior knowledge, MR-YOLOv9 achieves 56.0% AP on the COCO 2017 test set, surpassing YOLOv9-E by 0.4% AP without incurring additional inference cost.
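The Inner-CIoU loss named in the abstract can be illustrated with a minimal sketch. It assumes the commonly described formulation, in which the Inner-IoU term is the IoU of auxiliary boxes obtained by rescaling both boxes about their centres, and the loss is CIoU + IoU - Inner-IoU. The abstract does not give the formula, so this may differ from the authors' implementation. The function names, the (x1, y1, x2, y2) box layout, and the default ratio of 0.75 are illustrative assumptions.

```python
import math

def _iou(a, b, eps=1e-9):
    """Plain IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + eps)

def _scale(box, ratio):
    """Auxiliary box: same centre, width and height multiplied by ratio."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * ratio / 2, (box[3] - box[1]) * ratio / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def ciou_loss(pred, gt, eps=1e-9):
    """CIoU loss: 1 - IoU + centre-distance penalty + aspect-ratio penalty."""
    iou = _iou(pred, gt, eps)
    # Squared distance between the two box centres.
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2
            + ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term and its trade-off weight.
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wg, hg = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(wg / (hg + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

def inner_ciou_loss(pred, gt, ratio=0.75, eps=1e-9):
    """Assumed form: CIoU loss + IoU - IoU of the ratio-scaled auxiliary boxes."""
    inner_iou = _iou(_scale(pred, ratio), _scale(gt, ratio), eps)
    return ciou_loss(pred, gt, eps) + _iou(pred, gt, eps) - inner_iou
```

With ratio set to 1 the auxiliary boxes coincide with the originals and the sketch reduces to plain CIoU. A ratio below 1 shrinks the auxiliary boxes, so partially overlapping predictions are penalised more strongly.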