Real-time Object Detection Method with Single-Domain Generalization Based on YOLOv8

Yipeng Zhou, Huaming Qian
DOI: https://doi.org/10.1007/s11554-024-01572-z
IF: 2.293
2024-01-01
Journal of Real-Time Image Processing
Abstract: Prevailing object detection models often generalize poorly across domains. A model may perform very well on a given dataset, yet its accuracy can drop sharply on novel domains outside its training distribution. Existing single-domain generalization methods based on Faster R-CNN are constrained by their underlying strategies: they are slow, their accuracy is suboptimal, and their generalization is inadequate. This paper proposes a Complementary Pseudo Multi-domain Generation method based on YOLOv8 (Y-CPMG). The method strengthens generalization by generating a range of pseudo-domain information in the feature space. Specifically, we use a pre-trained visual-language model with textual prompts to extract domain-specific feature enhancements, which are then combined with the original images to simulate multi-domain scenarios. Building on this, we introduce normalization perturbation (NP) to uncover a variety of latent domain styles, which addresses potential limitations of visual-language models in emulating scenes of diverse styles. Experiments on several public datasets covering diverse weather conditions show that the proposed method markedly improves performance on domain-generalized object detection. With an input size of 3 × 608 × 1088, the detection speed reaches 38 FPS, a 65.2% improvement over Faster R-CNN-based methods, which meets the requirements of real-time processing.
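The abstract names normalization perturbation (NP) but does not give its formula. As a rough illustration only, the sketch below follows one common formulation of the idea: the per-channel mean and standard deviation of a feature map are rescaled by random factors centred on 1, so the same content appears under a different "style". The function name, the `noise_std` value, and the NumPy implementation are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def normalization_perturbation(features, noise_std=0.75, rng=None, eps=1e-5):
    """Randomly rescale the per-channel mean and std of a feature batch.

    features:  array of shape (N, C, H, W).
    noise_std: spread of the random scaling factors (assumed value).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, c = features.shape[:2]

    # Channel statistics, computed per sample over the spatial dimensions.
    mean = features.mean(axis=(2, 3), keepdims=True)
    std = features.std(axis=(2, 3), keepdims=True)

    # Independent scaling factors per sample and channel, centred on 1.
    alpha = rng.normal(1.0, noise_std, size=(n, c, 1, 1))
    beta = rng.normal(1.0, noise_std, size=(n, c, 1, 1))

    # Normalize, then re-style with the perturbed statistics.
    normalized = (features - mean) / (std + eps)
    return normalized * (alpha * std) + beta * mean


# Example: one perturbed copy of a random feature batch.
feats = np.random.default_rng(0).normal(size=(2, 3, 8, 8))
styled = normalization_perturbation(feats, rng=np.random.default_rng(1))
```

Each call draws new factors, so repeated calls yield different pseudo styles for the same content. With `noise_std=0` the function returns the input almost unchanged (up to the small `eps` term).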