ESC-YOLO: optimizing apple fruit recognition with efficient spatial and channel features in YOLOX

Jun Sun, Yifei Peng, Chen Chen, Bing Zhang, Zhaoqi Wu, Yilin Jia, Lei Shi
DOI: https://doi.org/10.1007/s11554-024-01540-7
IF: 2.293
2024-09-01
Journal of Real-Time Image Processing
Abstract: Accurate localization of apple fruits and recognition of occlusion types in complex orchard environments play an important role in precision agriculture. This work proposes an efficient fruit recognition model called Efficient Spatial and Channel Feature YOLOX (ESC-YOLO). ESC-YOLO is built upon YOLOX and emphasizes spatial and channel information to keep global information and local features coherent. The backbone is optimized in three ways: EfficientViT is adopted as the foundational backbone; Spatial and Channel Reconstruction Convolution (SCConv) is integrated into the input stem to reorganize spatial and channel features and reduce redundancy; and an Efficient-MBConv module is constructed and combined with the EfficientViTBlock for feature extraction. The neck is optimized by using the Centralized Feature Pyramid Net (CFPNet) as the neck network and by employing a Simple, Parameter-Free Attention Module (SimAM) to enhance model performance. The lightweight variant of the model, ESC-YOLO-S, is used for performance evaluation. It achieves a 4.26% improvement in Top-1 mean Average Precision (mAP) compared to YOLOX-S and significantly reduces the false and missed detections caused by various types of occlusions. The improved model therefore meets the requirements for high-precision identification in complex orchard environments.
computer science, artificial intelligence; engineering, electrical & electronic; imaging science & photographic technology
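The abstract names SimAM as a parameter-free attention module but does not spell out how it weights features. The following is a minimal sketch of the commonly published SimAM formulation, not the ESC-YOLO authors' code: each activation is scaled by a sigmoid of its inverse energy, which grows with the activation's squared deviation from its channel mean. The function names `simam_weights` and `simam` are hypothetical, the regularizer `e_lambda = 1e-4` is an assumed value, and plain nested lists stand in for tensors so the example runs without a deep learning framework.

```python
import math


def simam_weights(channel, e_lambda=1e-4):
    """Return SimAM-style attention weights for one H x W channel.

    `channel` is a list of rows. `e_lambda` is an assumed regularizer.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" activations in the channel
    if n < 1:
        raise ValueError("channel needs at least two activations")
    mu = sum(values) / len(values)
    # Squared deviation of every activation from the channel mean.
    sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n
    weights = []
    for dev_row in sq_dev:
        weight_row = []
        for d in dev_row:
            # Inverse energy: larger for activations that stand out.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight_row.append(1.0 / (1.0 + math.exp(-inv_energy)))
        weights.append(weight_row)
    return weights


def simam(feature_map, e_lambda=1e-4):
    """Apply the weights channel by channel to a C x H x W nested list."""
    output = []
    for channel in feature_map:
        weights = simam_weights(channel, e_lambda)
        output.append([
            [v * w for v, w in zip(row, weight_row)]
            for row, weight_row in zip(channel, weights)
        ])
    return output


# One channel with a single outlier, one constant channel.
features = [
    [[0.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
refined = simam(features)
```

Because the weights are computed from channel statistics alone, the module adds no learnable parameters; in this sketch the outlier in the first channel receives a larger weight than its neighbors, while the constant channel is scaled uniformly.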