Improving real-time apple fruit detection: Multi-modal data and depth fusion with non-targeted background removal

Shaghaf Kaukab, Komal, Bhupendra M Ghodki, Hena Ray, Yogesh B. Kalnar, Kairam Narsaiah, Jaskaran S. Brar
DOI: https://doi.org/10.1016/j.ecoinf.2024.102691
IF: 5.1
2024-06-26
Ecological Informatics
Abstract: In automated fruit detection, RGB-Depth (RGB-D) images supply the detection model with additional depth information that can improve detection accuracy. However, depth images captured outdoors are usually of low quality, which limits how much the depth data can contribute. This study presents an approach for real-time apple fruit detection in a high-density orchard environment using multi-modal data. A non-targeted background removal using depth fusion (NBR-DF) method was developed to reduce the high noise in depth images. The noise arises from uncontrolled lighting conditions and from holes with incomplete depth information in the depth images. The NBR-DF technique follows three primary steps: pre-processing of depth images (point cloud generation), target object extraction, and background removal. The method serves as a pipeline that pre-processes multi-modal data and enhances the features of depth images by filling holes, eliminating the noise those holes generate. Further, NBR-DF implemented with YOLOv5 enhances detection accuracy in dense orchard conditions by using multi-modal information as input. An attention-based depth fusion module that adaptively fuses the multi-modal features was developed. The depth-attention matrix is integrated using pooling operations and sigmoid normalization, both of which are efficient methods for summarizing and normalizing depth information. The fusion module improves the identification of multiscale objects and strengthens the network's resistance to noise. The network then detects the fruit position using multiscale information from the RGB-D images in highly complex orchard environments. The detection results were compared and validated against other methods using different input modalities and fusion strategies. The NBR-DF approach achieved an average precision of 0.964 in real time. The performance comparison with other state-of-the-art methods and the model generalization study also indicate that the depth-fusion attention mechanism and preprocessing steps in NBR-DF-YOLOv5 significantly outperform those of the compared methods. In conclusion, the developed NBR-DF technique showed the potential to improve real-time apple fruit detection using multi-modal information.
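The abstract names the ingredients of the pipeline (hole filling, depth-based background removal, and a depth-attention matrix built from pooling and sigmoid normalization) but not their details. The NumPy sketch below is only an illustration of those ideas under stated assumptions, not the authors' implementation: the function names, the median hole fill, the `near`/`far` depth range, the pooling size, and the choice to weight nearer pixels more strongly are all hypothetical, and the paper's fusion module is presumably learned inside the network rather than fixed as it is here.

```python
import numpy as np


def remove_background(rgb, depth, near=0.3, far=1.5):
    """Zero out RGB pixels whose depth falls outside [near, far].

    Assumptions (not from the paper): a depth of 0 marks a hole, holes are
    filled with the median of the valid depths, and the target fruit lies
    within a fixed depth range given in the same unit as `depth`.
    """
    filled = depth.astype(np.float32).copy()
    valid = filled > 0
    if valid.any():
        # Crude hole filling: replace missing readings with the median depth.
        filled[~valid] = np.median(filled[valid])
    keep = (filled >= near) & (filled <= far)
    out = rgb.copy()
    out[~keep] = 0  # background pixels are blanked
    return out, filled, keep


def depth_attention(features, depth, pool=2):
    """Weight a feature map with a depth-derived attention matrix.

    features: array of shape (H, W, C)
    depth:    array of shape (H * pool, W * pool), holes already filled
    The attention matrix is average-pooled depth passed through a sigmoid.
    Giving nearer pixels a larger weight is an assumption of this sketch.
    """
    h, w, _ = features.shape
    # Average pooling: group the depth map into pool x pool blocks.
    blocks = depth[: h * pool, : w * pool].reshape(h, pool, w, pool)
    pooled = blocks.mean(axis=(1, 3))
    # Standardise so the sigmoid does not saturate on raw depth values.
    z = (pooled - pooled.mean()) / (pooled.std() + 1e-6)
    attn = 1.0 / (1.0 + np.exp(z))  # equals sigmoid(-z): nearer gives higher weight
    return features * attn[..., None], attn
```

In this sketch, `remove_background` would run first on the aligned RGB and depth frames, and the filled depth map it returns would then feed `depth_attention` at whichever feature scale is being fused.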
ecology