Detection in Complex Scenes Using RGB and Depth Multimodal Feature Fusion

Shengli Yan, Yuan Rao, Wenhui Hou
DOI: https://doi.org/10.1109/ICASSP48485.2024.10448205
2024-01-01
Abstract: Unlike RGB images, depth images are robust to the complex scenes found in densely planted orchards. In this paper, we propose a fruit detection method built on a multimodal feature fusion module (MMFF) for RGB and depth images. Our method adopts a dual-stream convolutional neural network for feature extraction, using feature pyramids to capture multi-scale information from both the RGB and the depth images. The MMFF module separates the features the two modalities share from those that differ, suppressing the shared features and fusing the differing ones. In addition, we use a multi-scale feature fusion method to combine more information and improve fruit detection accuracy. To validate the effectiveness of our method, we conduct experiments on a self-built multimodal pear dataset. Extensive experiments demonstrate that the proposed approach achieves state-of-the-art performance at low computational cost.
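To make the idea of "suppressing shared features and fusing differing ones" more concrete, here is a minimal sketch in plain Python under a stated assumption: a simple similarity-gated rule, not the authors' MMFF module, whose internals the abstract does not describe. For each channel k, it assumes a gate s_k = max(0, cos(rgb_k, depth_k)) and an output rgb_k + (1 - s_k) * depth_k, so depth information that duplicates RGB is damped and complementary depth information is added. The names `fuse_modalities` and `fuse_pyramid`, the list-of-channels feature representation, and the gating rule are all hypothetical. The per-level loop only mirrors the dual-stream feature pyramid; the paper's separate multi-scale fusion step is not modelled.

```python
import math
from typing import List

# Hypothetical toy representation: a feature map is a list of channels,
# each channel a flattened list of spatial activations.
FeatureMap = List[List[float]]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_modalities(rgb: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Hypothetical similarity-gated fusion of RGB and depth features.

    Assumed rule (not the paper's): s_k = max(0, cos(rgb_k, depth_k)) and
    fused_k = rgb_k + (1 - s_k) * depth_k. Depth channels that duplicate the
    RGB channel (s_k near 1) are suppressed; complementary ones (s_k near 0)
    are added in full.
    """
    if len(rgb) != len(depth):
        raise ValueError("RGB and depth maps must have the same channel count")
    fused: FeatureMap = []
    for r_ch, d_ch in zip(rgb, depth):
        if len(r_ch) != len(d_ch):
            raise ValueError("paired channels must have the same spatial size")
        s = max(0.0, cosine_similarity(r_ch, d_ch))
        fused.append([r + (1.0 - s) * d for r, d in zip(r_ch, d_ch)])
    return fused


def fuse_pyramid(rgb_levels: List[FeatureMap],
                 depth_levels: List[FeatureMap]) -> List[FeatureMap]:
    """Apply the assumed fusion independently at each pyramid level of the two streams."""
    if len(rgb_levels) != len(depth_levels):
        raise ValueError("both streams must produce the same number of levels")
    return [fuse_modalities(r, d) for r, d in zip(rgb_levels, depth_levels)]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    depth = [[2.0, 0.0], [1.0, 0.0]]
    # Channel 0: depth is parallel to RGB (redundant) and is suppressed.
    # Channel 1: depth is orthogonal to RGB (complementary) and is added.
    print(fuse_modalities(rgb, depth))  # [[1.0, 0.0], [1.0, 1.0]]
```

The gate is computed per channel rather than per pixel only to keep the toy example short; a learned, spatially varying gate would be an equally plausible reading of the abstract.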