RTMFusion: an Enhanced Dual-Stream Architecture Algorithm Fusing RGB and Depth Features for Instance Segmentation of Tomato Organs

Jiacheng Rong, Wanli Zheng, Zhongxian Qi, Ting Yuan, Pengbo Wang
DOI: https://doi.org/10.1016/j.measurement.2024.115484
IF: 5.6
2025-01-01
Measurement
Abstract: Accurate and rapid acquisition of semantically annotated point clouds from tomato cultivation regions is crucial for autonomous tomato harvesting robots. The robot must perceive tomato point clouds from various locations and identify the stems and pedicels connecting the fruits. However, semantic point cloud segmentation and the matching of fruits, pedicels, and stems are difficult for three reasons: background stems and pedicels with similar features cause interference, tomato clusters grow in heterogeneous postures, and pedicels are slender and short. Leveraging the salient feature representation of depth images in fine contours and distance dimensions, this paper introduces RTMFusion, a lightweight dual-stream backbone instance segmentation model that integrates color images and depth images. A Depth Enhanced Feature Fusion module is designed for multi-modal feature fusion. Fruit-bearing organs are matched by a method based on distance constraints and conditional search: key characteristics are extracted from the semantic point clouds, and an optimal condition search is conducted on the candidates that meet the constraint conditions. On the test dataset, the model improved the mask mAP by 4.6% compared to the baseline, achieving 72.1% at a speed of 17.1 ms. The matching method achieved 99.2% pedicel matching accuracy and 97.3% stem matching accuracy on 150 greenhouse samples. These results demonstrate that tomato organs can be accurately identified and matched in real-world scenarios, addressing the challenges posed by complex greenhouse environments. The integrated visual system improves the success rate of visual identification and the precision of cutting position localization for tomato harvesting robots.
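The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the general idea of matching under a distance constraint followed by a search over the remaining candidates. Everything in it is an assumption made for illustration: the function name `match_organs`, the use of a single 3D key point per organ, the threshold `max_dist`, and the choice of "nearest candidate" as the search criterion are hypothetical and are not taken from the paper.

```python
import math


def match_organs(fruits, pedicels, max_dist):
    """Match each fruit to one pedicel (hypothetical illustration).

    fruits, pedicels: dicts mapping an instance id to a 3D key point
        (x, y, z), e.g. a centroid extracted from a semantic point cloud.
    max_dist: assumed distance threshold, in the same unit as the points.
    Returns a dict mapping each fruit id to a pedicel id, or None.
    """
    matches = {}
    for fruit_id, fruit_pt in fruits.items():
        # Distance constraint: keep only pedicels within max_dist of the fruit.
        candidates = []
        for ped_id, ped_pt in pedicels.items():
            d = math.dist(fruit_pt, ped_pt)
            if d <= max_dist:
                candidates.append((d, ped_id))
        # Search over valid candidates: here, simply pick the closest one.
        matches[fruit_id] = min(candidates)[1] if candidates else None
    return matches


# Made-up coordinates in metres, purely for demonstration.
fruits = {"f1": (0.0, 0.0, 0.5), "f2": (0.3, 0.0, 0.5)}
pedicels = {"p1": (0.0, 0.04, 0.5), "p2": (0.3, 0.2, 0.5)}
result = match_organs(fruits, pedicels, max_dist=0.08)
print(result)
```

In this toy input, `f1` has one pedicel inside the threshold and `f2` has none, so `f2` is left unmatched rather than being forced onto a distant pedicel. A real system would likely use richer key characteristics than a single point per organ.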