Semantics-Aware Next-best-view Planning for Efficient Search and Detection of Task-relevant Plant Parts

Akshay K. Burusa, Joost Scholten, David Rapado Rincon, Xin Wang, Eldert J. van Henten, Gert Kootstra
2024-05-10
Abstract: To automate harvesting and de-leafing of tomato plants using robots, it is important to search and detect the task-relevant plant parts. This is challenging due to high levels of occlusion in tomato plants. Active vision is a promising approach to viewpoint planning, which helps robots to deliberately plan camera viewpoints to overcome occlusion and improve perception accuracy. However, current active-vision algorithms cannot differentiate between relevant and irrelevant plant parts and spend time on perceiving irrelevant plant parts, making them inefficient for targeted perception. We propose a semantics-aware active-vision strategy that uses semantic information to identify the relevant plant parts and prioritise them during view planning. We evaluated our strategy on the task of searching and detecting the relevant plant parts using simulation and real-world experiments. In simulation, using 3D models of tomato plants with varying structural complexity, our semantics-aware strategy could search and detect 81.8% of all the relevant plant parts using nine viewpoints. It was significantly faster and detected more plant parts than predefined, random, and volumetric active-vision strategies. Our strategy was also robust to uncertainty in plant and plant-part position, plant complexity, and different viewpoint-sampling strategies. Further, in real-world experiments, our strategy could search and detect 82.7% of all the relevant plant parts using seven viewpoints, under real-world conditions with natural variation and occlusion, natural illumination, sensor noise, and uncertainty in camera poses. Our results clearly indicate the advantage of using semantics-aware active vision for targeted perception of plant parts and its applicability in real-world setups. We believe that it can significantly improve the speed and robustness of automated harvesting and de-leafing in tomato crop production.
Robotics, Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?

This paper addresses how robots can efficiently search for and detect task-relevant plant parts (such as tomatoes, pedicels, and petioles) during automated harvesting and de-leafing in tomato greenhouses. Specifically:

1. **Detection challenges in highly occluded environments**: In tomato greenhouses, severe occlusion makes it difficult for robots to detect the target parts accurately. Methods that rely solely on 2D image detection are therefore insufficient for estimating cutting points, which reduces the accuracy of automated operations.
2. **Limitations of existing active-vision algorithms**: Existing active-vision algorithms can overcome occlusion by planning camera viewpoints, but they cannot distinguish relevant from irrelevant plant parts. They spend time perceiving irrelevant parts, which makes them inefficient for targeted perception.
3. **The need for semantic information**: Improving detection efficiency requires a strategy that uses semantic information (such as class labels) to identify the relevant plant parts and prioritise them.

To solve these problems, the authors propose a semantics-aware active-vision strategy. Using an attention mechanism, the strategy preferentially selects viewpoints that are expected to yield more new information, so that the target plant parts are searched for and detected more efficiently.

### Specific research content

- **Problem description**: Given a tomato plant located in a bounded 3D space \( V \subset \mathbb{R}^3 \), the task is to use a robot with an RGB-D camera to explore the plant and detect all objects of interest (OOIs). Initially, the robot knows only the approximate location of the plant, not its exact location, so it must detect the OOIs step by step through a sequence of viewpoint selections.
- **Method overview**:
  - **Perception module**: Uses a convolutional neural network (such as Mask R-CNN) to detect OOIs.
  - **3D scene representation module**: Combines OOI information from multiple viewpoints into an OctoMap that contains semantic information.
  - **Viewpoint planning module**: Based on the currently known information, selects the next-best viewpoint so as to maximize the expected gain in semantic information.
- **Experimental verification**: The method was evaluated in simulation and in real-world experiments. In simulation, it detected 81.8% of the relevant plant parts using nine viewpoints, and it was significantly faster and detected more plant parts than predefined, random, and volumetric active-vision strategies. In the real-world experiments, it detected 82.7% of the relevant plant parts using seven viewpoints.

### Key formulas

- **Semantic information gain**:
  \[ I_{\text{sem}}(x) = -p_s(x)\log_2(p_s(x)) - (1 - p_s(x))\log_2(1 - p_s(x)) \]
  where \( p_s(x) \) is the confidence that voxel \( x \) belongs to a certain category.
- **Expected semantic information gain**:
  \[ G_{\text{sem}}(\xi) = \sum_{x \in (X_\xi \cap B)} I_{\text{sem}}(x) \]
  where \( X_\xi \) is the set of all voxels expected to be visible from viewpoint \( \xi \), and \( B \) is the set of voxels in the region of interest.
- **Total viewpoint utility**:
  \[ U_{\text{sem}}(\xi) = G_{\text{sem}}(\xi) \times e^{-d} \]
  where \( d \) is the Euclidean distance between the current viewpoint and the candidate viewpoint \( \xi \).

### Conclusion

This research shows that introducing semantic information and an attention mechanism can significantly improve the speed and robustness with which robots search for and detect specific plant parts in complex greenhouse environments, which supports automated harvesting and de-leafing of tomato crops.
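
The three key formulas above can be combined into a small viewpoint-scoring routine. The following Python sketch is illustrative only and is not the authors' implementation. It assumes that voxels are identified by integer index tuples, that the semantic confidence \( p_s(x) \) is stored in a plain dictionary, that voxels with no stored confidence default to 0.5 (maximum uncertainty), and that the set of visible voxels per candidate viewpoint has already been computed elsewhere (for example by ray casting in the OctoMap). All function and variable names are hypothetical.

```python
import math

Voxel = tuple[int, int, int]
Position = tuple[float, float, float]


def semantic_information(p_s: float) -> float:
    """I_sem: binary entropy (in bits) of the semantic confidence p_s."""
    if p_s <= 0.0 or p_s >= 1.0:
        return 0.0  # a fully certain voxel carries no remaining information
    return -p_s * math.log2(p_s) - (1.0 - p_s) * math.log2(1.0 - p_s)


def expected_semantic_gain(
    visible: set[Voxel],
    roi: set[Voxel],
    confidence: dict[Voxel, float],
    default_p: float = 0.5,  # assumption: unobserved voxels are maximally uncertain
) -> float:
    """G_sem: sum of I_sem over voxels that are visible AND in the region of interest."""
    return sum(
        semantic_information(confidence.get(v, default_p)) for v in visible & roi
    )


def viewpoint_utility(gain: float, current: Position, candidate: Position) -> float:
    """U_sem: expected gain discounted by exp(-d), d = Euclidean distance."""
    d = math.dist(current, candidate)
    return gain * math.exp(-d)


def next_best_view(
    current: Position,
    candidates: dict[Position, set[Voxel]],  # candidate position -> visible voxels
    roi: set[Voxel],
    confidence: dict[Voxel, float],
) -> Position:
    """Return the candidate viewpoint with the highest utility."""
    return max(
        candidates,
        key=lambda pos: viewpoint_utility(
            expected_semantic_gain(candidates[pos], roi, confidence), current, pos
        ),
    )


# Toy example: a nearby view of one uncertain voxel versus a distant view of two.
roi = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
confidence = {(0, 0, 0): 0.5, (1, 0, 0): 0.5, (2, 0, 0): 0.5}
candidates = {
    (0.1, 0.0, 0.0): {(0, 0, 0)},
    (1.0, 0.0, 0.0): {(1, 0, 0), (2, 0, 0)},
}
best = next_best_view((0.0, 0.0, 0.0), candidates, roi, confidence)
print("next best view:", best)
```

In this toy setup the distance term \( e^{-d} \) outweighs the larger gain of the farther viewpoint, which illustrates how the utility trades new semantic information against camera travel. The length scale of \( d \) (assumed here to be in metres) controls how strong that trade-off is.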