Active search and coverage using point-cloud reinforcement learning

Matthias Rosynski, Alexandru Pop, Lucian Busoniu
DOI: https://doi.org/10.1109/ICSTCC59206.2023.10308458
2023-12-19
Abstract: We consider a problem in which the trajectory of a mobile 3D sensor must be optimized so that certain objects are both found in the overall scene and covered by the point cloud, as fast as possible. This problem is called target search and coverage, and the paper provides an end-to-end deep reinforcement learning (RL) solution for it. The deep neural network combines four components: deep hierarchical feature learning in the first stage, multi-head transformers in the second, max-pooling and merging with bypassed information to preserve spatial relationships in the third, and a distributional dueling network in the last stage. To evaluate the method, a simulator is developed in which cylinders must be found by a Kinect sensor. A network architecture study shows that deep hierarchical feature learning works for RL and that farthest point sampling (FPS) reduces the number of points, which both shrinks the network and gives better results. We also show that multi-head attention for point clouds helps the agent learn faster, although it converges to the same outcome. Finally, we compare RL using the best network with a greedy baseline that maximizes immediate rewards and, for that purpose, requires an oracle that predicts the next observation. We find that RL achieves significantly better and more robust results than the greedy strategy.
Systems and Control, Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of optimizing the trajectory of a mobile 3D sensor so that certain target objects are found in the scene and covered by the point cloud as fast as possible. This problem is referred to as target search and coverage. The authors propose an end-to-end deep reinforcement learning (RL) solution that operates on point clouds and learns the sensor's path.

### Problem Background

In many practical applications, such as search and rescue operations and missions with autonomous aerial, underwater, or ground vehicles, finding target objects in the environment is an important task. Traditional solutions often rely on predefined path planning or heuristic methods, which tend to perform poorly in complex environments. The authors therefore propose a reinforcement learning framework based on a point-cloud representation to achieve more efficient search and coverage.

### Main Contributions

1. **End-to-end deep reinforcement learning framework**: The framework combines deep hierarchical feature learning, a multi-head attention mechanism, max-pooling, and a distributional dueling network to optimize the sensor's path.
2. **Point cloud representation**: Point clouds are used as the input representation instead of traditional image or voxel representations, to reduce computational and memory costs while preserving spatial information.
3. **Experimental validation**: A simulator was developed to validate the framework's effectiveness and robustness in finding and covering target objects, and to compare it with a greedy baseline strategy.

### Method Overview

- **Network Structure**:
  - **Embedding Stage**: PointNet++ is used for deep hierarchical feature learning.
  - **Multi-head Attention Stage**: A multi-head attention mechanism is introduced to help the network learn faster.
  - **Max Pooling Stage**: Max-pooling is applied and its output is merged with bypassed information to preserve spatial relationships.
  - **RL Stage**: A distributional dueling network structure is used to improve learning efficiency and stability.
- **Reward Function**: The reward is designed as the difference between the number of new target points and the number of new floor points found, encouraging the agent to quickly find and cover target objects.
- **Experimental Setup**: Experiments were conducted in a simulated environment in which a Kinect sensor must find cylindrical target objects. The method was evaluated by comparing different network structures and by comparing the best network with a baseline strategy.

### Experimental Results

- **Impact of Network Structure**: Farthest point sampling (FPS) reduces the number of points and the network size while giving better results. Multi-head attention helps the agent learn faster but converges to the same final outcome.
- **Comparison with Baseline Strategy**: Compared to the greedy baseline, which maximizes immediate rewards and relies on an oracle that predicts the next observation, the proposed RL method achieves significantly better and more robust results in finding and covering target objects.

### Conclusion

The paper proposes a point-cloud reinforcement learning method for target search and coverage in a 3D environment. Experimental results show that this method achieves significantly better and more robust results than the greedy baseline.
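
### Illustration: Farthest Point Sampling

The farthest point sampling (FPS) step mentioned in the abstract and results can be sketched in a few lines. The code below is a minimal pure-Python illustration of the generic FPS algorithm, not the authors' implementation. The function name `farthest_point_sampling`, its parameters, and the choice of the first point as the seed are assumptions made for this example.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices that are mutually far apart.

    points: sequence of equal-length coordinate tuples, e.g. (x, y, z)
    k:      number of points to keep (1 <= k <= len(points))
    start:  index of the seed point (assumed here; often chosen at random)
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")

    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]

    while len(selected) < k:
        # The next sample is the point farthest from the current selection.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the newly added point.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


if __name__ == "__main__":
    # Hypothetical toy cloud: three points clustered near the origin, two far away.
    cloud = [(0, 0, 0), (0.1, 0, 0), (1, 0, 0), (0, 1, 0), (0.05, 0.05, 0)]
    print(farthest_point_sampling(cloud, k=3))
```

Because each new sample maximizes its distance to the points already chosen, the reduced cloud keeps the spatial extent of the scene with far fewer points. This is consistent with the paper's observation that FPS allows a smaller network. The sketch runs in O(n·k) time; a practical pipeline would typically use a vectorized or GPU implementation.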