Deep Reinforcement Learning for Time-Critical Wilderness Search And Rescue Using Drones

Jan-Hendrik Ewers, David Anderson, Douglas Thomson
2024-05-22
Abstract: Traditional search and rescue methods in wilderness areas can be time-consuming and have limited coverage. Drones offer a faster and more flexible solution, but optimizing their search paths is crucial. This paper explores the use of deep reinforcement learning to create efficient search missions for drones in wilderness environments. Our approach leverages a priori data about the search area and the missing person in the form of a probability distribution map. This allows the deep reinforcement learning agent to learn optimal flight paths that maximize the probability of finding the missing person quickly. Experimental results show that our method achieves a significant improvement in search times compared to traditional coverage planning and search planning algorithms. In one comparison, deep reinforcement learning is found to outperform other algorithms by over $160\%$, a difference that can mean life or death in real-world search operations. Additionally, unlike previous work, our approach incorporates a continuous action space enabled by cubature, allowing for more nuanced flight patterns.
Robotics, Machine Learning, Systems and Control
What problem does this paper attempt to address?
The paper investigates how Deep Reinforcement Learning (DRL) can optimize path planning for unmanned aerial vehicles (UAVs) in Wilderness Search And Rescue (WiSAR) scenarios. Traditional search methods are time-consuming and have limited coverage; UAVs offer a faster and more flexible alternative, but only if their search paths are planned well. The authors propose a method that uses a Probability Distribution Map (PDM) as prior information, so that the DRL agent learns flight paths that maximize the probability of finding the missing person quickly. Experimental results show that this method significantly reduces search time compared to traditional coverage planning and search planning algorithms. In one comparison, the DRL algorithm outperforms the other algorithms by over 160%, a difference that could mean life or death in real-world search and rescue operations. Additionally, the method uses a continuous action space enabled by cubature, which allows more nuanced flight patterns and reduces noise in the reward signal. The paper also compares the approach with related work and highlights what distinguishes it: the PDM is part of the DRL observation space, and a continuous PDM is used during the evaluation phase. In summary, the main objective of the paper is to make UAV-based wilderness search and rescue more efficient by optimizing the UAV's search path with deep reinforcement learning, thereby reducing the time needed to find missing individuals.
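To make the role of cubature concrete, the following minimal Python sketch shows one way the probability captured by a drone's sensor footprint could be integrated from a continuous PDM. It is an illustration under stated assumptions, not the authors' implementation: the single isotropic Gaussian PDM, the square footprint, the 5-point tensor-product Gauss-Legendre rule, and the names `gaussian_pdm`, `footprint_probability`, and `path_reward` are all hypothetical choices made here.

```python
import math

# 5-point Gauss-Legendre nodes and weights on [-1, 1] (standard tabulated values).
GL_NODES = (-0.9061798459386640, -0.5384693101056831, 0.0,
            0.5384693101056831, 0.9061798459386640)
GL_WEIGHTS = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
              0.4786286704993665, 0.2369268850561891)


def gaussian_pdm(x, y, mean=(0.0, 0.0), sigma=1.0):
    """Isotropic bivariate Gaussian density, standing in for a real PDM (assumption)."""
    dx, dy = x - mean[0], y - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)


def footprint_probability(pdm, centre, half_width):
    """Integrate pdm over a square footprint with tensor-product Gauss-Legendre cubature."""
    total = 0.0
    for xi, wi in zip(GL_NODES, GL_WEIGHTS):
        for yj, wj in zip(GL_NODES, GL_WEIGHTS):
            # Map the reference node in [-1, 1]^2 onto the footprint square.
            x = centre[0] + half_width * xi
            y = centre[1] + half_width * yj
            total += wi * wj * pdm(x, y)
    # half_width**2 is the Jacobian of the map from [-1, 1]^2 to the square.
    return half_width ** 2 * total


def path_reward(pdm, waypoints, half_width):
    """Sum the footprint probabilities along a path (overlap between footprints is ignored)."""
    return sum(footprint_probability(pdm, wp, half_width) for wp in waypoints)


if __name__ == "__main__":
    # Compare a path through the PDM peak with one that stays far from it.
    near = path_reward(gaussian_pdm, [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)], 0.5)
    far = path_reward(gaussian_pdm, [(4.0, 4.0), (5.0, 4.0), (6.0, 4.0)], 0.5)
    print(f"near-peak path: {near:.4f}, far path: {far:.4f}")
```

Because the integrand is evaluated at arbitrary real-valued positions rather than at grid cells, the waypoints can lie anywhere in the search area, which is what a continuous action space requires. A fuller treatment would also need to discount area that overlapping footprints have already covered.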