Improving Learnt Local MAPF Policies with Heuristic Search

Rishi Veerapaneni, Qian Wang, Kevin Ren, Arthur Jakobsson, Jiaoyang Li, Maxim Likhachev
2024-03-30
Abstract: Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents but are typically centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies' success rates and scalability. To the best of our knowledge, we demonstrate the first time ML-based MAPF approaches have scaled to high congestion scenarios (e.g., 20% agent density).
Multiagent Systems, Artificial Intelligence, Robotics
What problem does this paper attempt to address?
The paper addresses several challenges in Multi-Agent Path Finding (MAPF), focusing on how to make machine learning-based methods perform and scale better. Its goals can be summarized as follows:

1. **Improve the quality of machine learning-based local policies**: Current ML methods tend to learn local policies, i.e., they predict a single-step action for each agent. These policies decide quickly but handle long-horizon planning and deadlock avoidance poorly. The paper improves them by applying heuristic search over the predicted action probability distributions.
2. **Address collision issues**: When several agents act on independent predictions, their chosen moves often collide. Existing ML methods usually handle this with simple collision shielding, which can lead to deadlocks. The paper proposes CS-PIBT (Collision Shield with PIBT), which uses the Priority Inheritance with Backtracking (PIBT) algorithm to resolve collisions more effectively and reduce deadlocks.
3. **Achieve full-horizon planning**: To overcome the single-timestep limitation of learned local policies, the paper combines them with the heuristic search method LaCAM, which plans over the full horizon. This approach helps avoid deadlocks and also provides theoretical completeness.
4. **Integrate heuristic information with learned strategies**: The paper explores several ways to combine heuristic information with learned policies, including heuristic-based ordering with random conflict resolution, conflict resolution using the learned policy, and decision-making that combines both sources of information.
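The CS-PIBT idea described above can be illustrated with a minimal sketch of one PIBT-style timestep in which each agent's action preferences come from a learned policy's output probabilities. Everything named here (`cs_pibt_step`, `MOVES`, the grid encoding, the fixed priority list) is a hypothetical illustration, not the authors' code. As a simplifying assumption, the sketch ranks actions deterministically by probability, whereas the paper may order or sample them differently.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
# Hypothetical action set as (row, col) offsets: wait, right, left, down, up.
MOVES: List[Cell] = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


def cs_pibt_step(
    positions: Dict[int, Cell],
    action_probs: Dict[int, List[float]],
    priority: List[int],
    free_cells: Set[Cell],
) -> Dict[int, Cell]:
    """Pick one collision-free move per agent, PIBT-style.

    action_probs[a][k] is the (assumed) policy probability of MOVES[k].
    priority lists agents from highest to lowest priority.
    """
    occupant = {cell: agent for agent, cell in positions.items()}
    next_pos: Dict[int, Cell] = {}
    taken: Dict[Cell, int] = {}  # cell -> agent that will occupy it next step

    def plan(agent: int, pusher: Optional[int]) -> bool:
        row, col = positions[agent]
        # Most probable action first (stable sort keeps MOVES order on ties).
        ranked = sorted(range(len(MOVES)), key=lambda k: -action_probs[agent][k])
        for k in ranked:
            target = (row + MOVES[k][0], col + MOVES[k][1])
            if target not in free_cells or target in taken:
                continue  # obstacle or vertex conflict
            if pusher is not None and target == positions[pusher]:
                continue  # would swap places with the agent pushing us
            next_pos[agent] = target
            taken[target] = agent
            other = occupant.get(target)
            if other is not None and other != agent and other not in next_pos:
                # Priority inheritance: the occupant must move out of the way.
                if not plan(other, agent):
                    # Occupant is stuck and now holds `target`; try another action.
                    del next_pos[agent]
                    continue
            return True
        # No valid action: stay put and report failure to the pusher.
        next_pos[agent] = positions[agent]
        taken[positions[agent]] = agent
        return False

    for agent in priority:
        if agent not in next_pos:
            plan(agent, None)
    return next_pos
```

In a three-cell corridor where agent 0 wants to move right into agent 1's cell and agent 1 prefers to wait, a naive shield would leave both agents in place, while this sketch pushes agent 1 one cell to the right so agent 0 can advance.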
Together, these methods show how combining heuristic search with learned policies can significantly improve the success rate and scalability of multi-agent path finding. Experimental results show clear improvements in high-density scenarios (the abstract cites 20% agent density as an example), especially when PIBT is used for collision shielding and the learned policy is integrated with LaCAM.
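One way to picture "combining heuristic information with a learned policy" is an action ordering that sorts first by a shortest-path heuristic and breaks ties with the policy's probabilities. The sketch below is an assumption about what such a combination could look like, not the paper's exact rule. The names `bfs_distances` and `rank_actions`, the grid encoding, and the action set are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
# Hypothetical action set as (row, col) offsets: wait, right, left, down, up.
MOVES: List[Cell] = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


def bfs_distances(goal: Cell, free_cells: Set[Cell]) -> Dict[Cell, int]:
    """Shortest-path distance from every reachable free cell to the goal."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in MOVES[1:]:  # skip the wait action
            nxt = (row + d_row, col + d_col)
            if nxt in free_cells and nxt not in dist:
                dist[nxt] = dist[(row, col)] + 1
                queue.append(nxt)
    return dist


def rank_actions(
    cell: Cell,
    probs: List[float],
    dist: Dict[Cell, int],
    free_cells: Set[Cell],
) -> List[int]:
    """Return indices into MOVES, best first.

    Assumed rule: smaller heuristic distance wins; among equal distances,
    the action with the higher learned probability wins.
    """
    candidates = []
    for k, (d_row, d_col) in enumerate(MOVES):
        target = (cell[0] + d_row, cell[1] + d_col)
        if target in free_cells and target in dist:
            candidates.append((dist[target], -probs[k], k))
    return [k for _, _, k in sorted(candidates)]
```

An ordering like this could replace the pure probability ranking inside a PIBT-style step, so the heuristic steers agents toward their goals while the learned policy decides between equally good moves.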