Abstract: Multi-agent path finding (MAPF) is the problem of finding collision-free paths for a team of agents to reach their goal locations. State-of-the-art classical MAPF solvers typically employ heuristic search to find solutions for hundreds of agents, but they are usually centralized and can struggle to scale when run with short timeouts. Machine learning (ML) approaches that learn policies for each agent are appealing as these could enable decentralized systems and scale well while maintaining good solution quality. Current ML approaches to MAPF have proposed methods that have started to scratch the surface of this potential. However, state-of-the-art ML approaches produce "local" policies that only plan for a single timestep and have poor success rates and scalability. Our main idea is that we can improve an ML local policy by using heuristic search methods on the output probability distribution to resolve deadlocks and enable full horizon planning. We show several model-agnostic ways to use heuristic search with learnt policies that significantly improve the policies' success rates and scalability. To the best of our knowledge, we demonstrate the first time ML-based MAPF approaches have scaled to high congestion scenarios (e.g., 20% agent density).

What problem does this paper attempt to address?

The paper addresses several key challenges in the Multi-Agent Path Finding (MAPF) problem, in particular how to improve machine learning-based methods so that they achieve better success rates and scalability. Its goals can be summarized as follows:

1. **Improve the quality of machine learning-based local policies**: Current machine learning-based methods tend to learn local policies, i.e., they predict a single-step action for each agent. While these methods can make quick decisions, they perform poorly at long-term planning and deadlock avoidance. The paper proposes improving these local policies by applying heuristic search over the predicted action probability distributions.
2. **Address collision issues**: In multi-agent environments, the actions proposed for different agents frequently collide. Existing machine learning methods usually handle this with simple collision shielding, which can lead to deadlocks. The paper proposes a technique called CS-PIBT (Collision Shield with PIBT), which uses the Priority Inheritance with Backtracking (PIBT) algorithm to resolve collisions more effectively and reduce the occurrence of deadlocks.
3. **Achieve full-horizon planning**: To overcome the limitations of machine learning-based local policies in long-term planning, the paper combines these policies with the heuristic search method LaCAM, thereby enabling full-horizon planning. This approach not only avoids deadlocks effectively but also ensures theoretical completeness.
4. **Integrate heuristic information with learned policies**: The paper also explores how to combine heuristic information with learned policies, proposing several combination methods, including random conflict resolution based on heuristic information, conflict resolution using the learned policy, and combining both types of information for decision-making.
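As a rough illustration of the last point, the sketch below ranks one agent's candidate actions using the policy probabilities, a distance-to-goal heuristic, or the heuristic with the policy as a tie-breaker. The function `rank_actions`, the mode names, and the dictionary inputs are hypothetical and chosen only for this example; they are not taken from the paper, whose exact combination rules may differ.

```python
from typing import Dict, List

ACTIONS = ["wait", "up", "down", "left", "right"]


def rank_actions(
    probs: Dict[str, float],
    dist_to_goal: Dict[str, int],
    mode: str = "tie",
) -> List[str]:
    """Order actions from most to least preferred (hypothetical helper).

    probs:        action -> probability output by a learned policy
    dist_to_goal: action -> heuristic distance to the goal from the cell
                  the action leads to (lower is better)
    mode:         "policy"    uses probabilities only,
                  "heuristic" uses distances only,
                  "tie"       uses distances, with probabilities breaking ties
    """
    if mode == "policy":
        key = lambda a: -probs[a]
    elif mode == "heuristic":
        key = lambda a: dist_to_goal[a]
    elif mode == "tie":
        key = lambda a: (dist_to_goal[a], -probs[a])
    else:
        raise ValueError(f"unknown mode: {mode}")
    # sorted() is stable, so remaining ties keep the order of ACTIONS.
    return sorted(ACTIONS, key=key)


probs = {"wait": 0.1, "up": 0.5, "down": 0.05, "left": 0.05, "right": 0.3}
dists = {"wait": 3, "up": 4, "down": 4, "left": 2, "right": 2}
print(rank_actions(probs, dists, mode="tie"))
```

In this toy input the heuristic alone cannot separate `left` from `right`, so the `"tie"` mode lets the learned policy decide between them. A search-based collision resolver can then consume the resulting order as the agent's preference list.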
Through these methods, the paper aims to demonstrate that combining heuristic search techniques with machine learning-based policies can significantly improve the success rate and scalability of multi-agent path finding. Experimental results show that the proposed approach yields significant gains in high-density scenarios, especially when PIBT is used for collision shielding and the learned policy is integrated with LaCAM.
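To make the collision-shielding idea more concrete, here is a minimal sketch of one PIBT-style step on a grid. It assumes each agent supplies a preference-ordered action list (for example, actions sorted by policy probability) and that a fixed priority order is given. The names `pibt_shield_step`, `MOVES`, and the grid encoding are assumptions made for this example; this is not the authors' implementation of CS-PIBT.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]
MOVES = {"wait": (0, 0), "up": (-1, 0), "down": (1, 0),
         "left": (0, -1), "right": (0, 1)}


def pibt_shield_step(
    positions: List[Cell],
    prefs: List[List[str]],
    free: Set[Cell],
    priority: List[int],
) -> List[Cell]:
    """Choose collision-free next cells for all agents (illustrative sketch)."""
    nxt: List[Optional[Cell]] = [None] * len(positions)
    occupied_now: Dict[Cell, int] = {p: i for i, p in enumerate(positions)}
    reserved: Dict[Cell, int] = {}  # next-step cell -> agent that claimed it

    def pibt(i: int, parent: Optional[int]) -> bool:
        for action in prefs[i]:
            dr, dc = MOVES[action]
            v = (positions[i][0] + dr, positions[i][1] + dc)
            if v not in free or v in reserved:
                continue  # obstacle or vertex conflict
            if parent is not None and v == positions[parent]:
                continue  # would swap places with the agent that pushed us
            nxt[i] = v
            reserved[v] = i
            k = occupied_now.get(v)
            if k is not None and k != i and nxt[k] is None:
                # Priority inheritance: the occupant must move away first.
                if not pibt(k, i):
                    # The occupant stays put and now owns v; try the next action.
                    nxt[i] = None
                    continue
            return True
        # No action worked: stay in place and report failure (backtracking).
        nxt[i] = positions[i]
        reserved[positions[i]] = i
        return False

    for i in priority:
        if nxt[i] is None:
            pibt(i, None)
    return [cell for cell in nxt if cell is not None]


# Agent 0 wants to move right into agent 1's cell; agent 1 prefers to wait.
free_cells = {(0, 0), (0, 1), (0, 2)}
result = pibt_shield_step(
    positions=[(0, 0), (0, 1)],
    prefs=[["right", "wait", "left"], ["wait", "left", "right"]],
    free=free_cells,
    priority=[0, 1],
)
print(result)
```

In the usage example, a naive shield that simply cancels conflicting moves would leave both agents where they are. Here the higher-priority agent pushes the blocking agent to its next acceptable action instead, which is the behaviour that helps reduce deadlocks.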

Improving Learnt Local MAPF Policies with Heuristic Search
