Learning Online Belief Prediction for Efficient POMDP Planning in Autonomous Driving

Zhiyu Huang, Chen Tang, Chen Lv, Masayoshi Tomizuka, Wei Zhan

2024-06-18

Abstract: Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online belief-update-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a Transformer-based prediction model, enhanced with a recurrent neural memory model, to dynamically update latent belief state and infer the intentions of other agents. The model can also integrate the ego vehicle's intentions to reflect closed-loop interactions among agents, and it learns from both offline data and online interactions. For planning, we employ a Monte-Carlo Tree Search (MCTS) planner with macro actions, which reduces computational complexity by searching over temporally extended action steps. Inside the MCTS planner, we use predicted long-term multi-modal trajectories to approximate future updates, which eliminates iterative belief updating and improves the running efficiency. Our approach also incorporates deep Q-learning (DQN) as a search prior, which significantly improves the performance of the MCTS planner. Experimental results from simulated environments validate the effectiveness of our proposed method. The online belief update model can significantly enhance the accuracy and temporal consistency of predictions, leading to improved decision-making performance. Employing DQN as a search prior in the MCTS planner considerably boosts its performance and outperforms an imitation learning-based prior. Additionally, we show that the MCTS planning with macro actions substantially outperforms the vanilla method in terms of performance and efficiency.

Robotics

What problem does this paper attempt to address?

This paper addresses decision-making for autonomous driving systems in uncertain environments, in particular how to accurately predict the behavior of other traffic participants (such as human drivers) so that the autonomous vehicle can navigate safely. Specifically, the paper proposes an online belief-update behavior prediction model and an efficient planner for partially observable Markov decision processes (POMDPs), with the following goals:

1. **Online Behavior Prediction**: Combine a Transformer with a recurrent neural memory model to dynamically update the latent belief state and infer the intentions of other traffic participants. The model also takes the ego vehicle's intention into account to reflect the closed-loop interaction between agents.
2. **Efficient Planning**: Employ a macro-action-based Monte Carlo Tree Search (MCTS) planner, which reduces computational complexity by searching over temporally extended action steps. Inside the MCTS planner, predicted long-term multi-modal trajectories are used to approximate future updates, which eliminates iterative belief updates and improves running efficiency.
3. **Reinforcement Learning Guidance**: Use Deep Q-Learning (DQN) as a search prior to significantly enhance the performance of the MCTS planner.
4. **Online Learning Framework**: Establish an online learning framework that updates the belief model and the Q-value network, and validate the effectiveness of the method on real-world driving datasets and in simulated driving environments.

Overall, this paper aims to improve the decision-making capability and prediction accuracy of autonomous driving systems in uncertain environments by integrating deep learning with POMDP planning.
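To make the planning idea concrete, the following is a minimal sketch of tree search over macro actions guided by a Q-value prior. It is not the paper's implementation: the 1-D lane environment, the hand-made `q_prior` (standing in for a learned DQN), the PUCT-style selection rule, and every name and constant are assumptions chosen for illustration. The toy problem is fully observable, so it contains no belief model or trajectory prediction.

```python
import math

# Hypothetical toy problem: the ego vehicle moves along a 1-D lane and should
# reach position GOAL. Primitive actions change the position by -1, 0, or +1.
GOAL = 6
PRIMITIVES = (-1, 0, 1)
MACRO_LEN = 3  # assumption: a macro action repeats one primitive for 3 steps


def step_macro(pos, action):
    """Apply one macro action; return (next_pos, summed reward)."""
    reward = 0.0
    for _ in range(MACRO_LEN):
        pos += action
        reward -= 0.1 * abs(GOAL - pos)  # penalty for distance to the goal
    return pos, reward


def q_prior(pos):
    """Stand-in for a learned Q-network: softmax over hand-made Q-values."""
    q = [-abs(GOAL - (pos + a * MACRO_LEN)) for a in PRIMITIVES]
    m = max(q)
    exp = [math.exp(v - m) for v in q]
    total = sum(exp)
    return [e / total for e in exp]


class Node:
    def __init__(self, pos, depth):
        self.pos, self.depth = pos, depth
        self.children = {}                     # macro action -> child Node
        self.n = {a: 0 for a in PRIMITIVES}    # visit counts per action
        self.w = {a: 0.0 for a in PRIMITIVES}  # summed returns per action
        self.prior = dict(zip(PRIMITIVES, q_prior(pos)))

    def select(self, c=1.5):
        """PUCT-style rule: mean return plus a prior-weighted bonus."""
        total = sum(self.n.values())

        def score(a):
            mean = self.w[a] / self.n[a] if self.n[a] else 0.0
            bonus = c * self.prior[a] * math.sqrt(total + 1) / (1 + self.n[a])
            return mean + bonus

        return max(PRIMITIVES, key=score)


def simulate(node, max_depth, gamma=0.95):
    """One search iteration: descend to the depth limit, expanding nodes on
    the way, then back up the discounted return."""
    if node.depth == max_depth:
        return 0.0
    a = node.select()
    nxt, r = step_macro(node.pos, a)
    if a not in node.children:
        node.children[a] = Node(nxt, node.depth + 1)
    ret = r + gamma * simulate(node.children[a], max_depth, gamma)
    node.n[a] += 1
    node.w[a] += ret
    return ret


def plan(start_pos, iterations=300, max_depth=3):
    """Return the most-visited macro action at the root."""
    root = Node(start_pos, 0)
    for _ in range(iterations):
        simulate(root, max_depth)
    return max(PRIMITIVES, key=lambda a: root.n[a])
```

Calling `plan(0)` searches a tree that is only three macro steps deep yet covers nine primitive steps, which illustrates why temporally extended actions shrink the search. The prior concentrates visits on promising branches early, so few iterations are spent on actions that move away from the goal.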

Learning Interaction-aware Motion Prediction Model for Decision-making in Autonomous Driving

Learning Hierarchical Behavior and Motion Planning for Autonomous Driving

Efficient Game-Theoretic Planning with Prediction Heuristic for Socially-Compliant Autonomous Driving

Planning by Simulation: Motion Planning with Learning-based Parallel Scenario Prediction for Autonomous Driving

Hybrid-Prediction Integrated Planning for Autonomous Driving

LeTS-Drive: Driving in a Crowd by Learning from Tree Search

Conditional Predictive Behavior Planning with Inverse Reinforcement Learning for Human-like Autonomous Driving

Interactive Prediction and Decision-Making for Autonomous Vehicles: Online Active Learning with Traffic Entropy Minimization

Differentiable Integrated Motion Prediction and Planning with Learnable Cost Function for Autonomous Driving

Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction

Adaptive Online Packing-guided Search for POMDPs

BoT-Drive: Hierarchical Behavior and Trajectory Planning for Autonomous Driving using POMDPs

Hybrid Heuristic Online Planning for POMDPs

Intention-Aware Navigation in Crowds with Extended-Space POMDP Planning

Decision-Making for Autonomous Vehicles with Interaction-Aware Behavioral Prediction and Social-Attention Neural Network

Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction

Closing the Planning-Learning Loop with Application to Autonomous Driving

PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving

Predictive Maneuver Planning with Deep Reinforcement Learning (PMP-DRL) for comfortable and safe autonomous driving

BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations