Learning Online Belief Prediction for Efficient POMDP Planning in Autonomous Driving

Zhiyu Huang, Chen Tang, Chen Lv, Masayoshi Tomizuka, Wei Zhan
2024-06-18
Abstract: Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online belief-update-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a Transformer-based prediction model, enhanced with a recurrent neural memory model, to dynamically update latent belief state and infer the intentions of other agents. The model can also integrate the ego vehicle's intentions to reflect closed-loop interactions among agents, and it learns from both offline data and online interactions. For planning, we employ a Monte-Carlo Tree Search (MCTS) planner with macro actions, which reduces computational complexity by searching over temporally extended action steps. Inside the MCTS planner, we use predicted long-term multi-modal trajectories to approximate future updates, which eliminates iterative belief updating and improves the running efficiency. Our approach also incorporates deep Q-learning (DQN) as a search prior, which significantly improves the performance of the MCTS planner. Experimental results from simulated environments validate the effectiveness of our proposed method. The online belief update model can significantly enhance the accuracy and temporal consistency of predictions, leading to improved decision-making performance. Employing DQN as a search prior in the MCTS planner considerably boosts its performance and outperforms an imitation learning-based prior. Additionally, we show that the MCTS planning with macro actions substantially outperforms the vanilla method in terms of performance and efficiency.
Robotics
What problem does this paper attempt to address?
This paper addresses decision-making for autonomous driving under uncertainty, in particular how to accurately predict the behavior of other traffic participants (such as human drivers) so that the autonomous vehicle can navigate safely. Specifically, the paper proposes an online belief-update behavior prediction model and an efficient planner for partially observable Markov decision processes (POMDPs), with the following goals:

1. **Online Behavior Prediction**: Combine a Transformer with a recurrent neural network (RNN) memory model to dynamically update the latent belief state about other traffic participants and infer their intentions. The model also takes the ego vehicle's intention into account to reflect the closed-loop interaction between agents.
2. **Efficient Planning**: Employ a Monte Carlo Tree Search (MCTS) planner with macro actions, which reduces computational complexity by searching over temporally extended actions rather than single steps. Inside the MCTS planner, predicted long-term multi-modal trajectories approximate future belief updates, which eliminates iterative belief updating and improves runtime efficiency.
3. **Reinforcement Learning Guidance**: Use Deep Q-Learning (DQN) as a search prior to significantly improve the performance of the MCTS planner.
4. **Online Learning Framework**: Establish an online learning framework that updates the belief model and the Q-value network, and validate the effectiveness of the method through real-world driving datasets and simulated driving environments.

Overall, this paper aims to improve the decision-making capability and prediction accuracy of autonomous driving systems in uncertain environments by integrating deep learning with POMDP planning.
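To make the planning idea more concrete, the following is a minimal Python sketch of how a tree-search node might choose among macro actions when Q-values serve as a search prior. It is not the authors' implementation. The macro-action names, the softmax conversion of Q-values into prior probabilities, the PUCT-style scoring rule, and the constant `c_puct` are all assumptions made for illustration; the abstract does not specify how the DQN prior enters the search.

```python
import math
from dataclasses import dataclass, field

# Hypothetical macro actions: each expands to one primitive action
# repeated over several time steps (a temporally extended action).
MACRO_ACTIONS = {
    "keep_speed": ["hold"] * 5,
    "accelerate": ["accel"] * 5,
    "brake": ["decel"] * 5,
}


def softmax(values, temperature=1.0):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Node:
    prior: dict                      # macro-action name -> prior probability
    visits: dict = field(default_factory=dict)
    value_sum: dict = field(default_factory=dict)

    def __post_init__(self):
        # Start every macro action with zero visits and zero return.
        for action in self.prior:
            self.visits.setdefault(action, 0)
            self.value_sum.setdefault(action, 0.0)


def make_node(q_values):
    """Build a node whose prior is a softmax over (assumed) Q-values."""
    names = list(q_values)
    probs = softmax([q_values[a] for a in names])
    return Node(prior=dict(zip(names, probs)))


def select_macro_action(node, c_puct=1.5):
    """PUCT-style selection: mean return plus a prior-weighted bonus."""
    total_visits = sum(node.visits.values())

    def score(action):
        n = node.visits[action]
        mean = node.value_sum[action] / n if n else 0.0
        bonus = c_puct * node.prior[action] * math.sqrt(total_visits + 1) / (1 + n)
        return mean + bonus

    return max(node.prior, key=score)


def backup(node, action, ret):
    """Record the return observed after executing a macro action."""
    node.visits[action] += 1
    node.value_sum[action] += ret


# Usage: Q-values here are made-up numbers standing in for a Q-network output.
root = make_node({"keep_speed": 1.0, "accelerate": 0.2, "brake": -0.5})
first_choice = select_macro_action(root)
backup(root, first_choice, -10.0)   # pretend the rollout went badly
second_choice = select_macro_action(root)
```

With no visits recorded, the score reduces to the prior-weighted bonus, so the macro action with the highest Q-value is tried first. Once a poor return is backed up, the mean term dominates and the search moves to the next most promising macro action. Because each tree edge covers several primitive steps, the tree needs fewer levels to reach the same planning horizon, which is the source of the complexity reduction the summary describes.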