Abstract: To solve the problem of joint lateral and longitudinal decision-making in multi-vehicle cooperative driving for connected and automated vehicles (CAVs), this paper proposes a Monte Carlo tree search (MCTS) method with parallel update for multi-agent Markov games in a finite-horizon, time-discounted setting. By analyzing the parallel actions in the multi-vehicle joint action space in partial-steady-state traffic flow, the parallel update method can quickly exclude potentially dangerous actions, thereby increasing the search depth without sacrificing the search breadth. The proposed method is tested in a large number of randomly generated traffic flows. The experimental results show that the algorithm is robust and outperforms state-of-the-art reinforcement learning algorithms and heuristic methods. The driving strategy produced by the proposed algorithm shows rationality beyond that of human drivers and has advantages in traffic efficiency and safety in the coordination zone.
Multiagent Systems, Artificial Intelligence, Computer Science and Game Theory, Systems and Control
What problem does this paper attempt to address?
This paper addresses joint lateral and longitudinal decision-making in multi-vehicle cooperative driving for connected and automated vehicles (CAVs). Specifically, it proposes a Monte Carlo tree search (MCTS) method with a parallel update, formulated for multi-agent Markov games with a finite time horizon and time discounting. By analyzing parallel actions in the multi-vehicle joint action space in partial-steady-state traffic flow, the parallel update can quickly eliminate potentially dangerous actions, increasing the search depth without sacrificing the search breadth. The method was tested in a large number of randomly generated traffic flows, and the experimental results show that the algorithm is robust and outperforms existing reinforcement learning algorithms and heuristic methods. The resulting driving strategy shows more rationality than human drivers and improves traffic efficiency and safety in the coordination zone.
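As a rough illustration of what a finite-horizon, time-discounted objective looks like, the return that agent $i$ accumulates over a planning horizon can be written as below. The symbols are assumptions introduced here, not notation taken from the paper: $T$ is the horizon length, $\gamma$ the discount factor, $s_t$ the traffic state, $\mathbf{a}_t$ the joint action of all vehicles, and $r_i$ the reward of agent $i$.

$$
G_i = \sum_{t=0}^{T-1} \gamma^{t}\, r_i(s_t, \mathbf{a}_t), \qquad 0 < \gamma \le 1
$$

With $\gamma < 1$, rewards earned earlier in the horizon weigh more than later ones, and the finite $T$ bounds how far ahead the tree search has to look.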
### Main contributions of the paper:
1. **Value-based MCTS method**: The paper proposes a value-based MCTS method for two-dimensional (lateral and longitudinal) joint decision-making in multi-vehicle cooperation. The algorithm adapts well to its environment, handles randomly generated traffic scenarios, and outperforms existing state-of-the-art reinforcement learning algorithms and rule-based methods.
2. **Parallel extension of the standard tree update method**: The paper extends the standard MCTS tree update to a parallel form, improving the efficiency of the joint-strategy search in multi-agent systems. For the same number of rollouts, this method increases both the breadth and the depth of the search, and it applies to problems with similar steady-state transitions.
3. **Experimental verification**: Experiments were carried out in a large number of randomly generated scenarios, and the cooperative driving behaviors of the CAVs were observed. The algorithm shows more rationality than typical human drivers and can optimize traffic conditions over a long time horizon.
### Method overview:
- **Multi-agent Markov game modeling**: Multi-vehicle cooperative driving is modeled as a multi-agent Markov game, with the components of the game defined explicitly: the state space, the joint action space, the state-transition probability distribution, and the reward function.
- **MCTS method**: MCTS consists of four steps: selection, expansion, simulation, and backpropagation. The paper introduces a parallel update that identifies parallel actions in order to accelerate the search and improve its efficiency.
- **Reward function design**: The reward function aims to improve overall traffic efficiency and safety and combines speed rewards, intention rewards, collision penalties, and lane-change frequency rewards. For partial-steady-state systems, the paper proposes a specific reward design that better captures the differences between actions.
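The four MCTS steps and a weighted reward of the kind described above can be sketched on a toy problem. This is a minimal, generic single-agent UCT search, not the paper's parallel update and not its multi-vehicle joint action space. Everything specific is a hypothetical assumption made for illustration: the two-lane road with one stationary obstacle, the action names, the reward weights, `HORIZON`, and `GAMMA`.

```python
import math
import random

# Hypothetical toy setup (not from the paper): one vehicle, two lanes,
# a stationary obstacle in lane 0 at position 5.
ACTIONS = ("keep", "accelerate", "decelerate", "change_lane")
HORIZON = 6          # finite planning horizon, in steps
GAMMA = 0.9          # time discount
OBSTACLE = (0, 5)    # (lane, position)

def step(state, action):
    """Deterministic toy transition. state = (lane, position, speed, t)."""
    lane, pos, speed, t = state
    if action == "accelerate":
        speed = min(speed + 1, 3)
    elif action == "decelerate":
        speed = max(speed - 1, 0)
    elif action == "change_lane":
        lane = 1 - lane
    new_pos = pos + speed
    collided = lane == OBSTACLE[0] and pos < OBSTACLE[1] <= new_pos
    # Assumed reward weights: speed term, lane-change penalty, collision penalty.
    reward = speed / 3.0
    if action == "change_lane":
        reward -= 0.2
    if collided:
        reward -= 10.0
    done = collided or t + 1 >= HORIZON
    return (lane, new_pos, speed, t + 1), reward, done

class Node:
    def __init__(self, state, done=False):
        self.state, self.done = state, done
        self.children = {}   # action -> (child Node, immediate reward)
        self.visits = 0
        self.value = 0.0     # running mean of discounted return from this state

def rollout(state, done, rng):
    """Simulation: random actions until the horizon or a collision."""
    ret, disc = 0.0, 1.0
    while not done:
        state, r, done = step(state, rng.choice(ACTIONS))
        ret += disc * r
        disc *= GAMMA
    return ret

def simulate(node, rng, c=1.4):
    """One MCTS iteration from `node`; returns the discounted return seen."""
    if node.done:
        ret = 0.0
    else:
        untried = [a for a in ACTIONS if a not in node.children]
        if untried:                                   # expansion
            a = rng.choice(untried)
            nxt, r, done = step(node.state, a)
            child = Node(nxt, done)
            node.children[a] = (child, r)
            g = rollout(nxt, done, rng)               # simulation
            child.visits, child.value = 1, g
            ret = r + GAMMA * g
        else:                                         # selection (UCB1)
            def ucb(a):
                child, r = node.children[a]
                explore = math.sqrt(math.log(node.visits) / child.visits)
                return r + GAMMA * child.value + c * explore
            a = max(ACTIONS, key=ucb)
            child, r = node.children[a]
            ret = r + GAMMA * simulate(child, rng, c)
    node.visits += 1                                  # backpropagation
    node.value += (ret - node.value) / node.visits
    return ret

def plan(state, iterations=2000, seed=0):
    """Run MCTS from `state` and return the most-visited root action."""
    rng = random.Random(seed)
    root = Node(state)
    for _ in range(iterations):
        simulate(root, rng)
    return max(root.children, key=lambda a: root.children[a][0].visits)
```

Calling `plan((0, 4, 1, 0))` places the vehicle one cell behind the obstacle at speed 1. In this toy, keeping speed or accelerating collides immediately, so the search has to weigh braking against an early lane change. In the multi-vehicle setting the action at each node would be a joint action over all CAVs, which is what makes the breadth of the tree grow so quickly.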
### Experimental setup and results:
- **Simulation environment**: The simulation scenarios are built with the Flow framework and contain two CAVs controlled by the MCTS algorithm and four human-driven vehicles (HDVs). Experimental parameters include the initial position, speed, and acceleration of the vehicles.
- **Experimental results**: Across 200 experiments, the algorithm is robust, handles complex traffic scenarios effectively, and improves traffic efficiency and safety.
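To make the idea of randomly generated scenarios concrete, the sketch below samples initial conditions for two CAVs and four HDVs and repeats this 200 times. It is plain Python and does not use the Flow API. The number of lanes, the value ranges, the minimum gap, and the names `Vehicle` and `sample_scenario` are all hypothetical placeholders, not values or identifiers from the paper.

```python
import random
from dataclasses import dataclass

@dataclass
class Vehicle:
    kind: str            # "CAV" or "HDV"
    lane: int
    position: float      # metres along the road (assumed unit)
    speed: float         # m/s (assumed unit)
    acceleration: float  # m/s^2 (assumed unit)

def sample_scenario(rng, n_cav=2, n_hdv=4, n_lanes=2, min_gap=10.0):
    """Sample one random initial traffic state with a minimum in-lane gap."""
    kinds = ["CAV"] * n_cav + ["HDV"] * n_hdv
    rng.shuffle(kinds)                 # randomize which slots are CAVs
    last_pos = [0.0] * n_lanes         # rearmost free position per lane
    vehicles = []
    for kind in kinds:
        lane = rng.randrange(n_lanes)
        # Place each vehicle ahead of the previous one in the same lane.
        position = last_pos[lane] + rng.uniform(min_gap, 3 * min_gap)
        last_pos[lane] = position
        vehicles.append(Vehicle(kind, lane, position,
                                speed=rng.uniform(5.0, 15.0),
                                acceleration=rng.uniform(-1.0, 1.0)))
    return vehicles

# One seeded generator per scenario keeps every run reproducible.
scenarios = [sample_scenario(random.Random(seed)) for seed in range(200)]
```

Seeding each scenario separately means a failing case can be regenerated on its own, which helps when comparing a planner against baselines on the same traffic.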
### Conclusion:
The proposed method shows clear advantages in multi-vehicle cooperative driving, particularly in traffic efficiency and safety. The parallel update substantially improves the search efficiency of MCTS, which supports efficient cooperative driving strategies for CAVs.