Automated vehicle's behavior decision making using deep reinforcement learning and high-fidelity simulation environment

Yingjun Ye, Xiaohui Zhang, Jian Sun
DOI: https://doi.org/10.48550/arXiv.1804.06264
2018-04-17
Abstract: Automated vehicles are deemed to be the key element of future intelligent transportation systems. Many studies have sought to improve automated vehicles' environment recognition and vehicle control, but decision making has received too little attention, and the decision algorithms developed so far are very preliminary. Therefore, this paper puts forward a framework for decision-making training and learning. It consists of two parts: a deep reinforcement learning training program and a high-fidelity virtual simulation environment. The basic microscopic behavior, car-following, is then trained within this framework. In addition, theoretical analysis and experiments were conducted on setting the reward function to accelerate training with deep reinforcement learning. The results show that, on the premise of driving comfort, the efficiency of the trained automated vehicle increases by 7.9% compared to the classical traffic model, the intelligent driver model. Later, on a more complex three-lane section, we trained an integrated model that combines car-following and lane-changing behavior, and the average speed grows by a further 2.4%. This indicates that our framework is effective for automated vehicles' decision-making learning.
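For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the standard intelligent driver model (IDM) acceleration rule. The parameter values (`v0`, `T`, `s0`, `a_max`, `b`, `delta`) are typical textbook defaults chosen for illustration and are assumptions; the paper's own calibration is not given here.

```python
import math


def idm_acceleration(v, gap, dv,
                     v0=33.3,    # desired speed in m/s (assumed value)
                     T=1.5,      # desired time headway in s (assumed value)
                     s0=2.0,     # minimum standstill gap in m (assumed value)
                     a_max=1.0,  # maximum acceleration in m/s^2 (assumed value)
                     b=2.0,      # comfortable deceleration in m/s^2 (assumed value)
                     delta=4.0): # free-road acceleration exponent
    """Return the IDM acceleration of a following vehicle in m/s^2.

    v   : speed of the follower (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, follower speed minus leader speed (m/s)
    """
    # Desired dynamic gap; clamping the dynamic part at zero is a common
    # implementation safeguard so the desired gap never drops below s0.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    free_road_term = (v / v0) ** delta      # vanishes at standstill, 1 at v0
    interaction_term = (s_star / gap) ** 2  # grows as the gap closes
    return a_max * (1.0 - free_road_term - interaction_term)
```

With a very large gap the rule accelerates toward `v0`; when the follower closes quickly on a nearby leader, the interaction term dominates and the result is a strong deceleration.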
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses the shortfall in decision-making for autonomous vehicles (AVs). Although much progress has been made in environment recognition and vehicle control, research on decision-making algorithms lags behind: training data are hard to obtain, decision-making behaviors are hard to label, and existing imitation learning algorithms have limitations. The paper therefore proposes a decision-training framework based on deep reinforcement learning (DRL) and a high-fidelity virtual simulation environment, aiming to improve the decision-making ability of autonomous vehicles.

The framework consists of two parts: a DRL training program and a high-fidelity virtual simulation environment. Within it, the researchers first trained the basic microscopic behavior, car-following (CF), and explored, through theoretical analysis and experiments, how to set reward functions so as to accelerate DRL training. The results show that, on the premise of ensuring driving comfort, the efficiency of the trained autonomous vehicle is 7.9% higher than that of a classical traffic model, the intelligent driver model (IDM). In a more complex three-lane scenario, the researchers then trained an integrated model combining car-following and lane-changing (LC) behaviors, and the average speed increased by a further 2.4%. This indicates that the proposed framework is effective for decision learning in autonomous vehicles.

### Main contributions of the paper:

1. **Propose an integrated framework based on VISSIM high-fidelity simulation and DRL** for decision training.
2. **Adopt the deep deterministic policy gradient (DDPG) algorithm**, which is adapted to the continuous state of autonomous driving.
3. **Explore three methods of setting reward functions**, analyze their influence mechanisms, and discuss methods of verifying effectiveness and accelerating convergence.
4. **Test the car-following model on a single-lane highway section**, and extend it to an integrated model combining car-following and lane-changing behaviors on a three-lane highway section. The results show that the platform is effective in learning driving tasks.

### Key technical points:

- **High-fidelity virtual simulation environment**: VISSIM provides more realistic traffic scenarios, including road infrastructure and the behaviors of all road users.
- **Deep reinforcement learning (DRL)**: The DDPG algorithm combines ideas from the deep Q-network (DQN), the deterministic policy gradient (DPG), and the actor-critic architecture, and is suitable for continuous state and action spaces.
- **Design of reward functions**: Several forms of reward function were tried (additive form, multiplicative form, normalized form, normalized form after amplification, and regularized bounded form) to find the most effective way of setting the reward.

### Experimental results:

- **Car-following behavior**: On a single-lane highway section, the trained autonomous vehicle maintains a safe distance from the vehicle in front and avoids frequent or drastic acceleration and deceleration, improving driving comfort and safety.
- **Integrated behavior**: On a three-lane highway section, the integrated model combining car-following and lane-changing behaviors further increases the average speed, supporting the effectiveness of the framework.

In short, the paper proposes a decision-training framework based on high-fidelity simulation and deep reinforcement learning, and reports that it improves the decision-making ability and driving performance of autonomous vehicles.
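The DDPG ingredients listed under the key technical points can be sketched in a few lines of NumPy. This is an illustrative sketch of the generic algorithm, not the authors' implementation: the linear `actor`, its three-feature state (gap, speed, relative speed), the `max_accel` bound, and the `gamma` and `tau` values are all assumptions made for the example.

```python
import numpy as np


def actor(state, weights, max_accel=3.0):
    """Deterministic policy: map a state vector to one bounded continuous action.

    A linear layer followed by tanh stands in for the actor network
    (hypothetical); the output is an acceleration in [-max_accel, max_accel].
    """
    return max_accel * np.tanh(state @ weights)


def td_target(reward, next_q, done, gamma=0.99):
    """Critic regression target y = r + gamma * Q'(s', mu'(s')).

    next_q is the target critic's value at the next state and the target
    actor's action; done is 1.0 for terminal transitions, which drops the
    bootstrap term.
    """
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.001):
    """Slowly track the online networks: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]
```

The slowly moving target networks and the bootstrapped critic target are the pieces borrowed from DQN, while the deterministic actor comes from DPG; a complete agent would add a replay buffer, exploration noise, and gradient updates for both networks.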