Abstract: Trajectory planning is a research hotspot in autonomous driving. Existing reinforcement learning-based trajectory planning methods suffer from unstable performance because network weight updates are highly random during training. This paper therefore proposes a novel trajectory planning method based on trust region policy optimization (TRPO), a deep reinforcement learning algorithm. First, to enhance the robustness of TRPO-based trajectory planning, we propose a TRPO-LSTM based decision model. More specifically, a long short-term memory (LSTM) based state feature extraction network is designed and embedded into a TRPO-based decision model to improve the ability of TRPO to extract information from the environmental state space. Second, to make the planned trajectory adapt to dynamic changes in the traffic environment, we present a novel TRPO-LSTM trajectory fitting algorithm. To the best of our knowledge, this is the first work to apply a TRPO-LSTM based decision model in the trajectory fitting process to search for the optimal longitudinal trajectory speed. Finally, the proposed trajectory planning method is implemented and simulated in the CARLA simulator. The experimental results show that, compared with existing trajectory planning methods based on deep reinforcement learning algorithms, the proposed method improves cumulative reward by more than 28.9% in a four-lane highway scenario and is more robust. The proposed method also achieves a lower collision rate of 0.93% while improving average driving speed and driving comfort.
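The stability argument above rests on the trust region idea behind TRPO: a policy update is accepted only if the new policy stays within a small KL divergence of the old one. The following is a minimal sketch of that constraint, not the paper's implementation. It assumes a toy policy over three discrete longitudinal actions (for example decelerate, keep speed, accelerate), and the names `trust_region_step`, `max_kl`, and `shrink` are hypothetical. It checks only the KL constraint, whereas a full TRPO line search also requires the surrogate objective to improve.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def trust_region_step(old_logits, step, max_kl=0.01, shrink=0.5, max_backtracks=10):
    """Backtracking line search (simplified, hypothetical helper).

    Shrinks the proposed update `step` until the new policy lies within
    `max_kl` of the old policy. Returns (new_logits, kl). If no scaled
    step satisfies the constraint, the old policy is kept unchanged.
    """
    old_probs = softmax(old_logits)
    scale = 1.0
    for _ in range(max_backtracks):
        new_logits = [l + scale * s for l, s in zip(old_logits, step)]
        kl = kl_divergence(old_probs, softmax(new_logits))
        if kl <= max_kl:
            return new_logits, kl
        scale *= shrink
    return list(old_logits), 0.0


# A large proposed update is scaled back until it fits the trust region.
new_logits, kl = trust_region_step([0.0, 0.0, 0.0], [2.0, 0.0, -2.0])
```

Bounding each update this way limits how far a single noisy gradient estimate can move the policy, which is the mechanism the abstract relies on when it motivates TRPO as a remedy for unstable training.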

Trust Region Policy Optimization

An Off-Policy Trust Region Policy Optimization Method with Monotonic Improvement Guarantee for Deep Reinforcement Learning

A Novel Trajectory Planning Method Based on Trust Region Policy Optimization

A Stochastic Trust-Region Framework for Policy Optimization

Trust Region-Guided Proximal Policy Optimization

Matrix Low-Rank Trust Region Policy Optimization

Learning to Constrain Policy Optimization with Virtual Trust Region

Absolute Policy Optimization

Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning

Embedding Safety into RL: A New Take on Trust Region Methods

On-Policy Trust Region Policy Optimisation with Replay Buffers

EnTRPO: Trust Region Policy Optimization Method with Entropy Regularization

Multi-Agent Trust Region Policy Optimization

Simple Policy Optimization

Separated Trust Regions Policy Optimization Method

Hindsight Trust Region Policy Optimization

Reflective Policy Optimization

Supported Trust Region Optimization for Offline Reinforcement Learning

Uncertainty-Aware Policy Optimization: A Robust, Adaptive Trust Region Approach

Truly Proximal Policy Optimization

Faded-Experience Trust Region Policy Optimization for Model-Free Power Allocation in Interference Channel