Abstract: Trajectory planning is a research hotspot in autonomous driving. Existing reinforcement learning-based trajectory planning methods suffer from unstable performance because updates to the network weight parameters are highly random during training. This paper therefore proposes a novel trajectory planning method based on the deep reinforcement learning algorithm trust region policy optimization (TRPO). First, to enhance the robustness of TRPO-based trajectory planning, a TRPO-LSTM decision model is proposed. More specifically, a long short-term memory (LSTM) based state feature extraction network is designed and embedded into a TRPO-based decision model to enhance the ability of TRPO to extract information from the environmental state space. Second, to make the planned trajectory adapt to dynamic changes in the traffic environment, a novel TRPO-LSTM trajectory fitting algorithm is presented. To the best of our knowledge, this is the first work to apply the TRPO-LSTM based decision model in the trajectory fitting process to search for the optimal longitudinal trajectory speed. Finally, the proposed trajectory planning method is implemented and simulated in the CARLA simulator. The experimental results show that, compared with existing trajectory planning methods based on deep reinforcement learning algorithms, the proposed method improves cumulative reward by over 28.9% in a four-lane highway scenario and is more robust. Meanwhile, the proposed method achieves a lower collision rate of 0.93% while improving the average speed and comfort of vehicle driving.
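
For context on why a trust region can stabilize weight updates, the standard TRPO update is commonly written as the constrained problem sketched below. This is the general textbook form and not necessarily the exact formulation used in this paper. The symbols are assumptions introduced here for illustration: \theta and \theta_{\mathrm{old}} are the new and current policy parameters, \pi_\theta is the policy, A^{\pi_{\theta_{\mathrm{old}}}} is the advantage function under the current policy, \rho_{\theta_{\mathrm{old}}} is its state visitation distribution, and \delta is the trust region size.

```latex
\begin{aligned}
\max_{\theta}\quad
  & \mathbb{E}_{s \sim \rho_{\theta_{\mathrm{old}}},\; a \sim \pi_{\theta_{\mathrm{old}}}}
    \left[
      \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)}\,
      A^{\pi_{\theta_{\mathrm{old}}}}(s, a)
    \right] \\
\text{s.t.}\quad
  & \mathbb{E}_{s \sim \rho_{\theta_{\mathrm{old}}}}
    \left[
      D_{\mathrm{KL}}\!\left(
        \pi_{\theta_{\mathrm{old}}}(\cdot \mid s)
        \,\middle\|\,
        \pi_{\theta}(\cdot \mid s)
      \right)
    \right] \le \delta
\end{aligned}
```

The KL constraint bounds how far each update can move the policy from the current one, which is the property the abstract relies on when it contrasts TRPO with methods whose parameter updates are less controlled.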

Learning Similar Tasks Based on PPO by Transferring Trajectory

A Novel Trajectory Planning Method Based on Trust Region Policy Optimization

Transferring knowledge from human-demonstration trajectories to reinforcement learning

Trust Region-Guided Proximal Policy Optimization

PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

A Path Planning Algorithm Based on Deep Reinforcement Learning for Mobile Robots in Unknown Environment

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Robotic arm trajectory tracking method based on improved proximal policy optimization

DTPPO: Dual-Transformer Encoder-based Proximal Policy Optimization for Multi-UAV Navigation in Unseen Complex Environments

Accelerating Proximal Policy Optimization Learning Using Task Prediction for Solving Environments with Delayed Rewards

Distillation Strategies for Proximal Policy Optimization

Simulation of Robotic Arm Grasping Control Based on Proximal Policy Optimization Algorithm

Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment

A Hierarchical Hybrid Learning Framework for Multi-Agent Trajectory Prediction

Continuous Transfer Learning for UAV Communication-aware Trajectory Design

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

Trajectory-Oriented Policy Optimization with Sparse Rewards

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Proximal Policy Optimization with Mixed Distributed Training

Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization

Reinforcement Learning Transfer Based on Subgoal Discovery and Subtask Similarity