Abstract: Implementing an autonomous vehicle that is able to output feasible, smooth and efficient trajectories is a long-standing challenge. Several approaches have been considered, roughly falling under two categories: rule-based and learning-based approaches. The rule-based approaches, while guaranteeing safety and feasibility, fall short when it comes to long-term planning and generalization. The learning-based approaches are able to account for long-term planning and generalization to unseen situations, but may fail to achieve smoothness, safety and the feasibility which rule-based approaches ensure. Hence, combining the two approaches is an evident step towards yielding the best compromise out of both. We propose a Reinforcement Learning-based approach, which learns target trajectory parameters for fully autonomous driving on highways. The trained agent outputs continuous trajectory parameters based on which a feasible polynomial-based trajectory is generated and executed. We compare the performance of our agent against four other highway driving agents. The experiments are conducted in the Sumo simulator, taking into consideration various realistic, dynamically changing highway scenarios, including surrounding vehicles with different driver behaviors. We demonstrate that our offline trained agent, with randomly collected data, learns to drive smoothly, achieving velocities as close as possible to the desired velocity, while outperforming the other agents. Code, training data and details available at: https://nrgit.informatik.uni-freiburg.de/branka.mirchevska/offline-rl-tp.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper addresses the long-standing challenge of generating feasible, smooth, and efficient driving trajectories for autonomous vehicles on highways. Specifically, the authors propose an offline reinforcement learning (RL) method that learns to generate trajectory parameters for fully autonomous driving.

### Background and Motivation

1. **Rule-based Methods**: While they ensure safety and feasibility, they perform poorly in long-term planning and generalization.
2. **Learning-based Methods**: They can handle long-term planning and generalize to unseen situations but may not ensure smoothness, safety, and feasibility.

To combine the advantages of both, the authors propose a method that learns target trajectory parameters through reinforcement learning and passes them to a polynomial trajectory generation module, which generates the trajectory that is then executed.

### Method Overview

1. **Scene Understanding Module**: Collects environmental information and processes it into RL state features relevant to decision-making.
2. **Decision Module**: Implemented based on the TD3 algorithm; selects four continuous actions describing the target trajectory parameters.
3. **Trajectory Generation Module**: Generates polynomial trajectories based on the selected trajectory parameters.
4. **Trajectory Execution Module**: Executes the generated trajectory and updates the decision every second.

### Experiments and Results

1. **Experimental Setup**: Experiments were conducted in the Sumo simulator, considering various realistic dynamic highway scenarios, including surrounding vehicles with different driving behaviors.
2. **Performance Comparison**: Compared with four other highway driving agents, the proposed agent achieves higher average speeds under different traffic densities and avoids collisions and road boundary departures in complex situations.

### Key Contributions

1. **Novel Offline RL Method**: Suitable for highway autonomous driving, with continuous control components for lateral and longitudinal planning, based on a polynomial trajectory generation module.
2. **Diverse Realistic Scenario Testing**: Compared with various models in different realistic scenarios.
3. **Ability to Handle Critical Situations**: Demonstrated the agent's performance in sudden cut-ins and other complex situations.
4. **Training Data Analysis**: Studied the impact of data structure and terminal sample ratio on the learning strategy.

### Conclusion

The proposed method generates smooth, efficient, and safe highway driving trajectories, including in complex situations. Additionally, the analysis of training data shows that data quality and structure have a significant impact on offline RL performance.
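
The trajectory generation step in the method overview can be illustrated with a minimal sketch. The summary does not state the polynomial order or the boundary conditions the authors use, so the quintic form, the function names `quintic_coefficients` and `evaluate`, and the 3.5 m / 4 s lane-change values below are illustrative assumptions, not the paper's implementation. The idea is that a target position, velocity, and duration chosen by the decision module fully determine a smooth polynomial once the current state is fixed.

```python
import numpy as np


def quintic_coefficients(p0, v0, a0, p1, v1, a1, T):
    """Return c0..c5 of p(t) = sum(c_i * t**i) that matches position,
    velocity and acceleration at t = 0 and at t = T.

    Assumption: a quintic polynomial; the paper's exact form may differ.
    """
    A = np.array([
        [1, 0, 0,      0,        0,         0],          # p(0)
        [0, 1, 0,      0,        0,         0],          # p'(0)
        [0, 0, 2,      0,        0,         0],          # p''(0)
        [1, T, T**2,   T**3,     T**4,      T**5],       # p(T)
        [0, 1, 2 * T,  3 * T**2, 4 * T**3,  5 * T**4],   # p'(T)
        [0, 0, 2,      6 * T,    12 * T**2, 20 * T**3],  # p''(T)
    ], dtype=float)
    b = np.array([p0, v0, a0, p1, v1, a1], dtype=float)
    return np.linalg.solve(A, b)


def evaluate(coeffs, t):
    """Position, velocity and acceleration of the polynomial at time(s) t."""
    poly = np.polynomial.Polynomial(coeffs)  # coefficients in increasing order
    return poly(t), poly.deriv(1)(t), poly.deriv(2)(t)


# Hypothetical lateral lane change: move 3.5 m sideways in 4 s,
# with zero lateral velocity and acceleration at both ends.
coeffs = quintic_coefficients(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, 4.0)
t = np.linspace(0.0, 4.0, 41)
pos, vel, acc = evaluate(coeffs, t)
```

In a setup like the one described, the target values passed to `quintic_coefficients` would come from the RL agent's action, and the trajectory would be re-planned from the vehicle's current state each time the decision is updated.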

Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning

Rollout-Based Interactive Motion Planning for Automated Vehicles

Learning an Efficient and Safe Policy for Highway Driving Using Supervised Learning and Reinforcement Learning

Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Human-like Highway Trajectory Modeling Based on Inverse Reinforcement Learning

Autonomous Highway Driving using Deep Reinforcement Learning

Online longitudinal trajectory planning for connected and autonomous vehicles in mixed traffic flow with deep reinforcement learning approach

ReinforcementDriving: Exploring Trajectories and Navigation for Autonomous Vehicles

Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers

Learning from Demonstration: Situation-Adaptive Lane Change Trajectory Planning for Automated Highway Driving

A Hybrid Deep Reinforcement Learning and Optimal Control Architecture for Autonomous Highway Driving

Bypassing the Simulation-to-reality Gap: Online Reinforcement Learning using a Supervisor

NeuroTrajectory: A Neuroevolutionary Approach to Local State Trajectory Learning for Autonomous Vehicles

Reinforced Imitative Trajectory Planning for Urban Automated Driving

Learning Realistic Traffic Agents in Closed-loop

Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory Data

Guaranteed Safe Reachability-based Trajectory Design for a High-Fidelity Model of an Autonomous Passenger Vehicle

Uncertainty-aware hybrid paradigm of nonlinear MPC and model-based RL for offroad navigation: Exploration of transformers in the predictive model

Online Trajectory Planning with Reinforcement Learning for Pedestrian Avoidance

HGRL: Human-Driving-Data Guided Reinforcement Learning for Autonomous Driving

Learning to Drive from a World on Rails