Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning

Branka Mirchevska, Moritz Werling, Joschka Boedecker
DOI: https://doi.org/10.48550/arXiv.2203.10949
2022-03-21
Abstract: Implementing an autonomous vehicle that is able to output feasible, smooth and efficient trajectories is a long-standing challenge. Several approaches have been considered, roughly falling under two categories: rule-based and learning-based approaches. The rule-based approaches, while guaranteeing safety and feasibility, fall short when it comes to long-term planning and generalization. The learning-based approaches are able to account for long-term planning and generalization to unseen situations, but may fail to achieve smoothness, safety and the feasibility which rule-based approaches ensure. Hence, combining the two approaches is an evident step towards yielding the best compromise out of both. We propose a Reinforcement Learning-based approach, which learns target trajectory parameters for fully autonomous driving on highways. The trained agent outputs continuous trajectory parameters based on which a feasible polynomial-based trajectory is generated and executed. We compare the performance of our agent against four other highway driving agents. The experiments are conducted in the Sumo simulator, taking into consideration various realistic, dynamically changing highway scenarios, including surrounding vehicles with different driver behaviors. We demonstrate that our offline trained agent, with randomly collected data, learns to drive smoothly, achieving velocities as close as possible to the desired velocity, while outperforming the other agents. Code, training data and details available at: https://nrgit.informatik.uni-freiburg.de/branka.mirchevska/offline-rl-tp.
Robotics, Machine Learning
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses the long-standing challenge of generating feasible, smooth, and efficient driving trajectories for autonomous vehicles on highways. Specifically, the authors propose an offline reinforcement learning (RL) method that learns to output trajectory parameters for fully autonomous driving.

### Background and Motivation

1. **Rule-based Methods**: They guarantee safety and feasibility but perform poorly at long-term planning and generalization.
2. **Learning-based Methods**: They handle long-term planning and generalize to unseen situations, but may fail to ensure smoothness, safety, and feasibility.

To combine the strengths of both, the authors use RL to learn target trajectory parameters and pass them to a polynomial trajectory generation module, which generates the trajectory that is then executed.

### Method Overview

1. **Scene Understanding Module**: Collects information about the environment and processes it into the RL state features relevant for decision-making.
2. **Decision Module**: Based on the TD3 algorithm; outputs four continuous action values that describe the target trajectory parameters.
3. **Trajectory Generation Module**: Generates a polynomial trajectory from the selected trajectory parameters.
4. **Trajectory Execution Module**: Executes the generated trajectory; the decision is updated every second.

### Experiments and Results

1. **Experimental Setup**: Experiments were run in the SUMO simulator over a variety of realistic, dynamically changing highway scenarios, including surrounding vehicles with different driver behaviors.
2. **Performance Comparison**: Compared with four other highway driving agents, the proposed agent achieves higher average speeds across different traffic densities and avoids collisions and road-boundary departures in complex situations.

### Key Contributions

1. **Novel Offline RL Method**: Designed for highway autonomous driving, with continuous control of both lateral and longitudinal planning on top of a polynomial trajectory generation module.
2. **Diverse Realistic Scenario Testing**: Comparison against several other agents across different realistic scenarios.
3. **Ability to Handle Critical Situations**: Demonstrates the agent's behavior in sudden cut-ins and other complex situations.
4. **Training Data Analysis**: Studies how the structure of the training data and the ratio of terminal samples affect the learned policy.

### Conclusion

The proposed method excels at generating smooth, efficient, and safe highway driving trajectories, especially in complex situations. The training data analysis further shows that data quality and structure have a significant impact on offline RL performance.
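The Decision Module is described only as being based on TD3. As a minimal sketch of what that entails, the function below computes the standard TD3 critic target with its two characteristic ingredients, clipped double-Q learning and target policy smoothing, using NumPy and plain callables in place of neural networks. The function name, the default hyperparameters (common TD3 defaults), the meaning of `dones` and the toy networks are illustrative assumptions; this summary does not state which hyperparameters the authors used or whether they add an offline-specific regularizer on top of TD3.

```python
import numpy as np


def td3_critic_target(rewards, dones, next_states, target_actor, target_q1, target_q2,
                      gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0, rng=None):
    """Standard TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2')(s', a')."""
    rng = np.random.default_rng() if rng is None else rng
    next_actions = target_actor(next_states)
    # Target policy smoothing: perturb the target action with clipped Gaussian noise
    # and keep the result inside the valid action range.
    noise = np.clip(rng.normal(0.0, policy_noise, size=next_actions.shape),
                    -noise_clip, noise_clip)
    next_actions = np.clip(next_actions + noise, -max_action, max_action)
    # Clipped double-Q learning: use the smaller of the two target critic estimates.
    q_next = np.minimum(target_q1(next_states, next_actions),
                        target_q2(next_states, next_actions))
    # Terminal transitions (hypothetically: collision or leaving the road) do not bootstrap.
    return rewards + gamma * (1.0 - dones) * q_next


# Toy stand-ins for the target networks, for illustration only.
actor = lambda s: np.zeros((s.shape[0], 4))  # 4-dimensional action vector (assumed)
q1 = lambda s, a: s.sum(axis=1)
q2 = lambda s, a: s.sum(axis=1) + 1.0
y = td3_critic_target(np.array([1.0, 2.0]), np.array([0.0, 1.0]),
                      np.array([[1.0, 1.0], [3.0, 3.0]]), actor, q1, q2,
                      gamma=0.5, policy_noise=0.0)
print(y)  # [2. 2.]
```

Taking the minimum of two critics counters the overestimation bias of a single learned Q-function, which matters even more offline, where the agent cannot collect new data to correct overly optimistic values.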
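The Trajectory Generation Module turns the agent's target parameters into a polynomial trajectory. The summary does not give the exact parameterization, so the sketch below assumes a common road-aligned (Frenet-frame) formulation: a quintic polynomial for the lateral offset d(t), which fixes position, velocity and acceleration at both ends, and a quartic polynomial for the longitudinal coordinate s(t), which fixes only the end velocity and acceleration (velocity keeping). The helper names and the mapping from an action to `(target_offset, target_speed, horizon)` in `plan_from_action` are hypothetical.

```python
import numpy as np


def quintic_coeffs(x0, v0, a0, x1, v1, a1, T):
    """Coefficients c0..c5 of x(t) = sum(c_i * t**i) matching x, x', x'' at t=0 and t=T."""
    c0, c1, c2 = x0, v0, a0 / 2.0
    A = np.array([[T**3, T**4, T**5],
                  [3 * T**2, 4 * T**3, 5 * T**4],
                  [6 * T, 12 * T**2, 20 * T**3]])
    b = np.array([x1 - (c0 + c1 * T + c2 * T**2),
                  v1 - (c1 + 2 * c2 * T),
                  a1 - 2 * c2])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4, c5])


def quartic_coeffs(x0, v0, a0, v1, a1, T):
    """Velocity-keeping polynomial: end position is free, end velocity/acceleration are fixed."""
    c0, c1, c2 = x0, v0, a0 / 2.0
    A = np.array([[3 * T**2, 4 * T**3],
                  [6 * T, 12 * T**2]])
    b = np.array([v1 - (c1 + 2 * c2 * T),
                  a1 - 2 * c2])
    c3, c4 = np.linalg.solve(A, b)
    return np.array([c0, c1, c2, c3, c4])


def evaluate(coeffs, t, derivative=0):
    """Evaluate the polynomial (or one of its derivatives) at time(s) t."""
    poly = np.polynomial.Polynomial(coeffs)  # coefficients in increasing order
    return poly.deriv(derivative)(t) if derivative else poly(t)


def plan_from_action(state, target_offset, target_speed, horizon, dt=0.1):
    """Hypothetical mapping from an RL action to a sampled (t, s, d) trajectory."""
    s0, s_dot0, s_ddot0, d0, d_dot0, d_ddot0 = state
    lon = quartic_coeffs(s0, s_dot0, s_ddot0, target_speed, 0.0, horizon)
    lat = quintic_coeffs(d0, d_dot0, d_ddot0, target_offset, 0.0, 0.0, horizon)
    t = np.arange(0.0, horizon + 1e-9, dt)
    return t, evaluate(lon, t), evaluate(lat, t)


# Lane change of 3.5 m while accelerating from 20 m/s to 25 m/s over 5 s.
t, s, d = plan_from_action((0.0, 20.0, 0.0, 0.0, 0.0, 0.0),
                           target_offset=3.5, target_speed=25.0, horizon=5.0)
print(round(float(d[-1]), 3), round(float(s[-1]), 3))  # 3.5 112.5
```

Because both polynomials are solved in closed form from boundary conditions, every trajectory is smooth by construction, and replanning only needs the current vehicle state as the new start condition.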
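The Key Contributions mention an analysis of the ratio of terminal samples in the training data. How the authors varied that ratio is not described in this summary; one simple way to build datasets with a controlled ratio is to subsample an existing dataset, as in the hypothetical helper below. It keeps every sample of one kind and randomly drops samples of the other until the requested fraction of terminal transitions is reached (up to rounding).

```python
import numpy as np


def resample_terminal_ratio(dones, target_ratio, rng=None):
    """Return sorted indices of a subset whose fraction of terminal samples is ~target_ratio."""
    if not 0.0 < target_ratio < 1.0:
        raise ValueError("target_ratio must lie strictly between 0 and 1")
    rng = np.random.default_rng() if rng is None else rng
    dones = np.asarray(dones, dtype=bool)
    term_idx = np.flatnonzero(dones)
    non_idx = np.flatnonzero(~dones)
    # First try keeping every terminal sample and dropping non-terminal ones.
    n_non = int(round(len(term_idx) * (1.0 - target_ratio) / target_ratio))
    if n_non <= len(non_idx):
        n_term = len(term_idx)
    else:
        # Too few non-terminal samples: keep all of them and drop terminal ones instead.
        n_non = len(non_idx)
        n_term = int(round(n_non * target_ratio / (1.0 - target_ratio)))
    keep = np.concatenate([rng.choice(term_idx, n_term, replace=False),
                           rng.choice(non_idx, n_non, replace=False)])
    return np.sort(keep)


dones = np.array([True] * 10 + [False] * 90)  # 10% terminal samples
idx = resample_terminal_ratio(dones, target_ratio=0.25, rng=np.random.default_rng(0))
print(len(idx), int(dones[idx].sum()))  # 40 10
```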