Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic

Yuting Wang, Lu Liu, Maonan Wang, Xi Xiong
2024-08-08
Abstract: The burgeoning field of autonomous driving necessitates the seamless integration of autonomous vehicles (AVs) with human-driven vehicles, calling for more predictable AV behavior and enhanced interaction with human drivers. Human-like driving, particularly during lane-changing maneuvers on highways, is a critical area of research due to its significant impact on safety and traffic flow. Traditional rule-based decision-making approaches often fail to encapsulate the nuanced boundaries of human behavior in diverse driving scenarios, while crafting reward functions for learning-based methods introduces its own set of complexities. This study investigates the application of Reinforcement Learning from Human Feedback (RLHF) to emulate human-like lane-changing decisions in AVs. An initial RL policy is pre-trained to ensure safe lane changes. Subsequently, this policy is employed to gather data, which is then annotated by humans to train a reward model that discerns lane changes aligning with human preferences. This human-informed reward model supersedes the original, guiding the refinement of the policy to reflect human-like preferences. The effectiveness of RLHF in producing human-like lane changes is demonstrated through the development and evaluation of conservative and aggressive lane-changing models within obstacle-rich environments and mixed autonomy traffic scenarios. The experimental outcomes underscore the potential of RLHF to diversify lane-changing behaviors in AVs, suggesting its viability for enhancing the integration of AVs into the fabric of human-driven traffic.
Computational Engineering, Finance, and Science
What problem does this paper attempt to address?
This paper addresses how to make autonomous vehicles (AVs) more closely mimic human driving behavior during lane changes in mixed traffic, thereby improving their predictability and their interaction with human drivers. Specifically, the researchers focus on training AVs with Reinforcement Learning from Human Feedback (RLHF) so that they better emulate the decision-making of human drivers during lane changes.

### Main Issues

1. **Behavior Predictability**: Traditional rule-based decision-making methods often fail to capture the subtle behavioral differences of human drivers across driving scenarios, which makes the behavior of autonomous vehicles difficult to predict.
2. **Interaction with Human Drivers**: Autonomous vehicles operate on roads shared with human drivers, so their behavior needs to align more closely with human expectations to reduce accidents and improve traffic flow.
3. **Design of Reward Functions**: In traditional reinforcement learning methods, the reward function is usually designed by hand, which makes it difficult to cover all situations in complex and variable driving scenarios.

### Solution

The paper proposes an RLHF-based method, implemented through the following steps:

1. **Pre-training**: Use the Proximal Policy Optimization (PPO) algorithm to pre-train a basic lane-changing decision model, ensuring that the initial model can perform lane changes safely.
2. **Collecting Human Feedback**: Use the pre-trained model to generate a series of trajectories, and have human evaluators assess them, marking the lane-changing behaviors that align with human preferences.
3. **Training the Reward Model**: Based on the evaluators' feedback, train a reward model that predicts rewards aligned with human preferences.
4. **Policy Optimization**: Use the PPO algorithm to fine-tune the initial policy against the learned reward model, making the lane-changing behavior of autonomous vehicles more aligned with human preferences.

### Experimental Validation

The researchers conducted experiments on the SUMO platform, developing two types of lane-changing decision models: conservative and aggressive. By comparing the pre-trained model with the RLHF fine-tuned model, they validated the effectiveness of RLHF in obstacle-avoidance and mixed-autonomy traffic scenarios.

### Conclusion

This study demonstrates the potential of RLHF to improve the human-likeness of lane-changing behavior in autonomous vehicles, helping to enhance their predictability and safety in mixed traffic environments.
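
Steps 2 and 3 of the solution (collecting human labels and fitting a reward model to them) can be illustrated with a small sketch. This is not the authors' implementation: the summary does not specify the reward model's architecture or input features, so the example assumes a linear model trained with logistic loss on binary human labels, and the two features (a normalized gap to the follower in the target lane and a normalized lateral speed) and all data values are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights, bias, features):
    """Scalar reward predicted for one lane change (assumed linear model)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def train_reward_model(samples, labels, epochs=500, lr=0.1):
    """Fit the reward model to human labels with plain SGD on log loss.

    labels[i] is 1 if the annotator marked lane change i as matching
    their preference, otherwise 0.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            # Gradient of the log loss with respect to the predicted reward.
            error = sigmoid(reward(weights, bias, x)) - y
            for i, value in enumerate(x):
                weights[i] -= lr * error * value
            bias -= lr * error
    return weights, bias


# Hypothetical features per lane change: [normalized gap, normalized lateral speed].
# Hypothetical labels from an annotator who prefers conservative lane changes.
samples = [
    [0.9, 0.2], [0.8, 0.3], [0.7, 0.1],  # large gap, gentle maneuver
    [0.2, 0.8], [0.3, 0.9], [0.1, 0.7],  # small gap, abrupt maneuver
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_reward_model(samples, labels)

# Score two unseen lane changes with the learned reward model.
conservative_score = reward(weights, bias, [0.85, 0.15])
aggressive_score = reward(weights, bias, [0.15, 0.85])
print(conservative_score > aggressive_score)
```

In step 4, a learned score of this kind would take the place of the hand-designed reward during PPO fine-tuning. Training a second reward model on labels from annotators who prefer assertive maneuvers would, under the same assumptions, yield the aggressive variant.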