Abstract: The burgeoning field of autonomous driving necessitates the seamless integration of autonomous vehicles (AVs) with human-driven vehicles, calling for more predictable AV behavior and enhanced interaction with human drivers. Human-like driving, particularly during lane-changing maneuvers on highways, is a critical area of research due to its significant impact on safety and traffic flow. Traditional rule-based decision-making approaches often fail to encapsulate the nuanced boundaries of human behavior in diverse driving scenarios, while crafting reward functions for learning-based methods introduces its own set of complexities. This study investigates the application of Reinforcement Learning from Human Feedback (RLHF) to emulate human-like lane-changing decisions in AVs. An initial RL policy is pre-trained to ensure safe lane changes. Subsequently, this policy is employed to gather data, which is then annotated by humans to train a reward model that discerns lane changes aligning with human preferences. This human-informed reward model supersedes the original, guiding the refinement of the policy to reflect human-like preferences. The effectiveness of RLHF in producing human-like lane changes is demonstrated through the development and evaluation of conservative and aggressive lane-changing models within obstacle-rich environments and mixed autonomy traffic scenarios. The experimental outcomes underscore the potential of RLHF to diversify lane-changing behaviors in AVs, suggesting its viability for enhancing the integration of AVs into the fabric of human-driven traffic.

What problem does this paper attempt to address?

This paper addresses how to make autonomous vehicles (AVs) change lanes more like human drivers in mixed traffic, so that their behavior is more predictable and they interact better with human drivers. Specifically, the researchers study how Reinforcement Learning from Human Feedback (RLHF) can train AVs to reproduce the decision-making of human drivers during lane changes.

### Main Issues

1. **Behavior Predictability**: Traditional rule-based decision-making methods often fail to capture the subtle behavioral differences of human drivers across driving scenarios, which makes the behavior of autonomous vehicles difficult to predict.
2. **Interaction with Human Drivers**: Autonomous vehicles share the road with human drivers, so their behavior needs to align more closely with human expectations to reduce accidents and improve traffic flow.
3. **Design of Reward Functions**: In traditional reinforcement learning, reward functions are usually designed by hand, and a hand-designed reward can hardly cover every situation in complex, variable driving scenarios.

### Solution

The paper proposes an RLHF-based method with the following steps:

1. **Pre-training**: Use the Proximal Policy Optimization (PPO) algorithm to pre-train a basic lane-changing decision model, so that the model can already change lanes safely at the initial stage.
2. **Collecting Human Feedback**: Use the pre-trained model to generate a series of trajectories, and have human evaluators assess them, marking the lane-changing behaviors that align with human preferences.
3. **Training the Reward Model**: Based on the evaluators' feedback, train a reward model that predicts rewards aligned with human preferences.
4. **Policy Optimization**: Use the PPO algorithm to fine-tune the initial policy against the learned reward model, making the lane-changing behavior of autonomous vehicles more aligned with human preferences.

### Experimental Validation

The researchers conducted experiments on the SUMO platform, developing two types of lane-changing decision models: conservative and aggressive. By comparing the pre-trained model with the RLHF fine-tuned model, they validated the effectiveness of RLHF in obstacle avoidance and mixed-autonomy traffic scenarios.

### Conclusion

This study demonstrates the potential of RLHF to make the lane-changing behavior of autonomous vehicles more human-like, which helps improve their predictability and safety in mixed traffic environments.
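The reward-model step of the solution described above (human evaluators mark preferred lane changes, and a model is trained to predict that preference) can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not state the reward model's architecture or inputs, so the logistic model, the two gap features, the synthetic annotator rule, and all function names below are assumptions made purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_annotations(n=200, seed=0):
    """Stand-in for human labels (hypothetical, not real annotation data).

    Each lane change is described by two assumed features in [0, 1]:
    the normalized gap to the vehicle ahead and to the vehicle behind
    in the target lane. The simulated annotator approves (label 1)
    when the combined gap is large.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        front_gap = rng.random()
        rear_gap = rng.random()
        features.append([front_gap, rear_gap])
        labels.append(1 if front_gap + rear_gap > 1.0 else 0)
    return features, labels


def train_reward_model(features, labels, lr=0.5, epochs=1000):
    """Fit a logistic reward model by batch gradient descent on log loss."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # derivative of log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def learned_reward(weights, bias, x):
    # Predicted probability that an annotator would approve this lane change.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


if __name__ == "__main__":
    feats, labs = make_synthetic_annotations()
    w, b = train_reward_model(feats, labs)
    print("wide gaps:", learned_reward(w, b, [0.9, 0.9]))
    print("tight gaps:", learned_reward(w, b, [0.1, 0.1]))
```

In an RLHF loop of the kind summarized here, a score like `learned_reward` would take the place of the hand-written reward during PPO fine-tuning. Training one such model per annotation style (for example, one from cautious annotators and one from assertive ones) is one plausible way to obtain the conservative and aggressive variants the study reports, though the summary does not say how the two were produced.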

Reinforcement Learning from Human Feedback for Lane Changing of Autonomous Vehicles in Mixed Traffic
