Abstract: A safe and efficient decision-making system is crucial for autonomous vehicles. However, the complexity of driving environments limits the effectiveness of many rule-based and machine learning approaches. Reinforcement learning (RL), with its robust self-learning capabilities and environmental adaptability, offers a promising solution to these challenges. Nevertheless, safety and efficiency concerns during training hinder its widespread application. To address these concerns, we propose a novel RL framework, Simple to Complex Collaborative Decision (S2CD). First, we rapidly train the teacher model in a lightweight simulation environment. Then, in a more complex and realistic environment, the teacher assesses the value of the student agent's actions and intervenes when they are suboptimal, averting danger. We also introduce an RL algorithm called Adaptive Clipping Proximal Policy Optimization (ACPPO), which combines samples from both teacher and student policies and employs dynamic clipping strategies based on sample importance. This approach improves sample efficiency while effectively alleviating data imbalance. Additionally, we employ the Kullback-Leibler divergence as a policy constraint and convert the resulting constrained problem into an unconstrained one with the Lagrangian method to accelerate the student's learning. Finally, a gradual weaning strategy ensures that the student learns to explore independently over time, overcoming the teacher's limitations and maximizing performance. Simulation experiments in highway lane-change scenarios show that the S2CD framework enhances learning efficiency, reduces training costs, and significantly improves safety compared to state-of-the-art algorithms. The framework also ensures effective knowledge transfer between teacher and student models: even with a suboptimal teacher, the student achieves superior performance, demonstrating the robustness and effectiveness of S2CD.
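
The interplay between value-based teacher intervention and gradual weaning can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the intervention rule or the weaning schedule, so the linear decay, the `margin` threshold, and the names `intervention_probability`, `choose_action`, and `q_value` are all assumptions made for illustration.

```python
import random


def intervention_probability(step, total_steps, start=1.0, end=0.0):
    """Chance that the teacher is allowed to intervene at this step.

    Assumption: a linear decay from `start` to `end`; the abstract only
    says the student is weaned off the teacher gradually.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def choose_action(student_action, teacher_action, q_value,
                  step, total_steps, margin=0.1, rng=random):
    """Return (action_to_execute, teacher_intervened).

    `q_value` is a hypothetical callable mapping an action to its estimated
    value in the current state. The teacher overrides the student only when
    (a) intervention is still permitted by the weaning schedule and
    (b) the student's action is valued worse than the teacher's by more
    than `margin` (an assumed threshold).
    """
    allowed = rng.random() < intervention_probability(step, total_steps)
    student_is_worse = q_value(teacher_action) - q_value(student_action) > margin
    if allowed and student_is_worse:
        return teacher_action, True
    return student_action, False


if __name__ == "__main__":
    # Toy value estimates for two lane-change actions (made-up numbers).
    q = {"keep_lane": 0.2, "change_left": 0.9}.get
    print(choose_action("keep_lane", "change_left", q, step=0, total_steps=100))
    print(choose_action("keep_lane", "change_left", q, step=100, total_steps=100))
```

Early in training the teacher overrides clearly inferior student actions; by the final step the intervention probability reaches zero and the student acts entirely on its own, which is the behavior the weaning strategy is meant to produce.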

Learning by Reusing Previous Advice: a Memory-Based Teacher–student Framework

Teacher-Student Framework: a Reinforcement Learning Approach

Learning on a Budget via Teacher Imitation

Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework

Methodical Advice Collection and Reuse in Deep Reinforcement Learning

Action Advising with Advice Imitation in Deep Reinforcement Learning

A federated advisory teacher–student framework with simultaneous learning agents

Learning to Teach Reinforcement Learning Agents

A Q-values Sharing Framework for Multiagent Reinforcement Learning under Budget Constraint

Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning

Enabling Robust DRL-Driven Networking Systems Via Teacher-Student Learning

Integrating human learning and reinforcement learning: A novel approach to agent training

ACTRCE: Augmenting Experience via Teacher's Advice For Multi-Goal Reinforcement Learning

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency

Goal-oriented Knowledge Reuse Via Curriculum Evolution for Reinforcement Learning-based Adaptation

Multi-Agent Advisor Q-Learning

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

A location-based advising method in teacher–student frameworks

RLTutor: Reinforcement Learning Based Adaptive Tutoring System by Modeling Virtual Student with Fewer Interactions

Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer

Knowledge Transfer from Simple to Complex: A Safe and Efficient Reinforcement Learning Framework for Autonomous Driving Decision-Making