LORD: Large Models based Opposite Reward Design for Autonomous Driving

Xin Ye, Feng Tao, Abhirup Mallik, Burhaneddin Yaman, Liu Ren

2024-03-28

Abstract: Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, the desired linguistic goals for autonomous driving such as "drive safely" are ambiguous and incomprehensible by pretrained models. On the other hand, undesired linguistic goals like "collision" are more concrete and tractable. In this work, we introduce LORD, a novel large models based opposite reward design through undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models for achieving safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities as it outperforms counterpart methods across diverse and challenging driving scenarios.

Robotics, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses the difficulty of designing effective reward functions for reinforcement learning (RL) in autonomous driving. Data-driven imitation learning depends on large amounts of data, whereas RL can optimize a driving policy by interacting with the environment. Specifying a reward function that captures good driving across diverse scenarios, however, is hard. The paper proposes LORD (Large Models based Opposite Reward Design), which builds the reward from undesired linguistic goals (such as "collision") rather than desired ones, because pretrained large models struggle to interpret ambiguous desired goals (such as "drive safely"). LORD uses large pretrained image, video, and language models as zero-shot reward models, generating the reward from the cosine distance between the current state and the undesired goal. According to the paper, this makes the use of large pretrained models as reward models more efficient and improves generalization, leading to safer driving across diverse and challenging scenarios. Experiments show that LORD outperforms counterpart methods in these scenarios.

The main contributions of the paper are as follows:

1. Introducing the LORD framework, which pairs large pretrained models with undesired linguistic goals to sidestep the ambiguity of desired linguistic goals.
2. Using pretrained image, video, and language models to design the RL reward function based on a cosine-similarity objective.
3. Showing experimentally that LORD generalizes better than counterpart methods across diverse and challenging driving scenarios.

In summary, the paper proposes using large pretrained models together with undesired linguistic goals to design reward functions for RL-based autonomous driving, with the aim of improving safety and performance.
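The cosine-distance reward described above can be illustrated with a minimal sketch. It assumes the current state and an undesired goal such as "collision" have already been embedded into a shared vector space by some pretrained model. The function names, the toy three-dimensional vectors, and the exact form "one minus cosine similarity" are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors (eps avoids division by zero)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def opposite_reward(state_embedding, undesired_goal_embedding):
    """Hypothetical opposite reward: cosine distance to the undesired goal.

    The further the state embedding is from the undesired goal embedding,
    the larger the reward. This form is an assumption for illustration.
    """
    return 1.0 - cosine_similarity(state_embedding, undesired_goal_embedding)


# Toy stand-ins for embeddings; a real system would obtain these from
# pretrained image, video, or language encoders.
collision_goal = np.array([1.0, 0.0, 0.0])
near_collision_state = np.array([0.9, 0.1, 0.0])
safe_state = np.array([0.0, 1.0, 0.0])

r_near = opposite_reward(near_collision_state, collision_goal)
r_safe = opposite_reward(safe_state, collision_goal)
```

In this toy setup the state that resembles the undesired goal receives a reward close to zero, while the orthogonal "safe" state receives a reward close to one, so an RL agent maximizing the reward is pushed away from collision-like states.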

Human-centric Reward Optimization for Reinforcement Learning-based Automated Driving using Large Language Models

Generating and Evolving Reward Functions for Highway Driving with Large Language Models

Large Language Model guided Deep Reinforcement Learning for Decision Making in Autonomous Driving

VLM-RL: A Unified Vision Language Models and Reinforcement Learning Framework for Safe Autonomous Driving

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement Learning

HGRL: Human-Driving-Data Guided Reinforcement Learning for Autonomous Driving

LLM4RL: Enhancing Reinforcement Learning with Large Language Models

Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics

REvolve: Reward Evolution with Large Language Models using Human Feedback

Drive Like a Human: Rethinking Autonomous Driving with Large Language Models

Secrets of RLHF in Large Language Models Part II: Reward Modeling

Driving Behavior Modeling Using Naturalistic Human Driving Data With Inverse Reinforcement Learning

Large Language Models for Autonomous Driving (LLM4AD): Concept, Benchmark, Simulation, and Real-Vehicle Experiment

Lexicographic Actor-Critic Deep Reinforcement Learning for Urban Autonomous Driving

Receive, Reason, and React: Drive as You Say, With Large Language Models in Autonomous Vehicles

Prompting Multi-Modal Tokens to Enhance End-to-End Autonomous Driving Imitation Learning with LLMs

Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF

Survival-Oriented Reinforcement Learning Model: An Efficient and Robust Deep Reinforcement Learning Algorithm for Autonomous Driving Problem

CLIP-RLDrive: Human-Aligned Autonomous Driving via CLIP-Based Reward Shaping in Reinforcement Learning

Confronting Reward Model Overoptimization with Constrained RLHF