LORD: Large Models based Opposite Reward Design for Autonomous Driving

Xin Ye, Feng Tao, Abhirup Mallik, Burhaneddin Yaman, Liu Ren
2024-03-28
Abstract: Reinforcement learning (RL) based autonomous driving has emerged as a promising alternative to data-driven imitation learning approaches. However, crafting effective reward functions for RL poses challenges due to the complexity of defining and quantifying good driving behaviors across diverse scenarios. Recently, large pretrained models have gained significant attention as zero-shot reward models for tasks specified with desired linguistic goals. However, the desired linguistic goals for autonomous driving such as "drive safely" are ambiguous and incomprehensible by pretrained models. On the other hand, undesired linguistic goals like "collision" are more concrete and tractable. In this work, we introduce LORD, a novel large models based opposite reward design through undesired linguistic goals to enable the efficient use of large pretrained models as zero-shot reward models. Through extensive experiments, our proposed framework shows its efficiency in leveraging the power of large pretrained models for achieving safe and enhanced autonomous driving. Moreover, the proposed approach shows improved generalization capabilities as it outperforms counterpart methods across diverse and challenging driving scenarios.
Robotics, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
This paper addresses the difficulty of designing effective reward functions for reinforcement learning (RL) in autonomous driving. Data-driven imitation learning relies on large amounts of data, whereas RL optimizes a driving policy by interacting with the environment. Formulating the reward is hard, however, because good driving behavior is difficult to define and quantify across diverse scenarios.

The paper proposes LORD (Large Models based Opposite Reward Design), which builds the reward from undesired linguistic goals (such as "collision") rather than desired ones. This sidesteps the difficulty that large pretrained models have with ambiguous goals such as "drive safely". LORD uses large pretrained image, video, and language models as zero-shot reward models: the reward is derived from the cosine distance between the models' representations of the current state and of the undesired goal, so states that resemble the undesired goal receive lower reward. According to the paper, this improves the interpretability, generalization ability, and efficiency of the reward function, enabling autonomous driving systems to operate more safely in complex scenarios. Experiments show that LORD outperforms counterpart methods across diverse and challenging driving scenarios.

The main contributions of the paper are as follows:
1. Introducing the LORD framework, which combines large pretrained models with undesired linguistic goals for autonomous driving, addressing the ambiguity of desired linguistic goals.
2. Using pretrained image, video, and language models to design the RL reward function through a cosine-distance objective.
3. Demonstrating experimentally that LORD generalizes better than other methods in various challenging driving scenarios.

In summary, the paper proposes a novel way to apply reinforcement learning to autonomous driving by using large pretrained models with undesired linguistic goals to design reward functions, improving system safety and performance.
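
The cosine-distance reward described above can be illustrated with a minimal sketch. It assumes the current driving state and the undesired goal (for example the text "collision") have already been encoded into vectors of equal length by some pretrained model. The function names `cosine_similarity` and `opposite_reward`, the toy vectors, and the unscaled form `1 - similarity` are illustrative assumptions, not the paper's exact formulation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def opposite_reward(state_embedding: Sequence[float],
                    undesired_goal_embedding: Sequence[float]) -> float:
    """Hypothetical opposite reward: cosine distance to the undesired goal.

    The value is low when the state resembles the undesired goal and
    grows as the state moves away from it. Any scaling or clipping used
    in the actual method is not specified here and is left out.
    """
    return 1.0 - cosine_similarity(state_embedding, undesired_goal_embedding)


# Toy embeddings standing in for encoder outputs (assumed values).
collision_goal = [1.0, 0.0]
near_collision_state = [0.9, 0.1]
safe_state = [0.1, 0.9]

# The state that looks like "collision" earns the smaller reward.
print(opposite_reward(near_collision_state, collision_goal))
print(opposite_reward(safe_state, collision_goal))
```

In an RL loop, such a value would be computed at each step from the agent's current observation. The undesired goal is encoded once and reused.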