Towards Scenario Retrieval of Real Driving Data with Large Vision-Language Models

Tim Brühl, Lukas Ewecker, Frank Oechsle, Maximilian Dillitzer, Eric Sax, P. Elspas, Tin Stribor Sohn, Robin Schwager, Lena Dalke
DOI: https://doi.org/10.5220/0012738500003702
Abstract: With the adoption of autonomous driving systems and scenario-based testing, there is a growing need for efficient methods to understand and retrieve driving scenarios from vast amounts of real-world driving data. Because manual scenario selection is labor-intensive and scales poorly, this study explores the use of three Large Vision-Language Models, CLIP, BLIP-2, and BakLLaVA, for scenario retrieval. The models' ability to retrieve relevant scenarios from natural language queries is evaluated on a diverse benchmark dataset of real-world driving scenarios using a precision metric. Factors such as scene complexity, weather conditions, and different traffic situations are incorporated into the method through the 6-Layer Model, so that the effectiveness of the models can be measured across different driving contexts. The study contributes to the understanding of the capabilities and limitations of Large Vision-Language Models in driving scenario retrieval and outlines implications for future research.
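The retrieval-and-scoring idea in the abstract can be illustrated with a minimal sketch. It assumes that queries and frames have already been embedded into a shared vector space (as a CLIP-style model would do), that frames are ranked by cosine similarity, and that the precision metric is precision over the top-k results. The abstract does not specify any of these details, so the function names, the toy vectors, and the choice of precision at k are all illustrative assumptions, not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, frame_vecs, k):
    """Return the ids of the k frames most similar to the query."""
    ranked = sorted(
        frame_vecs,
        key=lambda frame_id: cosine(query_vec, frame_vecs[frame_id]),
        reverse=True,
    )
    return ranked[:k]


def precision_at_k(retrieved, relevant):
    """Fraction of retrieved frames that are labeled relevant."""
    if not retrieved:
        return 0.0
    hits = sum(1 for frame_id in retrieved if frame_id in relevant)
    return hits / len(retrieved)


# Toy embeddings (hypothetical; real ones would come from a model such as CLIP).
frames = {
    "frame_a": [0.9, 0.1, 0.0],
    "frame_b": [0.8, 0.2, 0.1],
    "frame_c": [0.0, 0.1, 0.9],
    "frame_d": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.0]             # stands in for an embedded text query
relevant = {"frame_a", "frame_c"}   # hypothetical ground-truth labels

top = retrieve(query, frames, k=2)
score = precision_at_k(top, relevant)
```

In this toy setup, precision is computed per query; averaging it over queries grouped by weather or traffic situation would be one way to compare models across driving contexts.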
Engineering, Computer Science