Abstract: Today's advanced driver assistance systems (ADAS), like adaptive cruise control or rear collision warning, are finding broader adoption across vehicle classes. Integrating such advanced, multimodal Large Language Models (LLMs) on board a vehicle, which are capable of processing text, images, audio, and other data types, may have the potential to greatly enhance passenger comfort. Yet, an LLM's hallucinations are still a major challenge to be addressed. In this paper, we systematically assessed potential hallucination detection strategies for such LLMs in the context of object detection in vision-based data on the example of pedestrian detection and localization. We evaluate three hallucination detection strategies applied to two state-of-the-art LLMs, the proprietary GPT-4V and the open LLaVA, on two datasets (Waymo/US and PREPER CITY/Sweden). Our results show that these LLMs can describe a traffic situation to an impressive level of detail but are still challenged for further analysis activities such as object localization. We evaluate and extend hallucination detection approaches when applying these LLMs to video sequences in the example of pedestrian detection. Our experiments show that, at the moment, the state-of-the-art proprietary LLM performs much better than the open LLM. Furthermore, consistency enhancement techniques based on voting, such as the Best-of-Three (BO3) method, do not effectively reduce hallucinations in LLMs that tend to exhibit high false negatives in detecting pedestrians. However, extending the hallucination detection by including information from the past helps to improve results.

What problem does this paper attempt to address?

This paper aims to systematically evaluate and enhance the trustworthiness of large language models (LLMs) in perception tasks, especially pedestrian detection and localization for advanced driver assistance systems (ADAS). Specifically, it focuses on how to detect and reduce the hallucinations that LLMs produce when processing visual data, so that these models can be relied on in safety-critical applications.

#### Main research questions include:

1. **What are the potential hallucination-detection strategies?**
   - Survey the hallucination-detection methods proposed in the existing literature and evaluate their applicability in ADAS/AD.
2. **How do hallucinations manifest when applying LLMs to pedestrian detection and localization?**
   - Analyze the types of hallucinations that may occur when LLMs perform pedestrian-detection tasks, such as false positives and false negatives.
3. **How can the hallucination-detection strategies be enhanced for use in ADAS/AD perception and monitoring systems?**
   - Explore and propose methods to improve hallucination detection, for example, using information from earlier frames to improve detection accuracy.

#### Specific application scenarios:

- **Pedestrian detection and localization**: Detect pedestrians in visual data (such as image sequences) and evaluate the performance of LLMs on this task.
- **Hallucination detection**: Evaluate the effectiveness of different hallucination-detection strategies, especially the Best-of-Three (BO3) method and its extended version.

#### Datasets:

- The Waymo and PREPER CITY datasets, which cover pedestrian scenes in urban environments in the United States and Sweden, respectively.

#### Models:

- Two state-of-the-art LLMs, GPT-4V and LLaVA, are evaluated and compared on pedestrian-detection tasks.

#### Key findings:

- The current state-of-the-art proprietary LLM (GPT-4V) performs much better than the open LLM (LLaVA) in pedestrian-detection tasks.
- Relying on a voting mechanism alone (such as BO3) does not effectively reduce hallucinations in LLMs that tend to exhibit high false-negative rates.
- Extending hallucination detection with information from past frames helps to improve results.

Through these studies, the paper provides insights and methods for improving the trustworthiness of LLMs in advanced driver assistance systems.
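
The voting and history ideas summarized above can be illustrated with a small sketch. This is not the paper's implementation: the function names, the yes/no answer format, the independence assumption in `majority_miss_rate`, and the window size of three frames are all hypothetical choices made for illustration. The sketch shows (a) a Best-of-Three majority vote over repeated answers for one frame, (b) why such a vote cannot help when each single answer misses pedestrians more often than not, and (c) one possible way to use past frames, namely flagging a "no pedestrian" answer as suspect when most of the preceding frames reported a pedestrian.

```python
from collections import deque
from typing import Deque, List, Sequence


def best_of_three(answers: Sequence[bool]) -> bool:
    """Majority vote over three yes/no answers for the same frame."""
    if len(answers) != 3:
        raise ValueError("best_of_three expects exactly three answers")
    return sum(answers) >= 2


def majority_miss_rate(p_miss: float) -> float:
    """Probability that a 3-way majority vote misses a pedestrian.

    Assumes (hypothetically) that each query misses independently with
    probability p_miss; the vote misses if at least two of three miss.
    """
    return p_miss**3 + 3 * p_miss**2 * (1 - p_miss)


def flag_suspect_frames(voted: Sequence[bool], window: int = 3) -> List[bool]:
    """Flag frames whose 'no pedestrian' answer contradicts recent history.

    A frame is flagged when the current answer is False but more than half
    of the previous `window` frames reported a pedestrian. Frames without a
    full history window are never flagged.
    """
    history: Deque[bool] = deque(maxlen=window)
    flags: List[bool] = []
    for present in voted:
        recent_positive = len(history) == window and 2 * sum(history) > window
        flags.append((not present) and recent_positive)
        history.append(present)
    return flags


if __name__ == "__main__":
    print(best_of_three([True, False, True]))
    # With a low per-query miss rate, voting helps; with a high one, it hurts.
    print(majority_miss_rate(0.1), majority_miss_rate(0.6))
    print(flag_suspect_frames([True, True, True, False, True, False, False]))
```

Under the independence assumption, a per-query miss rate of 0.1 drops to roughly 0.028 after voting, while a miss rate of 0.6 rises to roughly 0.648, which is consistent with the finding that voting alone does not help a model with many false negatives. The history check relies on the observation that pedestrians in a video sequence rarely vanish from one frame to the next.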

Evaluating and Enhancing Trustworthiness of LLMs in Perception Tasks
