Object Hallucination Detection in Large Vision Language Models Via Evidential Conflict

Zhekun Liu, Tao Huang, Rui Wang, Liping Jing
DOI: https://doi.org/10.1007/978-3-031-67977-3_7
2024-01-01
Abstract: Despite their remarkable ability to understand both textual and visual data, large vision-language models (LVLMs) still suffer from hallucination. A prominent form is object hallucination, where the model inaccurately describes objects in an image. Current efforts mainly detect such errors either by checking the semantic consistency of outputs across multiple inferences or by evaluating the entropy-based uncertainty of predictions. However, the former is resource-intensive, while the latter is often considered a less precise measure because model predictions are widely recognized to be overconfident. To address this issue, we propose an object hallucination detection method based on evidential conflict. Specifically, we view the features in the last layer of the transformer decoder as evidence. We then combine the evidence using Dempster's rule, following the approach presented in [6], which enables us to detect hallucinations by evaluating the conflict among the pieces of evidence. Preliminary experiments were conducted on a state-of-the-art LVLM, mPLUG-Owl2. Results show that our approach improves on baseline methods, particularly in cases with highly uncertain inputs.
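To make the notion of evidential conflict concrete, the following is a minimal sketch of Dempster's rule of combination for two mass functions over a small frame of discernment. It is a generic textbook illustration, not the authors' implementation: the function name `dempster_combine`, the toy frame {"cat", "dog"}, and the mass values are all assumptions made here for illustration, and the abstract does not specify how decoder features are mapped to mass functions in [6].

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function is a dict mapping frozenset (a focal set) to its mass.
    Returns (combined_mass_function, conflict), where conflict is the total
    mass assigned to pairs of focal sets with empty intersection.
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Compatible evidence: mass flows to the intersection.
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            # Incompatible evidence: mass counts toward the conflict.
            conflict += wa * wb
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("Sources are in total conflict; rule is undefined.")
    combined = {s: w / norm for s, w in joint.items()}
    return combined, conflict


# Toy example (hypothetical values): two pieces of evidence about one object.
frame = frozenset({"cat", "dog"})
m1 = {frozenset({"cat"}): 0.7, frame: 0.3}  # mostly supports "cat"
m2 = {frozenset({"dog"}): 0.6, frame: 0.4}  # mostly supports "dog"

combined, conflict = dempster_combine(m1, m2)
print(f"conflict = {conflict:.2f}")
```

In this toy case the two sources disagree, so the conflict is the product of the masses on the disjoint sets {"cat"} and {"dog"}, namely 0.7 × 0.6 = 0.42. A detector in the spirit of the abstract would flag a prediction when such a conflict value is high; where to place that threshold is not stated in the abstract and would have to be chosen empirically.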