FastRM: An efficient and automatic explainability framework for multimodal generative models

Gabriela Ben-Melech Stan, Estelle Aflalo, Man Luo, Shachar Rosenman, Tiep Le, Sayak Paul, Shao-Yen Tseng, Vasudev Lal
2024-12-02
Abstract: While Large Vision Language Models (LVLMs) have become highly capable of reasoning over human prompts and visual inputs, they are still prone to producing responses that contain misinformation. Identifying incorrect responses that are not grounded in evidence has become a crucial task in building trustworthy AI. Explainability methods such as gradient-based relevancy maps on LVLM outputs can provide insight into the decision process of models; however, these methods are often computationally expensive and not suited for on-the-fly validation of outputs. In this work, we propose FastRM, an effective way to predict the explainable Relevancy Maps of LVLMs. Experimental results show that employing FastRM leads to a 99.8% reduction in compute time for relevancy map generation and a 44.4% reduction in memory footprint for the evaluated LVLM, making explainable AI more efficient and practical, thereby facilitating its deployment in real-world applications.
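The abstract refers to gradient-based relevancy maps without describing how they are computed. As a minimal sketch, assuming the gradient-weighted attention propagation rule of Chefer et al. (2021) (the source does not specify which gradient-based method FastRM approximates), a token-to-token relevancy matrix can be accumulated layer by layer from each layer's attention maps and the gradients of the target output score with respect to them. The function name `relevancy_map` and its argument layout are hypothetical.

```python
import numpy as np


def relevancy_map(attn_maps, attn_grads):
    """Accumulate a token-to-token relevancy matrix over transformer layers.

    Hypothetical helper. Assumption: Chefer et al. (2021) propagation rule,
    not taken from the FastRM paper.

    attn_maps, attn_grads: lists with one entry per layer, each an array of
    shape (num_heads, num_tokens, num_tokens) holding the attention
    probabilities and the gradient of the target score with respect to them.
    """
    num_tokens = attn_maps[0].shape[-1]
    # Start from the identity: each token is fully relevant to itself.
    R = np.eye(num_tokens)
    for A, grad in zip(attn_maps, attn_grads):
        # Weight attention by its gradient, keep only positive contributions,
        # then average over heads.
        A_bar = np.clip(grad * A, 0.0, None).mean(axis=0)
        # Propagate relevancy through this layer (residual term plus mixing).
        R = R + A_bar @ R
    return R
```

For a generated token at position `t`, the slice `R[t, image_start:image_end]` would give the relevancy of each image-patch token; `image_start` and `image_end` are hypothetical indices that depend on the model's prompt layout. Obtaining `attn_grads` requires a backward pass for each output token being explained, which illustrates why such methods can be too expensive for on-the-fly validation.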
Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: although large vision-language models (LVLMs) perform well when reasoning over human prompts and visual inputs, they are prone to generating responses that contain misinformation. Identifying such unsupported, incorrect responses is crucial for building trustworthy AI. Existing explainability methods, such as gradient-based relevancy maps, can provide insight into a model's decision-making process, but they are often computationally expensive and unsuitable for real-time verification of outputs. To address this, the paper proposes FastRM, a framework that efficiently predicts the explainable relevancy maps of LVLMs, substantially reducing computation time and memory usage. Experimental results show that FastRM reduces the computation time for generating relevancy maps by 99.8% and memory usage by 44.4% for the evaluated LVLM. In summary, the main problem addressed is how to improve the efficiency and reliability of LVLMs in practical applications while preserving interpretability, especially in high-risk or interactive scenarios such as medicine and autonomous driving, so that explainable AI becomes more practical and easier to deploy.
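The reported percentages can also be read as ratios. Assuming both figures are measured against the same gradient-based baseline on the evaluated LVLM, a 99.8% reduction leaves 0.2% of the original compute time (roughly a 500× speedup), and a 44.4% reduction leaves 55.6% of the original memory footprint:

```latex
\begin{aligned}
t_{\text{FastRM}} &= (1 - 0.998)\, t_{\text{baseline}} = 0.002\, t_{\text{baseline}}
  &&\Rightarrow\quad \frac{t_{\text{baseline}}}{t_{\text{FastRM}}} = \frac{1}{0.002} = 500,\\
m_{\text{FastRM}} &= (1 - 0.444)\, m_{\text{baseline}} = 0.556\, m_{\text{baseline}}.
\end{aligned}
```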