Visual Robustness Benchmark for Visual Question Answering (VQA)

Md Farhan Ishmam, Ishmam Tashdeed, Talukder Asir Saadat, Md Hamjajul Ashmafee, Abu Raihan Mostofa Kamal, Md. Azam Hossain

2024-10-29

Abstract: Can Visual Question Answering (VQA) systems perform just as well when deployed in the real world? Or are they susceptible to realistic corruption effects, e.g., image blur, which can be detrimental in sensitive applications, such as medical VQA? While linguistic or textual robustness has been thoroughly explored in the VQA literature, there has yet to be any significant work on the visual robustness of VQA models. We propose the first large-scale benchmark comprising 213,000 augmented images, challenging the visual robustness of multiple VQA models and assessing the strength of realistic visual corruptions. Additionally, we have designed several robustness evaluation metrics that can be aggregated into a unified metric and tailored to fit a variety of use cases. Our experiments reveal several insights into the relationships between model size, performance, and robustness with the visual corruptions. Our benchmark highlights the need for a balanced approach in model development that considers model performance without compromising the robustness.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the **visual robustness** of visual question answering (VQA) systems deployed in the real world. It focuses on the following points:

1. **Real-world performance of VQA systems**: Existing VQA systems perform well under ideal conditions, but in the real world they may be affected by visual corruptions such as image blur and brightness changes, which degrade performance. In sensitive application areas such as medical VQA, this degradation can be especially detrimental.
2. **Evaluation of visual robustness**: Although textual robustness has been widely studied in the VQA field, visual robustness has received little attention, and large-scale benchmarks and evaluation metrics for it are lacking. The paper therefore proposes the first large-scale visual robustness benchmark, comprising 213,000 augmented images, to evaluate how robust multiple VQA models are to realistic visual corruptions.
3. **Robustness evaluation metrics**: To evaluate visual robustness comprehensively, the paper designs five new evaluation metrics and aggregates them into a unified measure, the **Visual Robustness Error (VRE)**. These metrics can be tailored to specific application scenarios.
4. **Balance between model performance and robustness**: The experiments reveal relationships between model size, performance, and robustness, and emphasize that model development should balance performance and robustness rather than pursue high accuracy alone.

### Summary

The core problem of the paper is how VQA systems perform when faced with realistic visual corruptions. The paper proposes a benchmark and a set of robustness metrics to evaluate this, filling a gap in existing research and providing a reference point for future work.
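The evaluation idea summarized above (corrupt the images, re-run the model, compare accuracy against the clean baseline) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the two toy corruptions, and in particular `mean_relative_drop`, which is a generic accuracy-drop summary and not the paper's VRE or any of its five metrics, whose definitions are not given in this summary.

```python
from statistics import mean


def adjust_brightness(image, delta):
    """Shift every pixel of a grayscale image by delta, clamped to 0..255."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]


def box_blur(image, radius):
    """Blur a grayscale image with a box filter of half-width `radius`.

    Pixels near the border average over the neighbours that exist.
    """
    h, w = len(image), len(image[0])
    blurred = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            row.append(round(mean(window)))
        blurred.append(row)
    return blurred


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def mean_relative_drop(clean_acc, corrupted_accs):
    """Average fractional accuracy loss across severity levels.

    Hypothetical summary statistic for illustration; NOT the paper's VRE.
    """
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    return mean((clean_acc - acc) / clean_acc for acc in corrupted_accs)
```

In use, one would apply each corruption at several severity levels (for example, increasing `radius` or `delta`), collect the model's accuracy at each level, and pass those accuracies together with the clean accuracy to `mean_relative_drop`. A value near 0 indicates little degradation; larger values indicate a less robust model under that corruption.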
