Abstract: Medical Visual Question Answering (Med-VQA) aims to predict a convincing answer from a given medical image and clinical question in order to assist clinical decision-making. However, existing methods tend to rely on superficial linguistic correlations as a shortcut, which may produce unsatisfactory clinical answers. In this paper, we propose a novel DeBiasing Med-VQA model with CounterFactual training (DeBCF) to overcome language priors comprehensively. Specifically, we generate counterfactual samples by masking crucial keywords and assigning irrelevant labels, which implicitly increases the model's sensitivity to semantic words and visual objects and thereby weakens the bias. Furthermore, to explicitly prevent shortcut linguistic correlations, we formulate the language prior as a counterfactual causal effect and eliminate it from the total effect on the generated answers. Additionally, we present a new bias-sensitive Med-VQA dataset, Semantically-Labeled Knowledge-Enhanced under Changing Priors (SLAKE-CP), built by regrouping and re-splitting the training and test sets of SLAKE so that their prior distributions of answers differ, which pushes the model to learn interpretable objects rather than memorize biases. Experimental results on two public datasets and SLAKE-CP demonstrate that the proposed DeBCF outperforms existing state-of-the-art Med-VQA models and obtains significant improvements in accuracy and interpretability. To our knowledge, this is the first attempt to overcome language priors in Med-VQA and to construct a bias-sensitive dataset for evaluating debiasing ability.
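The two debiasing ideas named in the abstract can be illustrated with a minimal sketch. It is not the DeBCF implementation: the mask token, the function names, the keyword set, and the toy scores are all hypothetical. The sketch also assumes that "eliminating the language prior from the total effect" can be approximated by subtracting question-only scores from the fused image-and-question scores, which is one common reading of counterfactual debiasing and not necessarily the formulation used in the paper.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def make_counterfactual_sample(tokens, keywords, label, all_labels, rng):
    """Mask crucial keywords and assign a label different from the original.

    Raises IndexError if all_labels holds no label other than `label`.
    """
    masked = [MASK if t.lower() in keywords else t for t in tokens]
    candidates = [a for a in all_labels if a != label]
    return masked, rng.choice(candidates)


def debiased_scores(fused_scores, question_only_scores):
    """Subtract the question-only (language prior) score from the fused score.

    Assumption: total effect minus the language-only effect, per answer.
    """
    return [f - q for f, q in zip(fused_scores, question_only_scores)]


# Toy example: the question-only branch strongly prefers answer 0,
# but after removing that prior, answer 1 scores highest.
tokens = "what organ is shown in this image".split()
cf_tokens, cf_label = make_counterfactual_sample(
    tokens, {"organ"}, "lung", ["lung", "liver", "heart"], random.Random(0)
)
scores = debiased_scores([2.0, 1.5, 0.5], [1.8, 0.2, 0.1])
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case the fused scores alone would select answer 0, whereas the debiased scores select answer 1, which shows how a strong language prior can be discounted at inference time.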

VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task

Debiasing Medical Visual Question Answering via Counterfactual Training

Simple and Effective Visual Question Answering in a Single Modality

Overcoming Language Priors in VQA via Decomposed Linguistic Representations

Debiased Visual Question Answering via the Perspective of Question Types

Introspective Distillation for Robust Question Answering

Robust Visual Question Answering via Polarity Enhancement and Contrast

Robust Visual Question Answering: Datasets, Methods, and Future Challenges

Suppressing Biased Samples for Robust VQA

Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

VQA-BC: Robust Visual Question Answering via Bidirectional Chaining

Eliminating the Language Bias for Visual Question Answering with Fine-grained Causal Intervention

Collaborative Modality Fusion for Mitigating Language Bias in Visual Question Answering

Efficient Counterfactual Debiasing for Visual Question Answering

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

From Superficial to Deep: Language Bias Driven Curriculum Learning for Visual Question Answering

Visual Grounding Methods for VQA are Working for the Wrong Reasons!

LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering

Removing Bias of Video Question Answering by Causal Theory

Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder