Alleviating Shortcut Learning Behavior of VQA Model with Context Augmentation and Adaptive Loss Adjustment

ZeRong Zeng, Ruifang Liu, Huan Wang
DOI: https://doi.org/10.1117/12.2661996
2022-01-01
Abstract: Despite impressive progress in Visual Question Answering (VQA), avoiding spurious correlations between the textual content of a question and its answer remains a challenge. Previous research has shown that, because of language bias in VQA datasets, VQA models tend to capture superficial statistical correlations and generalize poorly to out-of-distribution data. To alleviate the bias introduced by the language modality, we propose a method that combines context augmentation with adaptive loss adjustment and thereby reduces the shortcut learning behavior of VQA models. First, because language bias arises from the high co-occurrence frequency between categories and the words in the question, we use paraphrase generation to produce paraphrases with diverse contexts and so weaken this correlation. Second, we use adaptive loss adjustment to change the importance of individual samples: the importance of bias-aligned samples is reduced and the importance of bias-conflicting samples is increased, which guides the model to capture the intrinsic attributes that benefit generalization. Experiments on a variety of VQA models demonstrate the feasibility and validity of our method.
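The abstract does not give the exact form of the adaptive loss adjustment, so the following is only a minimal sketch of the general idea of down-weighting bias-aligned samples and up-weighting bias-conflicting ones. It assumes a separate question-only "bias" predictor is available, and it uses a hypothetical weight of (1 - p_bias)^gamma, where p_bias is the probability the bias predictor assigns to the ground-truth answer. The function names, the weighting formula, and the `gamma` parameter are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_weight(bias_prob_true, gamma=1.0):
    """Hypothetical importance weight (an assumption, not the paper's formula).

    If the bias predictor is already confident in the true answer, the sample
    is treated as bias-aligned and gets a small weight; if it is not, the
    sample is treated as bias-conflicting and gets a weight close to 1.
    """
    return (1.0 - bias_prob_true) ** gamma


def adaptive_loss(model_logits, bias_logits, labels, gamma=1.0):
    """Mean of per-sample cross-entropy terms scaled by sample_weight."""
    total = 0.0
    for logits, b_logits, y in zip(model_logits, bias_logits, labels):
        p_model = softmax(logits)[y]   # VQA model's probability of the truth
        p_bias = softmax(b_logits)[y]  # bias predictor's probability of the truth
        total += -sample_weight(p_bias, gamma) * math.log(p_model)
    return total / len(labels)
```

In this sketch, setting `gamma=0.0` makes every weight equal to 1 and recovers the ordinary mean cross-entropy, while larger values of `gamma` suppress bias-aligned samples more strongly.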