PLMVQA: Applying Pseudo Labels for Medical Visual Question Answering with Limited Data.

Zheng Yu, Yutong Xie, Yong Xia, Qi Wu
DOI: https://doi.org/10.1007/978-3-031-47425-5_32
2023-01-01
Abstract: Unlike Visual Question Answering (VQA) in the general domain, Medical VQA is more challenging because large-scale labeled datasets are lacking. Medical VQA also requires high interpretability when answering clinical questions: it should be clear which visual elements within the medical image, such as organs or abnormalities, are essential to the answer. To overcome these challenges, we propose a novel method based on the Vision Transformer (ViT) that reformulates Medical VQA as a multi-task learning problem. We first construct soft pseudo labels of logits for selected essential visual elements from the limited annotation data of the existing Medical VQA dataset. We then apply these pseudo labels in our proposed Medical VQA model by predicting the answer and the pseudo labels at the same time, which not only improves the performance of the model but also provides better interpretability. Extensive experiments on two Medical VQA datasets demonstrate the effectiveness of the proposed method.
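The joint prediction of answers and soft pseudo labels described in the abstract can be illustrated with a small sketch. The abstract does not give the loss functions, the task weighting, or the model interface, so everything below is an assumption made for illustration: the function names, the soft cross-entropy used for the pseudo-label term, and the `weight` and `temperature` parameters are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Hard-label cross-entropy for the answer classification task."""
    return -math.log(softmax(logits)[target_index])


def soft_label_loss(predicted_logits, pseudo_logits, temperature=1.0):
    """Soft cross-entropy between the distribution implied by the pseudo-label
    logits and the model's predicted distribution (an assumed choice of loss)."""
    target = softmax([x / temperature for x in pseudo_logits])
    pred = softmax([x / temperature for x in predicted_logits])
    return -sum(t * math.log(p) for t, p in zip(target, pred))


def multitask_loss(answer_logits, answer_index,
                   element_logits, pseudo_logits, weight=0.5):
    """Weighted sum of the answer loss and the pseudo-label loss.
    `weight` is a hypothetical balancing hyperparameter."""
    answer_term = cross_entropy(answer_logits, answer_index)
    pseudo_term = soft_label_loss(element_logits, pseudo_logits)
    return answer_term + weight * pseudo_term


# Toy example: 3 candidate answers, 4 visual elements (e.g. organs).
loss = multitask_loss(
    answer_logits=[2.0, 0.5, -1.0], answer_index=0,
    element_logits=[1.0, 0.2, -0.3, 0.0],
    pseudo_logits=[1.5, 0.1, -0.5, 0.0],
)
print(f"joint loss: {loss:.4f}")
```

In this sketch the pseudo-label term acts as an auxiliary objective: the predicted distribution over visual elements is pulled toward the soft pseudo labels, and the same prediction can then be inspected to see which elements the model treats as relevant to its answer.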