SYSU-HCP at VQA-Med 2021: A Data-centric Model with Efficient Training Methodology for Medical Visual Question Answering

Haifan Gong, Ricong Huang, Guanqi Chen, Guanbin Li
2021-01-01
Abstract: This paper describes our contribution to the Visual Question Answering Task in the Medical Domain at ImageCLEF 2021. Our method rests on a core idea: the model design and the training procedure should suit the characteristics of the data. Specifically, we design a hierarchical feature extraction structure to capture multi-scale features of medical images. To alleviate the limited amount of training data, we apply the mixup strategy for data augmentation during training. Based on the observation that the dataset contains hard samples, we adopt the curriculum learning paradigm to handle them. Finally, we apply label smoothing and ensemble training to reduce the model's bias toward the training data. The proposed method ranked 1st in the competition, with an accuracy of 0.382 and a BLEU score of 0.416. Our code and model are available at https://github.com/Rodger-Huang/SYSU-HCP-at-ImageCLEF-VQA-Med-2021.
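The abstract names mixup and label smoothing without giving their formulas. Below is a minimal sketch of the standard form of each technique in plain Python. It is not the authors' implementation (their code is in the linked repository). Flat lists of floats stand in for image tensors, and the function names `smooth_labels` and `mixup` and the defaults `alpha=0.2` and `epsilon=0.1` are illustrative assumptions, not values reported in the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple


def smooth_labels(label: int, num_classes: int, epsilon: float = 0.1) -> List[float]:
    """Standard label smoothing: (1 - epsilon) * one_hot + epsilon / num_classes.

    epsilon=0.1 is an illustrative default, not the paper's setting.
    """
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[label] += 1.0 - epsilon  # true class keeps most of the probability mass
    return target


def mixup(
    x1: Sequence[float],
    y1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
    alpha: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float], float]:
    """Standard mixup: convex combination of two inputs and their soft targets.

    lam ~ Beta(alpha, alpha); alpha=0.2 is an illustrative assumption.
    """
    if len(x1) != len(x2) or len(y1) != len(y2):
        raise ValueError("paired inputs and targets must have matching lengths")
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy "images" with smoothed targets over 3 hypothetical answer classes.
    x_mix, y_mix, lam = mixup([1.0, 0.0], smooth_labels(0, 3),
                              [0.0, 1.0], smooth_labels(2, 3), rng=rng)
    print(lam, x_mix, y_mix)
```

In a training loop, each sample would typically be paired with one drawn from a shuffled copy of the same batch, and the model trained on the mixed input against the mixed soft target. The abstract does not state whether the authors combine the two techniques in exactly this way.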