Diverse Visual Question Generation based on Multiple Objects Selection

Wenhao Fang, Jiayuan Xie, Hongfei Liu, Jiali Chen, Yi Cai
DOI: https://doi.org/10.1145/3640014
2024-01-15
Abstract: The visual question generation task aims to generate high-quality questions about a given image. To make this task applicable to various scenarios, e.g., the growing demand for exams, it is important to generate diverse questions. Existing methods for this task control diverse question generation through different question types, e.g., “what” and “when”. Although different question types lead to diversity in description, they cannot guarantee semantic diversity when the questions ask about the same objects. Research in psychology shows that humans attend to different objects in an image according to their preferences, which is beneficial for constructing semantically diverse questions. Motivated by this research, we propose a multi-selector visual question generation (MS-VQG) model, which focuses on different objects to generate diverse questions. Specifically, our MS-VQG model employs multiple selectors that imitate different humans selecting different objects in a given image. Based on the different objects chosen, our MS-VQG model generates diverse questions, each corresponding to one selector. Extensive experiments on two datasets show that our proposed model outperforms the baselines in generating diverse questions.
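The abstract does not say how each selector scores objects. As a rough illustration only, the sketch below assumes each selector is a fixed preference vector whose dot product with per-object features ranks the objects (a hypothetical stand-in; MS-VQG learns its selectors, and the abstract gives no formula). It shows how several selectors can pick different object subsets from the same image, each of which would then condition its own question. The names `select_objects`, `multi_select`, and the Jaccard overlap measure are illustrative assumptions, not the paper's code.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax: attention-style weights over objects.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_objects(object_feats: Sequence[Sequence[float]],
                   selector_pref: Sequence[float],
                   k: int) -> List[int]:
    # Hypothetical selector: score each object by the dot product of its
    # features with the selector's preference vector.
    scores = [sum(f * p for f, p in zip(feat, selector_pref))
              for feat in object_feats]
    weights = softmax(scores)
    # Return indices of the k highest-weighted objects, highest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


def multi_select(object_feats: Sequence[Sequence[float]],
                 selector_prefs: Sequence[Sequence[float]],
                 k: int) -> List[List[int]]:
    # One object subset per selector; in a full model, each subset would
    # condition a separate question decoder pass.
    return [select_objects(object_feats, pref, k) for pref in selector_prefs]


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    # Overlap between two selected subsets (lower means more diverse focus).
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


# Toy example: four detected objects with 2-d features, two selectors.
objects = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [0.2, 0.1]]
prefs = [[1.0, 0.0], [0.0, 1.0]]
subsets = multi_select(objects, prefs, k=2)
print(subsets)            # → [[0, 2], [1, 2]]
print(jaccard(*subsets))  # about 0.33
```

The two selectors share only object 2, so their questions would draw on partly different visual content. This mirrors, in a very simplified form, the abstract's claim that different selectors focusing on different objects yields semantically diverse questions.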
computer science, information systems, theory & methods, software engineering