Improving the Robustness of QA Models to Challenge Sets with Variational Question-Answer Pair Generation

Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
DOI: https://doi.org/10.48550/arXiv.2004.03238
2021-06-04
Abstract: Question answering (QA) models for reading comprehension have achieved human-level accuracy on in-distribution test sets. However, they have been demonstrated to lack robustness to challenge sets, whose distribution is different from that of training sets. Existing data augmentation methods mitigate this problem by simply augmenting training sets with synthetic examples sampled from the same distribution as the challenge sets. However, these methods assume that the distribution of a challenge set is known a priori, making them less applicable to unseen challenge sets. In this study, we focus on question-answer pair generation (QAG) to mitigate this problem. While most existing QAG methods aim to improve the quality of synthetic examples, we conjecture that diversity-promoting QAG can mitigate the sparsity of training sets and lead to better robustness. We present a variational QAG model that generates multiple diverse QA pairs from a paragraph. Our experiments show that our method can improve the accuracy of 12 challenge sets, as well as the in-distribution accuracy. Our code and data are available at https://github.com/KazutoshiShinoda/VQAG.
Computation and Language, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is **improving the robustness of question answering (QA) models on challenge sets**. Although existing QA models reach human-level accuracy on in-distribution test sets, they are not robust on challenge sets, whose distributions differ from that of the training set. Existing data augmentation methods alleviate this by adding synthetic examples sampled from the same distribution as the challenge sets. These methods assume that the challenge-set distribution is known in advance, so they are of limited use for unseen challenge sets. To address this, the authors propose a variational question-answer pair generation model (VQAG) based on variational autoencoders, which generates diverse question-answer pairs to increase the diversity of the training set and thereby improve the robustness of QA models. The specific contributions are:

1. **A variational question-answer pair generation model (VQAG) with explicit KL control**, which generates substantially more diverse answers and questions.
2. **A synthetic QA dataset constructed with this model**, which improves the performance of QA models on in-distribution test sets and achieves results comparable to existing question-answer pair generation (QAG) methods.
3. **Meaningful improvements on unseen challenge sets**, which are further increased by a simple ensemble method.

With these contributions, the authors aim to improve the robustness of QA models across a range of challenge sets without sacrificing in-distribution accuracy.
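The "explicit KL control" named in the first contribution is not spelled out in this summary. As a rough illustration only, the sketch below shows one common way to control the KL term of a variational autoencoder: penalize the distance between the KL divergence and a chosen target value, so the latent variable is neither ignored nor unconstrained. The penalty form, the function names `gaussian_kl` and `kl_controlled_loss`, and the `target_kl` parameter are assumptions made for this example, not the authors' implementation.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Each argument is a list of floats; logvar_* holds log-variances.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def kl_controlled_loss(recon_nll, kl, target_kl):
    """Reconstruction loss plus a penalty that pulls the KL toward target_kl.

    Assumed form for illustration: recon_nll + |kl - target_kl|.
    """
    return recon_nll + abs(kl - target_kl)


# Posterior shifted by one unit from a standard-normal prior (1 latent dimension).
kl = gaussian_kl([1.0], [0.0], [0.0], [0.0])
loss = kl_controlled_loss(recon_nll=2.0, kl=kl, target_kl=5.0)
```

In this form, a larger `target_kl` pushes the model to encode more information in the latent variable, which is the kind of mechanism that could encourage more varied generated QA pairs; the appropriate target value would have to be tuned empirically.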