Abstract: If a question cannot be answered with the available information, robust systems for question answering (QA) should know _not_ to answer. One way to build QA models that do this is with additional training data comprised of unanswerable questions, created either by employing annotators or through automated methods for unanswerable question generation. To show that the model complexity of existing automated approaches is not justified, we examine a simpler data augmentation method for unanswerable question generation in English: performing antonym and entity swaps on answerable questions. Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data with BERT-large), and has higher human-judged relatedness and readability. We quantify the raw benefits of our approach compared to no augmentation across multiple encoder models, using different amounts of generated data, and also on TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish swaps as a simple but strong baseline for future work.
What problem does this paper attempt to address?
The paper addresses how to generate unanswerable questions as training data, so that question answering (QA) systems learn to recognize questions that cannot be answered from the available information and abstain instead of guessing. Specifically, it proposes a lightweight, training-free method that turns answerable questions into unanswerable ones by swapping antonyms and entities. Despite its simplicity, the method yields better QA models than the prior state of the art, and its questions are rated higher by human judges for relatedness and readability.
### Background of the Paper and Problem Definition
- **Importance of the Problem**: Real-world queries are often unanswerable. For example, 37% of fact-seeking user questions cannot be answered from the Wikipedia pages in the top five search results (Kwiatkowski et al., 2019). Identifying unanswerable questions is also an important part of reading comprehension, yet traditional extractive QA systems tend to guess a plausible-looking answer in these cases (Rajpurkar et al., 2018).
- **Deficiencies of Existing Methods**: Existing methods for automatically generating unanswerable questions usually rely on complex models with many trainable parameters, yet the questions they produce often differ only superficially from the answerable questions they are derived from (as shown in Figure 1).
### Main Contributions of the Paper
- **Lightweight Method**: The paper proposes a lightweight method that generates unanswerable questions through antonym swaps and entity swaps on answerable questions. The method requires no training, and its questions are rated higher for relatedness and readability in human evaluation.
- **Performance Improvement**: Training on data generated with this method improves model performance on SQuAD 2.0 (+1.6 F1 points over the prior state of the art with BERT-large) and on TydiQA-MinSpan (+9.3 F1 points over no augmentation with BERT-large).
- **Simple and Effective**: The experiments show that simple antonym and entity swaps outperform existing complex methods across multiple models and datasets, suggesting that the model complexity of those methods is not justified for unanswerable question generation.
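The two swap operations described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `ANTONYMS` lexicon, the function names, and the assumption that entity lists are supplied by the caller (rather than produced by a named-entity recognizer) are all hypothetical and chosen only to keep the example self-contained.

```python
# Hypothetical toy antonym lexicon; a real pipeline would likely draw on a
# lexical resource rather than a hand-written dictionary.
ANTONYMS = {
    "largest": "smallest",
    "first": "last",
    "before": "after",
    "highest": "lowest",
}


def antonym_swap(question):
    """Replace the first word that has a known antonym; return None if none is found."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        core = tok.strip("?,.")  # ignore punctuation attached to the word
        antonym = ANTONYMS.get(core.lower())
        if antonym is None:
            continue
        if core[0].isupper():  # keep the original capitalization
            antonym = antonym.capitalize()
        tokens[i] = tok.replace(core, antonym)
        return " ".join(tokens)
    return None


def entity_swap(question, question_entities, context_entities):
    """Replace one entity in the question with a different entity from the context.

    Both entity lists are assumed to be given by the caller.
    """
    for ent in question_entities:
        if ent not in question:
            continue
        for cand in context_entities:
            if cand != ent and cand not in question:
                return question.replace(ent, cand, 1)
    return None


if __name__ == "__main__":
    q = "What is the largest city in Poland?"
    print(antonym_swap(q))
    print(entity_swap(q, ["Poland"], ["Poland", "Germany"]))
```

Each perturbed question is paired with the original context and labeled as unanswerable, which is how it would serve as additional training data. Whether a given swap truly makes the question unanswerable depends on the context, which is why the paper also evaluates unanswerability with human judges.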
### Experimental Setup and Results
- **Datasets**: The paper uses two datasets, SQuAD 2.0 and TydiQA-MinSpan (English part), for experiments.
- **Models**: Different variants of BERT, RoBERTa, and ALBERT are used in the experiments.
- **Evaluation Metrics**: The main metrics are exact match (EM) and F1 on the development set, together with human judgments of unanswerability, relatedness, and readability.
- **Results**:
  - On SQuAD 2.0, the entity substitution method outperforms the other methods, improving F1 by 1.6 points over the prior state of the art with BERT-large.
  - On TydiQA-MinSpan, the entity substitution method also performs well, improving F1 by 9.3 points over no augmentation with BERT-large.
  - In human evaluation, the questions generated by the proposed method are assessed for unanswerability and are rated higher than those of the prior state of the art for relatedness and readability.
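For readers unfamiliar with the EM and F1 metrics listed above, the following sketch shows how they are commonly computed for a single prediction in SQuAD 2.0-style evaluation, where an unanswerable question has an empty gold answer. It follows the widely used convention (lowercasing, stripping punctuation and articles, token-overlap F1) and is an assumption about the setup, not the paper's own evaluation code; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    """Token-overlap F1; an empty string stands for 'no answer'."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # For unanswerable questions the model is right only if it also abstains.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f1_score("", ""))         # model abstains on an unanswerable question
    print(f1_score("Warsaw", ""))   # model guesses on an unanswerable question
```

Under this convention, a model that guesses a plausible span for an unanswerable question scores zero on that example, so the reported F1 gains reflect in part how well a model learns to abstain.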
### Conclusion
The lightweight method proposed in the paper is effective at generating unanswerable questions. It outperforms existing complex methods on multiple evaluation metrics while requiring no training, which makes it a simple but strong baseline for future work. Future work can further explore why the data generated by the entity substitution method is so effective for model learning and study how to generate more diverse unanswerable questions.