Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering

Haotian Zhang, Jinfeng Rao, Jimmy Lin, Mark D. Smucker
DOI: https://doi.org/10.1145/3077136.3080645
2017-08-07
Abstract: We propose a heuristic called "one answer per document" for automatically extracting high-quality negative examples for answer selection in question answering. Starting with a collection of question-answer pairs from the popular TrecQA dataset, we identify the original documents from which the answers were drawn. Sentences from these source documents that contain query terms (aside from the answers) are selected as negative examples. Training on the original data plus these negative examples yields improvements in effectiveness by a margin that is comparable to successive recent publications on this dataset. Our technique is completely unsupervised, which means that the gains come essentially for free. We confirm that the improvements can be directly attributed to our heuristic, as other approaches to extracting comparable amounts of training data are not effective. Beyond the empirical validation of this heuristic, we also share our improved TrecQA dataset with the community to support further work in answer selection.
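The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the source document has already been split into sentences, that the labeled answer sentences are known, and that "contains query terms" means sharing at least one non-stopword token with the question. The helper names `terms` and `extract_negatives`, the regex tokenizer, and the stopword list are hypothetical; the abstract does not specify how query terms are matched, so this is not the authors' implementation.

```python
import re

# Hypothetical stopword list (an assumption; the abstract does not specify one).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "was", "what", "who",
             "when", "where", "which", "how", "did", "does", "do", "to", "and"}


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens with stopwords removed (assumed tokenizer)."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def extract_negatives(question: str, doc_sentences: list[str], answers: set[str]) -> list[str]:
    """Return sentences from the answer's source document that share at least
    one query term with the question but are not labeled answers.

    Under the "one answer per document" heuristic, such sentences are
    treated as negative examples.
    """
    q_terms = terms(question)
    negatives = []
    for sent in doc_sentences:
        if sent in answers:
            continue  # skip the known answer sentence(s)
        if q_terms & terms(sent):  # at least one shared query term
            negatives.append(sent)
    return negatives


# Usage on a toy, made-up document (not TrecQA data).
question = "Who founded the Red Cross?"
doc = [
    "Henri Dunant founded the Red Cross in 1863.",
    "The Red Cross operates in many countries.",
    "Geneva is a city in Switzerland.",
]
print(extract_negatives(question, doc, {doc[0]}))
# ['The Red Cross operates in many countries.']
```

The second sentence mentions the question's terms but is not the labeled answer, so the heuristic keeps it as a negative; the third shares no query terms and is discarded.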
What problem does this paper attempt to address?