Abstract: We introduce a large-scale MAchine Reading COmprehension dataset, which we name MS MARCO. The dataset comprises 1,010,916 anonymized questions, sampled from Bing's search query logs, each with a human-generated answer, and 182,669 completely human-rewritten generated answers. In addition, the dataset contains 8,841,823 passages, extracted from 3,563,535 web documents retrieved by Bing, that provide the information necessary for curating the natural language answers. A question in the MS MARCO dataset may have multiple answers or no answers at all. Using this dataset, we propose three different tasks with varying levels of difficulty: (i) predict whether a question is answerable given a set of context passages, and extract and synthesize the answer as a human would; (ii) generate a well-formed answer (if possible), based on the context passages, that can be understood with the question and passage context; and finally (iii) rank a set of retrieved passages given a question. The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering. We believe that the scale and the real-world nature of this dataset make it attractive for benchmarking machine reading comprehension and question-answering models.
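
As a rough illustration of how one record and the three task targets described above might be represented, here is a minimal Python sketch. The class names, field names (`is_selected`, `well_formed_answers`), and helper functions are hypothetical choices made for this example; they are not taken from the abstract and are not claimed to match the released data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Passage:
    text: str
    # Assumed flag: True if this passage was used to write the answer.
    is_selected: bool = False


@dataclass
class Example:
    question: str
    passages: List[Passage]
    # A question may have several answers or none at all.
    answers: List[str] = field(default_factory=list)
    # Human-rewritten answers exist only for a subset of questions.
    well_formed_answers: List[str] = field(default_factory=list)


def is_answerable(ex: Example) -> bool:
    """Task (i) target: whether the question has at least one answer."""
    return len(ex.answers) > 0


def well_formed_target(ex: Example) -> Optional[str]:
    """Task (ii) target: the first rewritten answer, if one exists."""
    return ex.well_formed_answers[0] if ex.well_formed_answers else None


def ranking_labels(ex: Example) -> List[int]:
    """Task (iii) target: one binary relevance label per passage."""
    return [1 if p.is_selected else 0 for p in ex.passages]
```

Under these assumptions, a question with no answers yields `is_answerable(...) == False` and `well_formed_target(...) is None`, while its passages can still carry ranking labels.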

English Machine Reading Comprehension Datasets: A Survey

PALRACE: Reading Comprehension Dataset with Human Data and Labeled Rationales

A Survey of Machine Narrative Reading Comprehension Assessments

More Than Reading Comprehension: A Survey on Datasets and Metrics of Textual Question Answering

A Span-Extraction Dataset for Chinese Machine Reading Comprehension

A Chinese Machine Reading Comprehension Dataset Automatic Generated Based on Knowledge Graph

MA-MRC: A Multi-answer Machine Reading Comprehension Dataset

Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

Dataset for the First Evaluation on Chinese Machine Reading Comprehension

ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension

DRCD: a Chinese Machine Reading Comprehension Dataset

A Survey on Neural Machine Reading Comprehension

Machine Reading Comprehension: a Literature Review

A survey of deep learning techniques for machine reading comprehension

Knowledge Based Machine Reading Comprehension

How Much Reading Does Reading Comprehension Require? A Critical Investigation of Popular Benchmarks

DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension

SciMRC: Multi-perspective Scientific Machine Reading Comprehension