A Full-Text Learning to Rank Dataset for Medical Information Retrieval

Vera Boteva, Demian Gholipour, Artem Sokolov, Stefan Riezler
DOI: https://doi.org/10.1007/978-3-319-30671-1_58
2016-01-01
Abstract: We present a dataset for learning to rank in the medical domain, consisting of thousands of full-text queries linked to thousands of research articles. The queries are taken from health topics described in layman’s English on the non-commercial www.NutritionFacts.org website; relevance links at three levels are extracted from direct and indirect links between queries and research articles on PubMed. We demonstrate that ranking models trained on this dataset outperform standard bag-of-words retrieval models by a wide margin. The dataset can be downloaded from: www.cl.uni-heidelberg.de/statnlpgroup/nfcorpus/.
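The idea of deriving graded relevance from direct and indirect links can be illustrated with a small sketch. It assumes a hypothetical link graph in which each page maps to the pages and articles it links to, and it marks PubMed articles with a hypothetical `PMID:` prefix. It grades each article by its shortest link distance from the query page (a direct link gets the highest grade). This distance-to-grade mapping is an assumption for illustration only; the paper's actual rules for assigning its three relevance levels may differ.

```python
from collections import deque


def graded_relevance(links, query, num_levels=3):
    """Assign graded relevance to articles reachable from `query`.

    Hypothetical scheme: an article at shortest link distance d
    (d = 1 means a direct link) receives grade num_levels - d + 1.
    Articles more than num_levels hops away are not graded.
    """
    # Breadth-first search records the shortest distance to each node.
    dist = {query: 0}
    frontier = deque([query])
    while frontier:
        page = frontier.popleft()
        if dist[page] >= num_levels:
            continue  # do not expand beyond the deepest graded level
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                frontier.append(target)
    # Keep only articles (assumed "PMID:" prefix), excluding the query itself.
    return {
        node: num_levels - d + 1
        for node, d in dist.items()
        if d > 0 and node.startswith("PMID:")
    }


if __name__ == "__main__":
    # Toy graph: q1 cites PMID:1 directly and reaches others via intermediate pages.
    links = {
        "q1": ["PMID:1", "page_a"],
        "page_a": ["PMID:2", "page_b"],
        "page_b": ["PMID:3", "PMID:1"],
    }
    print(graded_relevance(links, "q1"))
    # → {'PMID:1': 3, 'PMID:2': 2, 'PMID:3': 1}
```

Using the shortest distance means an article that is both directly and indirectly linked (like `PMID:1` above) keeps its highest grade.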