Alibaba DAMO Academy at TREC Precision Medicine 2020: State-of-the-art Evidence Retriever for Precision Medicine with Expert-in-the-loop Active Learning

Qiao Jin, Chuanqi Tan, Mosha Chen, Ming Yan, Songfang Huang, Ningyu Zhang, Xiaozhong Liu
2020-01-01
Abstract: This paper describes the submissions of Alibaba DAMO Academy to the TREC Precision Medicine (PM) Track in 2020, which achieve state-of-the-art performance in the evidence quality assessment. The focus of the TREC PM Track is to retrieve academic papers that report critical clinical evidence for or against a given treatment in a population specified by its disease and gene mutation. We use a two-step approach that includes: 1) a baseline retriever using query expansion with Elasticsearch (ES) and 2) an automatic or expert-in-the-loop re-ranker. The automatic re-ranker uses ES scores, pre-trained BioBERT scores, publication type scores and citation count scores as features; the expert-in-the-loop re-ranker uses expert annotations and a fine-tuned BioBERT in addition to the features of the automatic re-ranker. For the expert-in-the-loop re-ranker, we use a novel, sample-efficient active learning annotation strategy. At each annotation iteration, 1) we fine-tune BioBERT on all expert annotations of query-document relevance collected so far, and 2) human experts annotate the actual relevance of the unannotated query-document pairs that the fine-tuned BioBERT predicts to be most relevant. Our submissions outperform the median topic-wise scores in the phase 1 assessment for general relevance and achieve state-of-the-art performance in the phase 2 assessment for evidence quality. Our analyses show that evidence quality is an aspect distinct from general relevance, and thus additional modeling of it is necessary to assist IR for Evidence-based Precision Medicine.
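The annotation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`active_learning_loop`, `toy_fit`, `ask_expert`), the toy documents, and the word-overlap scorer are all hypothetical. In particular, `toy_fit` stands in for fine-tuning BioBERT, and `ask_expert` stands in for a human annotator by looking up a fixed answer table.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]            # (query_id, doc_id)
Scorer = Callable[[Pair], float]  # predicted relevance of a pair


def active_learning_loop(
    candidates: List[Pair],
    fit: Callable[[Dict[Pair, int]], Scorer],
    ask_expert: Callable[[Pair], int],
    rounds: int,
    batch_size: int,
) -> Dict[Pair, int]:
    """Alternate between (1) refitting on all labels and (2) annotating
    the unlabeled pairs that the refitted model ranks highest."""
    labels: Dict[Pair, int] = {}
    for _ in range(rounds):
        score = fit(labels)  # step 1: retrain on every annotation so far
        unlabeled = [p for p in candidates if p not in labels]
        if not unlabeled:
            break
        # step 2: send the top-scored unlabeled pairs to the expert
        unlabeled.sort(key=score, reverse=True)
        for pair in unlabeled[:batch_size]:
            labels[pair] = ask_expert(pair)
    return labels


# Hypothetical toy data; a real system would score with a fine-tuned model.
DOCS = {
    "d1": "braf melanoma vemurafenib trial",
    "d2": "braf melanoma dabrafenib trial",
    "d3": "weather forecast report",
    "d4": "braf colorectal trial",
}
TRUE_RELEVANCE = {"d1": 1, "d2": 1, "d3": 0, "d4": 0}


def toy_fit(labels: Dict[Pair, int]) -> Scorer:
    """Stand-in for fine-tuning: score a document by the number of words
    it shares with documents already labeled relevant."""
    vocab = set()
    for (_, doc_id), label in labels.items():
        if label == 1:
            vocab.update(DOCS[doc_id].split())
    return lambda pair: sum(w in vocab for w in DOCS[pair[1]].split())


def ask_expert(pair: Pair) -> int:
    """Stand-in for a human annotator."""
    return TRUE_RELEVANCE[pair[1]]


candidates = [("q1", doc_id) for doc_id in DOCS]
labels = active_learning_loop(candidates, toy_fit, ask_expert, rounds=3, batch_size=1)
```

Because each round annotates the pairs the current model considers most relevant, the annotation budget is spent on likely positives rather than on obviously irrelevant documents; in this toy run the off-topic document `d3` is never sent to the expert.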