Abstract:
Objective: Since the onset of the Coronavirus Disease 2019 (COVID-19) pandemic, the number of publicly available biomedical information sources has surged, making it increasingly challenging to retrieve text relevant to a topic of interest. In this paper, we propose a Contextual Query Expansion framework based on clinical Domain knowledge (CQED) that formulates an effective search over PubMed to retrieve COVID-19 scholarly articles relevant to a given information need.
Materials and methods: For training and evaluation, we use the widely adopted TREC-COVID benchmark. Given a query, the proposed framework uses a contextual and a domain-specific neural language model to generate a set of candidate expansion terms that enrich the original query. The framework also includes a multi-head attention mechanism that is trained alongside a learning-to-rank model to re-rank the generated candidate expansion terms. The original query and the top-ranked expansion terms are then posed to the PubMed search engine to retrieve scholarly articles relevant to the information need. CQED has four variants, depending on the learning path adopted for training and re-ranking the candidate expansion terms.
Results: The model substantially improves search performance over the original query: by 190.85% in terms of RECALL@1000 and by 343.55% in terms of NDCG@1000. Additionally, the model outperforms all existing state-of-the-art baselines. In terms of P@10, the variant optimized for Precision outperforms all baselines (0.7987). In terms of NDCG@10 (0.7986), MAP (0.3450) and bpref (0.4900), the CQED variant optimized for the average of all retrieval measures outperforms all baselines.
Conclusion: The proposed model successfully expands queries posed to PubMed and improves search performance compared to all existing baselines. A success/failure analysis shows that the model improved search performance for every evaluated query. Moreover, an ablation study showed that overall performance decreases when the generated candidate terms are not ranked. For future work, we would like to explore applying the presented query expansion framework to technology-assisted Systematic Literature Reviews (SLR).
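
The retrieval measures quoted in the abstract (P@10, NDCG@k, and the percentage improvement over the original query) can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the toy document IDs, and the relevance grades are hypothetical, and NDCG is computed with linear gains and a log2 rank discount, which is one common convention and may differ from the exact setup used for TREC-COVID.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, graded_relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [graded_relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_gains = sorted(graded_relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def relative_improvement(baseline, improved):
    """Percentage change of a metric relative to its baseline value."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical toy data: relevance grades per document (0 = not relevant).
qrels = {"d1": 2, "d2": 1, "d3": 0, "d5": 1}
relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}

original_run = ["d3", "d1", "d7", "d2"]   # ranking for the original query
expanded_run = ["d1", "d2", "d5", "d3"]   # ranking for the expanded query

p_original = precision_at_k(original_run, relevant, 4)
p_expanded = precision_at_k(expanded_run, relevant, 4)
print(p_original, p_expanded)
print(ndcg_at_k(original_run, qrels, 4), ndcg_at_k(expanded_run, qrels, 4))
print(relative_improvement(p_original, p_expanded))
```

Read this way, a reported improvement of 190.85% means the metric for the expanded query is roughly 2.9 times its value for the original query, not 1.9 times.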

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Efficient Self-Supervised Metric Information Retrieval: A Bibliography Based Method Applied to COVID Literature

Advancing PICO Element Detection in Biomedical Text via Deep Neural Networks

Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains.

CMT in TREC-COVID Round 2: Mitigating the Generalization Gaps from Web to Special Domain Search

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

A Search Engine for Discovery of Scientific Challenges and Directions

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction

A knowledge-based learning framework for self-supervised pre-training towards enhanced recognition of biomedical microscopy images

Learning to rank query expansion terms for COVID-19 scholarly search

Pre-trained Language Models in Biomedical Domain: A Systematic Survey

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Improving Biomedical Information Retrieval with Neural Retrievers

Pre-training technique to localize medical BERT and enhance biomedical BERT

Pre-training Methods in Information Retrieval

MedCPT: Contrastive Pre-trained Transformers with Large-scale PubMed Search Logs for Zero-shot Biomedical Information Retrieval

CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization