Abstract

Background: Biomedical sciences, with their focus on human health and disease, have attracted unprecedented attention in the 21st century. Their growth has also produced a large volume of scientific articles, which makes it difficult for biomedical researchers to find relevant articles and hinders the dissemination of valuable discoveries. To bridge this gap, the research community has initiated the article recommendation task, which aims to automatically recommend articles to biomedical researchers based on their research interests. Many recommendation methods have been developed over the past two decades. However, an algorithm-level comparison and a rigorous evaluation of the most important methods on a shared dataset are still lacking.

Methods: In this study, we first investigate 15 methods for automated article recommendation in the biomedical domain. We then empirically evaluate all 15 methods: six term-based methods, two word embedding methods, three sentence embedding methods, two document embedding methods, and two BERT-based methods. The methods are evaluated in two scenarios, article-oriented recommenders and user-oriented recommenders, using two publicly available datasets, TREC 2005 Genomics and RELISH, respectively.

Results: Our experimental results show that the text representation models BERT and BioSenVec outperform many existing recommendation methods (e.g., BM25, PMRA, XPRC) and web-based recommendation systems (e.g., MScanner, MedlineRanker, BioReader) on both datasets for most evaluation metrics, and that fine-tuning can improve the performance of the BERT-based methods.

Conclusions: Our comparison study can help researchers and practitioners select the best modeling strategies for building article recommendation systems in the biomedical domain. The code and datasets are publicly available.
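BM25 is one of the term-based baselines named in the results. As a rough illustration of how a term-based, article-oriented recommender ranks candidate articles against a query article, here is a minimal BM25 sketch in plain Python. It is not the study's implementation: whitespace tokenization, the parameter values k1=1.2 and b=0.75, the Lucene-style IDF, and the toy article texts are all assumptions chosen for illustration, not data from TREC 2005 Genomics or RELISH.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; a real system would also
    # strip punctuation, drop stopwords, and possibly stem.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 ranker over a small in-memory collection."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.doc_lens = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # IDF with +1 inside the log so it never goes negative (Lucene-style).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf = self.tfs[i]
        # Length normalization: longer-than-average documents are penalized.
        norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avgdl)
        s = 0.0
        for term in tokenize(query):
            if term in tf:
                f = tf[term]
                s += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return s

    def rank(self, query: str) -> list[tuple[int, float]]:
        # Return (document index, score) pairs, best match first.
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy candidates; the query plays the role of the seed article.
candidates = [
    "BRCA1 mutations increase breast cancer risk",
    "insulin signaling in type 2 diabetes",
    "germline BRCA1 and BRCA2 testing in ovarian cancer",
]
bm25 = BM25(candidates)
ranking = bm25.rank("BRCA1 breast cancer")
for idx, s in ranking:
    print(f"{s:.3f}  {candidates[idx]}")
```

The first candidate shares all three query terms and ranks highest; the third shares two but is longer, so it ranks second; the diabetes article shares none and scores 0. Embedding-based methods replace this exact-term overlap with a similarity (typically cosine) between dense vectors, which is what lets them match articles that use different vocabulary for the same concept.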

A Comparison Between Term-Based and Embedding-Based Methods for Initial Retrieval

A LDA Topic Model Based Collection Selection Method for Distributed Information Retrieval

Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations

Leveraging Semantic and Lexical Matching to Improve the Recall of Document Retrieval Systems: A Hybrid Approach

Divide and Conquer: Towards Better Embedding-based Retrieval for Recommender Systems From a Multi-task Perspective

Semantic Models for the First-Stage Retrieval: A Comprehensive Review

pEBR: A Probabilistic Approach to Embedding Based Retrieval

Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

Semantic Matching by Non-Linear Word Transportation for Information Retrieval

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

Learning to Combine Ad-hoc Ranking Functions for Image Retrieval

Composite Re-Ranking for Efficient Document Search with BERT

Multi-word Term Embeddings Improve Lexical Product Retrieval

A comparative evaluation of biomedical similar article recommendation

Term Selection and Result Reranking for Question Retrieval by Exploiting Hierarchical Classification

Evaluate Retrieval Systems Based on Ontology Vocabulary

Embedding-based Product Retrieval in Taobao Search

A Discriminative Semantic Ranker for Question Retrieval

Feedback Model for Microblog Retrieval

Semi-Parametric Retrieval via Binary Token Index