Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Background</h3><p>Finding specific scientific articles in a large collection is an important natural language processing challenge in the biomedical domain. Systematic reviews and interactive article search are among the downstream applications that benefit from addressing this problem. The task often involves screening articles for a combination of selection criteria. While machine learning has previously been used for this purpose, it is not known whether different criteria should be modeled together or separately in an ensemble model. The performance impact of modern contextual language models on the task is also not known.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Methods</h3><p>We framed the problem as text classification and conducted experiments to compare ensemble architectures, where the selection criteria were mapped to the components of the ensemble. We proposed a novel cascade ensemble analogous to the step-wise screening process employed in developing the gold standard. We compared the performance of the ensembles with that of a single integrated model, which we refer to as the individual task learner (ITL). We used SciBERT, a variant of BERT pre-trained on scientific articles, and conducted experiments using a manually annotated dataset of ∼49K MEDLINE abstracts, known as Clinical Hedges.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>The cascade ensemble had significantly higher precision (0.663 vs. 0.388 vs. 0.478 vs. 0.320) and F measure (0.753 vs. 0.553 vs. 0.628 vs. 0.477) than ITL and the ensembles using Boolean logic and a feed-forward network, respectively. However, ITL had significantly higher recall than the other classifiers (0.965 vs. 0.872 vs. 0.917 vs. 0.944).
In fixed high-recall studies, ITL achieved 0.509 precision at 0.970 recall and 0.381 precision at 0.985 recall on a subset that was studied earlier, and 0.295 precision at 0.985 recall on the full dataset, all of which were improvements over previous studies.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion</h3><p>Pre-trained neural contextual language models (e.g., SciBERT) performed well for screening scientific articles. Its performance at high fixed recall makes the single integrated model (ITL) the most suitable of the architectures considered here for systematic reviews. However, the high F measure of the cascade ensemble makes it a better approach for interactive search applications. The effectiveness of the cascade ensemble architecture suggests broader applicability beyond this task and dataset, and the approach is analogous to query optimization in information retrieval and in databases.</p>
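<p>The decision logic and metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stage classifiers are hypothetical callables standing in for fine-tuned SciBERT models, the function names are invented for illustration, the F measure is assumed to be the balanced F1, and only inference-time combination is shown (how each component is trained is outside the sketch).</p>

```python
from typing import Callable, Sequence

# Hypothetical stage classifier: takes an abstract's text and returns True
# if the article satisfies one selection criterion. In practice this would
# wrap a trained model; here it is just a callable.
Stage = Callable[[str], bool]


def cascade_predict(text: str, stages: Sequence[Stage]) -> bool:
    """Step-wise screening: apply the stages in order and stop at the
    first criterion the article fails, so later stages are skipped."""
    for stage in stages:
        if not stage(text):
            return False
    return True


def boolean_and_predict(text: str, stages: Sequence[Stage]) -> bool:
    """Boolean-logic ensemble: run every stage independently, then AND
    the individual decisions."""
    decisions = [stage(text) for stage in stages]
    return all(decisions)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_at_recall(
    y_true: Sequence[int], scores: Sequence[float], target_recall: float
) -> float:
    """Precision at the highest score threshold whose recall reaches
    target_recall. y_true holds 0/1 labels; scores are classifier scores
    where higher means more likely relevant."""
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("no positive examples in y_true")

    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    true_pos = 0
    i = 0
    while i < len(ranked):
        # Items with tied scores fall on the same side of any threshold,
        # so they are accepted together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            true_pos += ranked[j][1]
            j += 1
        i = j
        if true_pos / total_pos >= target_recall:
            return true_pos / i  # i items accepted at this threshold
    raise RuntimeError("unreachable: recall is 1.0 once all items are accepted")
```

<p>Under the balanced-F1 assumption, <code>f_measure(0.663, 0.872)</code> rounds to the 0.753 reported for the cascade ensemble. With identical, independently trained stage classifiers the cascade and the Boolean AND return the same decision; in this sketch they differ only in that the cascade skips later stages once one fails.</p>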

BERTMeSH: deep contextual representation learning for large-scale high-performance MeSH indexing with full text

FullMeSH: Improving Large-Scale MeSH Indexing with Full Text.

DeepMeSH: Deep Semantic Representation for Improving Large-Scale MeSH Indexing.

MeSHLabeler and DeepMeSH: Recent Progress in Large-Scale MeSH Indexing.

MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank

Recommending MeSH terms for annotating biomedical articles

MeSHup: A Corpus for Full Text Biomedical Document Indexing

MeSHProbeNet: a Self-Attentive Probe Net for MeSH Indexing

Automatic Annotation of PubMed Articles with MeSH Qualifiers

MeSHLabeler: Improving the Accuracy of Large-Scale MeSH Indexing by Integrating Diverse Evidence.

Developing a More Accurate Biomedical Literature Retrieval Method using Deep Learning and Citations in PubMed Central Full-text Articles

BMExpert: Mining MEDLINE for Finding Experts in Biomedical Domains Based on Language Model

Using the contextual language model BERT for multi-criteria classification of scientific articles

Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction

Automatic Assignment of Non-Leaf MeSH Terms to Biomedical Articles

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining

Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation

Beyond MeSH: Fine-Grained Semantic Indexing of Biomedical Literature based on Weak Supervision

Deep Learning for Medical Text Processing: BERT Model Fine-Tuning and Comparative Study