Abstract:

Background: For the selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. In spite of significant advancements in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for the retrieval of biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end user perspectives.

Methods: A biomarker terminology was created and further organized into six concept classes. Performance of this terminology was optimized towards balanced selectivity and specificity. The information retrieval performance using the biomarker terminology was evaluated based on various combinations of the terminology's six classes. Further validation of these results was performed on two independent corpora representing two different neurodegenerative diseases.

Results: The current state of the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate of informative abstracts, which is achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of unspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature. A demo version of the search engine SCAIView, including the biomarker retrieval, is made available to the public through http://www.scaiview.com/scaiview-academia.html.

Conclusions: The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful as an aid to finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
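The idea of retrieving abstracts through combinations of terminology classes can be illustrated with a minimal Python sketch. It is not the SCAIView implementation: the class names, the synonyms, and the helper functions `matched_classes` and `retrieve` are all hypothetical, and simple dictionary matching stands in for whatever recognition pipeline the study actually used.

```python
import re

# Hypothetical mini-terminology: concept class -> synonyms.
# The class names and synonyms are illustrative assumptions,
# not entries from the terminology described in the abstract.
TERMINOLOGY = {
    "disease": ["alzheimer's disease", "parkinson's disease"],
    "gene": ["apoe", "snca"],
    "alteration_evidence": ["overexpression", "polymorphism", "upregulated"],
    "clinical_management": ["diagnosis", "prognosis", "treatment response"],
}


def matched_classes(text):
    """Return the set of concept classes with at least one synonym in the text."""
    lowered = text.lower()
    found = set()
    for cls, synonyms in TERMINOLOGY.items():
        for synonym in synonyms:
            # Word-boundary match so a short synonym does not match
            # inside a longer token.
            if re.search(r"\b" + re.escape(synonym) + r"\b", lowered):
                found.add(cls)
                break
    return found


def retrieve(abstracts, required_classes):
    """Keep only the abstracts that mention every required concept class."""
    required = set(required_classes)
    return [a for a in abstracts if required <= matched_classes(a)]


ABSTRACTS = [
    "APOE polymorphism is associated with prognosis in Alzheimer's disease.",
    "Alzheimer's disease affects millions of people worldwide.",
    "SNCA overexpression was observed in Parkinson's disease models.",
]

# Disease + gene + alteration evidence keeps the first and third abstract.
broad = retrieve(ABSTRACTS, ["disease", "gene", "alteration_evidence"])

# Adding a further class as a filter narrows the result to the first abstract.
narrow = retrieve(
    ABSTRACTS,
    ["disease", "gene", "alteration_evidence", "clinical_management"],
)
```

Requiring more classes trades recall for specificity, which is the effect the Results describe when additional class filters reduce the number of unspecific hits.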

Two Phase Indexes Based Passage Retrieval In Biomedical Texts

Cross-Reading by Leveraging a Hybrid Index of Heterogeneous Information.

Multistage and Multi-features Medical Image Retrieval System

Term Extraction and Negation Detection Method in Chinese Clinical Document

A New Biomedical Passage Retrieval Framework for Laboratory Medicine: Leveraging Domain-specific Ontology, Multilevel PRF, and Negation Differential Weighting

A Retrieval System For 3D Multi-Phase Contrast-Enhanced CT Images Of Focal Liver Lesions Based On Combined Bags Of Visual Words And Texture Words

A Medical Literature Search System for Identifying Effective Treatments in Precision Medicine

A New Two-Step Method for Medical Image Retrieval

Top K Relevant Passage Retrieval for Biomedical Question Answering

Relation-Based document retrieval for biomedical IR

Developing a More Accurate Biomedical Literature Retrieval Method using Deep Learning and Citations in PubMed Central Full-text Articles

Assessment of approximate string matching in a biomedical text retrieval problem

Medical Image Retrieval Based on Gray Level Co-occurrence Matrix and Gradient Phase Mutual Information

Relation-Based document retrieval for biomedical literature databases

Semantic Analysis For Enhanced Medical Retrieval

Indexing the medical open access literature for textual and content-based visual retrieval

Mining biomarker information in biomedical literature

Improving Biomedical Information Retrieval with Neural Retrievers

BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

MeSHup: A Corpus for Full Text Biomedical Document Indexing

State-of-the-Art Evidence Retriever for Precision Medicine: Algorithm Development and Validation