Abstract:

Background: Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently.

Methods: G-Bean addresses PubMed's limitations with three innovations:
(1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel.
(2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph, and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information.
(3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles.

Results: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed could not return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php.

Conclusions: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
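The ontology-graph query expansion named in the Methods can be illustrated with a small sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the toy concept graph, the damping factor of 0.85, the iteration count, and the function names `personalized_pagerank` and `expand_query` are all hypothetical. The sketch covers only the Personalized PageRank step and the top-k selection; the TF-IDF re-ranking of concepts is omitted.

```python
from collections import defaultdict

# Toy concept graph (hypothetical; the real graph merges MeSH, SNOMEDCT, CSP and AOD).
EDGES = [
    ("asthma", "bronchial disease"),
    ("asthma", "respiratory hypersensitivity"),
    ("bronchial disease", "lung disease"),
    ("lung disease", "pneumonia"),
    ("diabetes", "insulin"),  # separate component, unreachable from "asthma"
]


def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (concept_a, concept_b) pairs.
    seeds: concepts matched by the initial query; the random walk
           restarts at these nodes with probability 1 - damping.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    nodes = list(neighbors)
    seeds = [s for s in seeds if s in neighbors]
    if not seeds:
        raise ValueError("no seed concept found in the graph")

    # Restart distribution: uniform over the seed concepts, zero elsewhere.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node passes its damped score equally to its neighbors.
            share = damping * score[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        score = nxt
    return score


def expand_query(edges, seeds, top_k=500):
    """Return the top_k non-seed concepts ranked by Personalized PageRank score."""
    scores = personalized_pagerank(edges, seeds)
    candidates = [n for n in scores if n not in seeds]
    # Sort by descending score; break ties alphabetically for determinism.
    candidates.sort(key=lambda n: (-scores[n], n))
    return candidates[:top_k]


if __name__ == "__main__":
    print(expand_query(EDGES, ["asthma"], top_k=2))
```

Because the walk restarts only at the query concepts, concepts close to them in the graph receive high scores, while concepts in unrelated parts of the graph (here, `diabetes` and `insulin`) receive none. A production system would presumably use a sparse-matrix implementation to handle a graph of UMLS size.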

Gene Related Mining of Biomedical Literatures

Literature-Mining For Genes Based Natural Language Processing And Biomedical Ontology

Literature Mining Associations of Diseases Using Gene Ontology

The GenExtractor: A Web-Based Bioinformation Mining System

GeneSUM: Large Language Model-based Gene Summary Extraction

Extracting Relationship Both Gene2Disease and Gene2Gene from Biomedical Literatures

Generating Gene Summaries from Biomedical Literature: A Study of Semi-Structured Summarization.

GDRMS: a System for Automatic Extraction of the Disease-Centre Relation

Biotopic: A Topic-Driven Biological Literature Mining System

Towards Automatic Generation of Gene Summary

G-Bean: an ontology-graph based web tool for biomedical literature retrieval

Development and Validation of an AI‐Driven System for Automatic Literature Analysis and Molecular Regulatory Network Construction

GeneNetMiner: accurately mining gene regulatory networks from literature

MedMiner: An Internet Text-Mining Tool for Biomedical Information, with Application to Gene Expression Profiling

Text mining approach for relationships between genes and diseases

Biomedical literature mining: graph kernel-based learning for gene–gene interaction extraction

Gene Name Automatic Recognition in Biomedical Literature

Learning to Rank-Based Gene Summary Extraction

Research on Text Mining of Biomedical Field Based on Pubmed

A Semantic-Based Approach for Mining Undiscovered Public Knowledge from Biomedical Literature.

Knowledge Discovery in Biomedical Literature: Survey and Prospect