Abstract: With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of query results can improve the multidisciplinary relevance of search results. Translational medicine and systems biology rely on studies that relate basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform that uses machine learning and natural language processing (NLP) methods to infer translational aspects of the biosciences from published articles. Working from online databases, the platform effectively extracted relationships between multiple sub-domains within neuroscience from abstracts related to a user query. In the current implementation, the platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by using a scoring function to map top-ranked cluster labels derived from the UMLS Metathesaurus. To avoid leaving documents unclustered, an iterative scheme called auto-clustering was developed; it maps documents that remained uncategorized after the initial grouping process to relevant clusters. The efficacy of the document clustering and visualization platform was evaluated through expert validation of clustering results obtained with unique search terms. Compared with normal clustering, auto-clustering proved more effective, generating a larger number of unique and relevant cluster labels. Using this implementation, a systems theory model of Parkinson's disease was developed, and studies based on user queries related to neuroscience and oncology are showcased as applications.
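The abstract does not say how auto-clustering scores or reassigns uncategorized documents. The following is a minimal Python sketch of one plausible iterative reassignment scheme and should not be read as the authors' method. It makes three assumptions. Each cluster is represented by a term-frequency profile built from its label plus its member documents. An unassigned document joins the cluster with the highest cosine similarity if that score clears a fixed threshold. Passes repeat so that clusters enlarged in one pass can attract further documents in the next. The names `auto_cluster`, `threshold`, and `max_iter` are hypothetical. The initial clusters are assumed to come from an STC or LINGO run, which is not reimplemented here, and the UMLS-based label scoring is not modeled.

```python
from collections import Counter
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def auto_cluster(docs: dict[str, str], clusters: dict[str, set[str]],
                 threshold: float = 0.2, max_iter: int = 10):
    """Hypothetical auto-clustering pass (assumed scheme, not the paper's).

    docs: doc_id -> text; clusters: label -> doc_ids from the initial run.
    Returns (updated clusters, doc_ids that are still unassigned).
    """
    vectors = {d: Counter(tokenize(t)) for d, t in docs.items()}
    clusters = {label: set(members) for label, members in clusters.items()}
    assigned = set().union(*clusters.values())
    unassigned = [d for d in docs if d not in assigned]
    for _ in range(max_iter):
        # Rebuild each profile: label terms plus term counts of current members.
        profiles = {}
        for label, members in clusters.items():
            profile = Counter(tokenize(label))
            for d in members:
                profile.update(vectors[d])
            profiles[label] = profile
        # Pick the best-matching cluster for every unassigned document.
        newly = {}
        for d in unassigned:
            best_label, best_score = None, 0.0
            for label, profile in profiles.items():
                score = cosine(vectors[d], profile)
                if score > best_score:
                    best_label, best_score = label, score
            if best_label is not None and best_score >= threshold:
                newly[d] = best_label
        if not newly:  # converged: no document could be placed this pass
            break
        for d, label in newly.items():
            clusters[label].add(d)
        unassigned = [d for d in unassigned if d not in newly]
    return clusters, unassigned


# Usage with toy abstracts (invented for illustration only).
docs = {
    "d1": "dopamine neurons in Parkinson disease",
    "d2": "alpha synuclein aggregation in Parkinson disease",
    "d3": "tumor suppressor genes in breast cancer",
    "d4": "dopamine loss and synuclein in neurons",
    "d5": "breast cancer tumor growth",
    "d6": "quantum chromodynamics lattice",
}
initial = {"Parkinson Disease": {"d1", "d2"}, "Breast Cancer": {"d3"}}
clusters, leftover = auto_cluster(docs, initial)
print(clusters, leftover)
```

In this toy run, `d4` and `d5` are absorbed into the Parkinson's and breast cancer clusters. `d6` shares no terms with any profile, so it stays unassigned rather than being forced into a cluster. The threshold is the main design lever: lowering it places more documents but risks attaching them to weakly related labels.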

A Mixture Language Model for Class-Attribute Mining from Biomedical Literature Digital Library

A Language Modeling Text Mining Approach to the Annotation of Protein Community

A Probabilistic Model for Mining Implicit 'Chemical Compound-Gene' Relations from Literature

Mining Disease-Specific Molecular Association Profiles from Biomedical Literature: A Case Study

Automated Text Mining of Experimental Methodologies from Biomedical Literature

Application of a New Probabilistic Model for Mining Implicit Associated Cancer Genes from OMIM and MEDLINE

Biotopic: A Topic-Driven Biological Literature Mining System

A latent topic model for mining heterogenous non-randomly missing electronic health records data

A MeSH-based Biomedical Literature Mining Method for Exploring Associations Between Genes and Clinical Terms

MedMine: Examining Pre-trained Language Models on Medication Mining

BMExpert: Mining MEDLINE for Finding Experts in Biomedical Domains Based on Language Model

Extracting LncRNA-protein Interactions from Literature Using a Text Feature-based Approach

AI for Biomedicine in the Era of Large Language Models

Text mining for finding functional community of related genes using TCM knowledge

A Comprehensive Evaluation of Large Language Models in Mining Gene Interactions and Pathway Knowledge

A Study on Integrating Multimodal English and American Literature Resources Using Data Mining

Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling

Knowledge Discovery in Biomedical Literature: Survey and Prospect

Automated electrosynthesis reaction mining with multimodal large language models (MLLMs)

Efficient Tag Mining Via Mixture Modeling for Real-Time Search-Based Image Annotation

Exploration of Attention Mechanism-Enhanced Deep Learning Models in the Mining of Medical Textual Data