Abstract: With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of a query's results can improve the multi-disciplinary relevance of those results. Translational medicine and systems biology rely on studies relating basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows translational aspects in the biosciences to be inferred from published articles using machine learning and natural language processing (NLP) methods. Working from online databases, the platform extracted relationships between multiple sub-domains within neuroscience from abstracts related to the user query. In its current implementation, the platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from the UMLS Metathesaurus using a scoring function. To avoid non-clustered documents, an iterative scheme called auto-clustering was developed; it maps documents left uncategorized during the initial grouping process to relevant clusters. The efficacy of the platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology are showcased as applications.
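
The abstract does not specify how auto-clustering iterates, so the following is only a minimal sketch of one plausible reading: re-run a clusterer on the documents left ungrouped until none remain or no progress is made. The clusterer here is a toy shared-term grouper standing in for STC or LINGO, and every name in it (`toy_cluster`, `auto_cluster`, `max_clusters`, `min_size`, the sample documents) is hypothetical rather than taken from the paper.

```python
from collections import Counter

# Assumption: a tiny stopword list, for illustration only.
STOPWORDS = {"the", "of", "in", "and", "a", "for", "to", "with"}


def tokenize(text):
    """Return the set of lowercase non-stopword terms in a document."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def toy_cluster(docs, max_clusters=2, min_size=2):
    """Toy stand-in for STC/LINGO (NOT either algorithm).

    Picks up to `max_clusters` terms that occur in at least `min_size`
    documents as labels, assigns each document to the first label it
    contains, and returns (clusters, leftover_docs).
    """
    doc_freq = Counter(term for d in docs for term in tokenize(d))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [t for t, n in ranked if n >= min_size][:max_clusters]
    clusters = {label: [] for label in labels}
    leftover = []
    for d in docs:
        terms = tokenize(d)
        hit = next((label for label in labels if label in terms), None)
        if hit is None:
            leftover.append(d)
        else:
            clusters[hit].append(d)
    return clusters, leftover


def auto_cluster(docs, max_rounds=5, **kwargs):
    """Hypothetical iterative scheme: re-cluster the leftover documents."""
    clusters = {}
    remaining = list(docs)
    for _ in range(max_rounds):
        if not remaining:
            break
        new_clusters, leftover = toy_cluster(remaining, **kwargs)
        if len(leftover) == len(remaining):
            break  # no document was placed this round; stop iterating
        for label, members in new_clusters.items():
            if members:  # skip labels that attracted no documents
                clusters.setdefault(label, []).extend(members)
        remaining = leftover
    return clusters, remaining


docs = [
    "dopamine neurons parkinson",
    "dopamine receptor signaling",
    "tumor growth oncology",
    "tumor suppressor gene",
    "cerebellum granule cell",
    "cerebellum purkinje cell",
]
single_pass_clusters, single_pass_leftover = toy_cluster(docs)
clusters, remaining = auto_cluster(docs)
print(len(single_pass_leftover), "documents unclustered after one pass")
print(len(remaining), "documents unclustered after auto-clustering")
print(sorted(clusters))
```

In this toy setting the single pass is capped at two labels and leaves some documents ungrouped, while the iterative version places them under additional labels. This mirrors the abstract's claim only qualitatively and says nothing about the actual scoring function or the UMLS-based label mapping.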

Explaining Relationships Between Scientific Documents

Learning Semantic Correspondences in Technical Documentation

Explaining Datasets in Words: Statistical Models with Natural Language Parameters

A Rule-Based Information Extraction System for Human-Readable Semi-Structured Scientific Documents

Understanding the Logical and Semantic Structure of Large Documents

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Scientific document processing: challenges for modern learning methods

OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement

Explaining Text Similarity in Transformer Models

Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling

Towards an understanding and explanation for mixed-initiative artificial scientific text detection

Eliciting Relations from Natural Language Requirements Documents Based on Linguistic and Statistical Analysis

Explaining Relation Classification Models with Semantic Extents

Capturing Relations Between Scientific Papers: an Abstractive Model for Related Work Section Generation

Exploring Effective Inter-Encoder Semantic Interaction for Document-Level Relation Extraction

Relationships are Complicated! An Analysis of Relationships Between Datasets on the Web

Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction

Explainability Techniques for Chemical Language Models

Explainability of Text Processing and Retrieval Methods: A Critical Survey

Star-BiLSTM-LAN for Document-level Mutation-Disease Relation Extraction from Biomedical Literature

Reasoning with Latent Structure Refinement for Document-Level Relation Extraction