Abstract: The rapid growth of biomedical literature makes it difficult for researchers to extract and analyze relevant information efficiently. In this study, we explore the application of GPT-3.5-Turbo, a large language model, to automate the extraction and visualization of metabolic networks from a corpus of PubMed abstracts. Our objective is to provide a tool that helps biomedical researchers explore and understand the intricate metabolic interactions discussed in the scientific literature. We begin by splitting the corpus into smaller segments, because the GPT-3.5-Turbo model has a limit of 4,000 tokens per analysis. Through iterative prompt optimization, we extract a comprehensive list of metabolites, enzymes, and proteins from the abstracts. To validate the accuracy and completeness of the extracted entities, our biomedical domain experts compare them with the source abstracts and confirm that they fully match. Using the extracted entities, we generate a directed graph that represents the metabolic network and includes three types of metabolic events: metabolic consumption, metabolic reaction, and metabolic production. The graph visualization, built with Python and NetworkX, gives a clear representation of metabolic pathways and highlights the relationships between metabolites, enzymes, and proteins. Our approach integrates language models and network analysis, combining automated information extraction with graph-based visualization. The research contributions are twofold. First, we show that GPT-3.5-Turbo can automatically extract metabolic entities, streamlining the process of cataloging important components in metabolic research. Second, we present the generation and visualization of a directed graph that provides an overview of metabolic interactions. This graph can support further analysis, comparison with existing pathways, and the updating or refinement of metabolic networks. Our findings underscore the potential of large language models and network analysis techniques for extracting and visualizing metabolic information from scientific literature. This approach enables researchers to gain insights into complex biological systems and advances our understanding of metabolic pathways and their components.
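
The two mechanical steps named in the abstract, splitting the corpus to respect the token limit and assembling a directed graph from extracted events, can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration only: whitespace-separated words stand in for model tokens, each extracted event is represented as a hypothetical `(source, event_type, target)` triple, and the function names `chunk_abstracts`, `build_edges`, and `adjacency` are invented. In practice the budget would also need to leave room for the prompt and the model's reply.

```python
from collections import defaultdict

TOKEN_LIMIT = 4000  # per-analysis limit stated in the abstract
EVENT_TYPES = {"consumption", "reaction", "production"}  # the three event types


def estimate_tokens(text):
    # Assumption: a whitespace word count is a rough proxy for model tokens.
    return len(text.split())


def chunk_abstracts(abstracts, budget=TOKEN_LIMIT):
    """Greedily pack whole abstracts into chunks that stay within the budget.

    An abstract that alone exceeds the budget still becomes its own chunk.
    """
    chunks, current, used = [], [], 0
    for abstract in abstracts:
        cost = estimate_tokens(abstract)
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(abstract)
        used += cost
    if current:
        chunks.append(current)
    return chunks


def build_edges(events):
    """Turn (source, event_type, target) triples into labelled directed edges."""
    edges = []
    for source, event, target in events:
        if event not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {event}")
        edges.append((source, target, {"event": event}))
    return edges


def adjacency(edges):
    """Group edges by source node as a plain adjacency mapping."""
    graph = defaultdict(list)
    for source, target, attrs in edges:
        graph[source].append((target, attrs["event"]))
    return dict(graph)


# Illustrative input only; these triples are not results from the study.
example_events = [
    ("glucose", "consumption", "hexokinase"),
    ("hexokinase", "production", "glucose-6-phosphate"),
]
example_edges = build_edges(example_events)
example_graph = adjacency(example_edges)
```

The edge triples use the `(u, v, attribute_dict)` shape that NetworkX accepts, so with NetworkX installed they could be loaded with `networkx.DiGraph().add_edges_from(example_edges)` and then drawn.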

Automated Extraction and Visualization of Metabolic Networks from Biomedical Literature Using a Large Language Model
