Abstract: Traditional Chinese medicine (TCM) has long been viewed as a precious source for modern drug discovery. AI-assisted drug discovery (AIDD) has been investigated extensively. However, two challenges remain in applying AIDD to guide TCM drug discovery: large-scale standardized TCM-related information is lacking, and AIDD is prone to pathological failures on out-of-domain data. We released TCM Database@Taiwan in 2011, and it has been widely disseminated and used. We have now developed TCMBank, the largest systematic free TCM database, as an extension of TCM Database@Taiwan. TCMBank contains 9,192 herbs, 61,966 ingredients (unduplicated), 15,179 targets, 32,529 diseases, and their pairwise relationships. By integrating multiple data sources, TCMBank provides 3D structure information for ingredients, along with standardized lists and detailed information for herbs, ingredients, targets, and diseases. TCMBank has an intelligent document identification module that continuously adds TCM-related information retrieved from literature in PubChem. In addition, driven by TCMBank big data, we developed an ensemble learning-based drug discovery protocol for identifying potential leads and drug repurposing candidates. We take colorectal cancer and Alzheimer's disease as examples to demonstrate how artificial intelligence can accelerate drug discovery. Using TCMBank, researchers can view literature-driven relationship mappings between herbs/ingredients and genes/diseases, which supports understanding the molecular mechanisms of action of ingredients and identifying new potentially effective treatments. TCMBank is available at https://TCMBank.CN/.

What problem does this paper attempt to address?

This paper attempts to solve two main problems:

1. **Lack of standardized Traditional Chinese Medicine (TCM)-related information**: A major challenge in TCM research and modern drug development is the lack of a large body of standardized TCM information, such as the active ingredients in herbs and the associations between ingredients and target proteins. This information is scattered across books and journals and is difficult to collect comprehensively, so researchers struggle to obtain complete data on ingredients and their mechanisms of action.
2. **Pathological failures of AI-assisted drug discovery (AIDD) on out-of-domain data**: Existing AIDD methods are prone to systematic errors on out-of-domain data, and most lack wet-lab validation. A single model may be overly sensitive to, or dependent on, certain data points, and may therefore generalize poorly to new data.

### Solutions

To address these problems, the research team developed **TCMBank**, a free and systematic TCM database that aims to provide standardized TCM information, including herbs, ingredients, targets, diseases, and their interrelationships. Specifically:

- **Features of TCMBank**:
  - It contains 9,192 herbs, 61,966 unique ingredients, 15,179 targets, 32,529 diseases, and their pairwise relationships.
  - It provides 3D structure information for ingredients, facilitating virtual screening and molecular simulation.
  - The Intelligent Document Identification Module (IDIM) regularly downloads the latest literature from PubChem and extracts TCM-related information using techniques such as natural language processing (NLP) and optical character recognition (OCR), keeping the database continuously updated.
- **Drug discovery framework based on ensemble learning**:
  - An ensemble learning (EL) framework improves the efficiency of virtual screening and identifies potential lead compounds and drug repurposing candidates by finding consensus among prediction methods.
  - The specific steps are: molecular docking, a ligand-based EL model, a hybrid neural network (HNN)-based EL model that predicts drug-target affinity (DTA), and molecular dynamics (MD) simulations that evaluate the kinetic properties and interactions of protein-ligand complexes.

Through these measures, TCMBank not only provides rich standardized TCM data but also accelerates drug discovery through AI, thereby promoting the modernization of traditional Chinese medicine.
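
The idea of "finding consensus among prediction methods" can be illustrated with a minimal rank-averaging sketch. This is not the paper's implementation: the function names, the method labels (`docking`, `ligand_el`, `dta_el`), the compound identifiers, and all scores below are hypothetical, and mean-rank aggregation is only one simple way to combine several predictors.

```python
from statistics import mean


def rank_positions(scores, higher_is_better=True):
    """Map each compound to its 1-based rank under a single method."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {compound: i + 1 for i, compound in enumerate(ordered)}


def consensus_ranking(method_scores, higher_is_better):
    """Order compounds by mean rank across methods (lower mean rank first).

    Only compounds scored by every method are considered.
    """
    ranks = [
        rank_positions(scores, higher_is_better[name])
        for name, scores in method_scores.items()
    ]
    shared = set.intersection(*(set(r) for r in ranks))
    mean_rank = {c: mean(r[c] for r in ranks) for c in shared}
    # Break ties by compound name so the output is deterministic.
    return sorted(mean_rank, key=lambda c: (mean_rank[c], c))


# Hypothetical scores for three hypothetical compounds.
method_scores = {
    "docking": {"cmpd_A": -9.1, "cmpd_B": -7.4, "cmpd_C": -8.2},   # binding energy: lower is better
    "ligand_el": {"cmpd_A": 0.81, "cmpd_B": 0.35, "cmpd_C": 0.90},  # activity probability
    "dta_el": {"cmpd_A": 7.9, "cmpd_B": 5.2, "cmpd_C": 7.1},        # predicted affinity
}
higher_is_better = {"docking": False, "ligand_el": True, "dta_el": True}

ranking = consensus_ranking(method_scores, higher_is_better)
print(ranking)
```

Averaging ranks rather than raw scores avoids having to put docking energies, probabilities, and affinities on a common scale. A compound that ranks well under only one method is pushed down the list, which is the intuition behind using an ensemble to reduce dependence on any single model.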

TCMBank: Bridges Between the Largest Herbal Medicines, Chemical Ingredients, Target Proteins, and Associated Diseases with Intelligence Text Mining

Database of traditional Chinese medicine and its application to studies of mechanism and to prescription validation.

A method for finding groups of related herbs in traditional Chinese medicine

TCM Database@Taiwan: the World's Largest Traditional Chinese Medicine Database for Drug Screening in Silico.

TCMSID: a simplified integrated database for drug discovery from traditional Chinese medicine

TCMM: A Unified Database for Traditional Chinese Medicine Modernization and Therapeutic Innovations

LTM-TCM: A comprehensive database for the linking of Traditional Chinese Medicine with modern medicine at molecular and phenotypic levels

ccTCM: A quantitative component and compound platform for promoting the research of traditional Chinese medicine

HERB: a high-throughput experiment- and reference-guided database of traditional Chinese medicine

AI Empowering Traditional Chinese Medicine?

Modern Bioinformatics Meets Traditional Chinese Medicine.

TCMSP: a database of systems pharmacology for drug discovery from herbal medicines

A critical assessment of Traditional Chinese Medicine databases as a source for drug discovery

TCMBank-the Largest TCM Database Provides Deep Learning-Based Chinese-Western Medicine Exclusion Prediction

TCM-Mesh: The database and analytical system for network pharmacology analysis for TCM preparations

A focus on harnessing big data and artificial intelligence: revolutionizing drug discovery from traditional Chinese medicine sources

TCMAnalyzer: A Chemo- and Bioinformatics Web Service for Analyzing Traditional Chinese Medicine

Exploring pharmacological active ingredients of traditional Chinese medicine by pharmacotranscriptomic map in ITCM

TCM‐Suite: A comprehensive and holistic platform for Traditional Chinese Medicine component identification and network pharmacology analysis

MicrobeTCM: A comprehensive platform for the interactions of microbiota and traditional Chinese medicine