Chemoinformatic characterization of NAPROC-13: A database for natural product 13C NMR dereplication

José L. Medina-Franco, Juan F. Avellaneda-Tamayo, Naicolette A. Agudo-Muñoz, Javier E. Sánchez-Galán, José Luis López-Pérez
DOI: https://doi.org/10.26434/chemrxiv-2024-spksf-v2
2024-09-13
Abstract: Natural products (NPs) are secondary metabolites of natural origin with broad applications across various human activities, particularly the discovery of bioactive compounds. Structural elucidation of new NPs entails significant cost and effort. On the other hand, the dereplication of known compounds is crucial for the early exclusion of irrelevant compounds in contemporary pharmaceutical research. NAPROC-13 stands out as a publicly accessible database, providing structural and 13C NMR spectroscopic information for over 25,000 compounds, rendering it a pivotal resource in natural product (NP) research, favoring open science. This study seeks to quantitatively analyze the chemical content, structural diversity, and chemical space coverage of NPs within NAPROC-13, compared to FDA-approved drugs and a very diverse subset of NPs, UNPD-A. Findings indicated that NPs in NAPROC-13 exhibit comparable properties to those in UNPD-A, albeit showcasing a notably diverse array of structural content, scaffolds, ring systems of pharmaceutical interest, and molecular fragments. NAPROC-13 covers a specific region of the chemical multiverse (a generalization of the chemical space from different chemical representations) regarding physicochemical properties and a region as broad as UNPD-A in terms of structural features represented by fingerprints.
Chemistry
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper conducts a quantitative analysis of the NAPROC-13 database to evaluate its chemical content, structural diversity, and chemical space coverage. NAPROC-13 is a publicly accessible database containing the structures and carbon-13 nuclear magnetic resonance (13C NMR) spectral information of over 25,000 compounds. The primary objective is to characterize these natural products by comparing NAPROC-13 with FDA-approved drugs and with a highly diverse natural product subset, UNPD-A. Specifically, the paper focuses on the following aspects:

1. **Chemical Content and Structural Diversity**: Analyze the chemical properties, structural diversity, and chemical space coverage of natural products in NAPROC-13 using cheminformatics and statistical tools.
2. **Molecular Fingerprints**: Represent compounds with several molecular fingerprints (such as ECFP4 and ECFP6) and assess their similarities.
3. **Scaffolds**: Analyze the scaffolds in NAPROC-13 and compare them with the scaffolds in FDA-approved drugs and UNPD-A.
4. **Ring Systems**: Further characterize structural diversity by analyzing ring systems and identifying potentially bioactive ones.

Through this detailed analysis of NAPROC-13, the paper characterizes the database's structural diversity and complexity and demonstrates its potential applications in natural product dereplication and virtual screening.
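To make the fingerprint-based similarity idea concrete, the following is a minimal sketch, not the paper's actual pipeline. It assumes each compound has already been reduced to the set of "on" bit indices of a fingerprint such as ECFP4, and it uses the Tanimoto coefficient, a common similarity measure for such fingerprints; the summary above does not state which measure the authors used. The compound names, bit indices, and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_pairwise_similarity(fingerprints: list) -> float:
    """Average Tanimoto similarity over all unique pairs.

    A lower value suggests a more structurally diverse set.
    """
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical on-bit indices; real fingerprints would come from a
# cheminformatics toolkit rather than being written by hand.
toy_fingerprints = {
    "compound_a": frozenset({1, 4, 7, 9}),
    "compound_b": frozenset({1, 4, 8}),
    "compound_c": frozenset({2, 3, 5}),
}

sim_ab = tanimoto(toy_fingerprints["compound_a"], toy_fingerprints["compound_b"])
mean_sim = mean_pairwise_similarity(list(toy_fingerprints.values()))

if __name__ == "__main__":
    print(f"Tanimoto(a, b) = {sim_ab:.3f}")
    print(f"Mean pairwise similarity = {mean_sim:.3f}")
```

Storing fingerprints as sets of on-bit indices keeps the sketch dependency-free. In practice the same calculation is usually run on bit vectors produced by a toolkit, and a distribution of pairwise similarities is compared across data sets rather than a single mean.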