Abstract: Introduction: The pharmacological literature and patents connect compound structures to their bioactivity. However, entombing these relationships among millions of PDFs is seriously problematic. The situation is ameliorated by resources that extract the data relationships authors put into their PDFs back out into structured database records. The IUPHAR/BPS Guide to PHARMACOLOGY (GtoPdb) has been doing this by stringent curation of ligands and their quantitative activity against protein targets [1]. Our citations are submitted to PubChem (PC), which then links them to PubMed (PM). This study presents an overview of this connectivity.
Methods: For GtoPdb entries in PC Substance, we used the PC interface to count our submitted PM links. This gave the PC-to-PM mapping counts, from which we analysed the PM links. We then performed reciprocal analyses (i.e. PM to PC) by selecting PM sets. Finally, we compared two journals by counting structure links by year and source.
Results: Of 8988 GtoPdb-submitted substances in PC (release 2017.5), 7309 are linked to 8980 PM entries and 5632 link to chemical structures in PC, the rest being antibodies and larger peptides. Of the 8980 PMIDs, the Journal of Medicinal Chemistry (JMC) accounted for 1003, making it our most frequently cited primary source of structure-to-activity mappings. For the British Journal of Pharmacology (BJP), most of the 345 cross-references were development compounds. Further analysis showed that from 2014 to 2017 the approximately 30 BJP-to-PC structure links per year came mostly from GtoPdb and the Comparative Toxicogenomics Database. However, going back to 2010-12, this increased to 500-800 connections, mainly derived from the IBM automated chemical extraction from abstracts. A similar pattern was observed for JMC.
Conclusion: Navigation between documents and databases is an essential competence in pharmacology and drug discovery, but the NCBI Entrez system is daunting. GtoPdb is a major contributor of high-quality links and provides a first stop to guide users into the PC and PM systems. However, our results indicate potentially serious specificity issues with automated chemistry-to-journal linking from non-GtoPdb sources.
References: [1] Harding et al. (2018). Nucl. Acids Res. 45 (Database Issue)
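The last Methods step, counting structure links per journal by year and depositing source, can be illustrated with a minimal sketch. It is not the authors' procedure (the abstract states only that the PC interface was used). The record layout, the `count_links` helper, and every row in `LINKS` are hypothetical placeholders, assuming the links have already been exported as simple tuples.

```python
from collections import Counter, defaultdict

# Hypothetical link records: (journal, year, depositing_source, structure_id).
# These rows are invented placeholders, not data from the study.
LINKS = [
    ("BJP", 2011, "IBM", "cid_a"),
    ("BJP", 2011, "IBM", "cid_b"),
    ("BJP", 2015, "GtoPdb", "cid_c"),
    ("BJP", 2015, "CTD", "cid_c"),
    ("JMC", 2015, "GtoPdb", "cid_d"),
]


def count_links(links, journal):
    """Return {year: Counter({source: number of distinct structures})} for one journal."""
    seen = defaultdict(set)
    for rec_journal, year, source, structure_id in links:
        if rec_journal == journal:
            # A set de-duplicates repeated (year, source, structure) rows.
            seen[(year, source)].add(structure_id)
    table = defaultdict(Counter)
    for (year, source), structure_ids in seen.items():
        table[year][source] = len(structure_ids)
    return dict(table)


bjp = count_links(LINKS, "BJP")
for year in sorted(bjp):
    print(year, dict(bjp[year]))
```

Tabulating by source as well as by year is what lets a shift from automatically extracted links to curated links show up as a change in the per-source counts.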

Cross-Mapping of Protein-Ligand Binding Data Between ChEMBL and PDBbind

ChEMBL: a large-scale bioactivity database for drug discovery

PDB-wide Collection of Binding Data: Current Status of the PDBbind Database

The ChEMBL bioactivity database: an update

ADME-AP: A Database of ADME Associated Proteins

The PDBbind Database: Collection of Binding Affinities for Protein-Ligand Complexes with Known Three-Dimensional Structures

The ChEMBL database in 2017

The PDBbind Database: Methodologies and Updates

PDBBind Optimization to Create a High-Quality Protein-Ligand Binding Dataset for Binding Affinity Prediction

The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods

A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling

CLiBE: a Database of Computed Ligand Binding Energy for Ligand-Receptor Complexes.

Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction

BigBind: Learning from Nonstructural Data for Structure-Based Virtual Screening

A compound-target pairs dataset: differences between drugs, clinical candidates and other bioactive compounds

The big data join in pharmacology: linking structures, databases and documents

Ligand-protein database: linking protein-ligand complex structures to binding data.

BindingDB in 2024: a FAIR Knowledgebase of Protein-Small Molecule Binding Data

Annotation of biologically relevant ligands in UniProtKB using ChEBI

Orthologue chemical space and its influence on target prediction