Cross-Mapping of Protein-Ligand Binding Data Between ChEMBL and PDBbind

Zhihai Liu, Jie Li, Jie Liu, Yuchen Liu, Wei Nie, Li Han, Yan Li, Renxiao Wang
DOI: https://doi.org/10.1002/minf.201500010
IF: 4.05
2015-01-01
Molecular Informatics
Abstract: The ChEMBL database is a valuable open data source that provides a comprehensive collection of binding data as well as functional and ADMET properties of bioactive compounds. The PDBbind database has a more focused scope: it collects binding data for the protein-ligand complexes in the Protein Data Bank. Currently, the PDBbind collection of binding data is rather modest compared with the ChEMBL collection (≈13,000 versus ≈1.3 million). One may wonder whether the former is simply a subset of the latter. In this study, we mapped the molecular information and protein-ligand binding data in PDBbind to the records in ChEMBL, and then analyzed the overlap between the binding data recorded in the two databases. Our results indicate that only ≈20% of the binding data in PDBbind have counterparts in ChEMBL. Thus, the PDBbind collection of binding data is largely complementary to the ChEMBL collection. We also identify two reasons for the low overlap between the two databases: first, only a minor fraction of the protein-ligand complexes in PDBbind is covered by ChEMBL; second, the literature spaces screened by the two databases do not overlap substantially either. Our study demonstrates the value of focused databases versus more comprehensive ones.
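The overlap analysis described in the abstract can be pictured with a small sketch. It is not the authors' procedure: it assumes each binding record has already been reduced to a (protein identifier, ligand identifier) pair, and the function name `overlap_fraction` and all identifiers in the toy data are hypothetical placeholders rather than real database entries.

```python
from typing import Iterable, Set, Tuple

# Assumption: a record is keyed by (protein identifier, ligand identifier).
# The real mapping between the two databases is more involved than this.
Pair = Tuple[str, str]


def overlap_fraction(focused: Iterable[Pair], broad: Iterable[Pair]) -> float:
    """Return the fraction of unique pairs in `focused` also present in `broad`."""
    focused_set: Set[Pair] = set(focused)
    if not focused_set:
        return 0.0  # avoid division by zero for an empty collection
    shared = focused_set & set(broad)
    return len(shared) / len(focused_set)


# Toy placeholder records, not taken from PDBbind or ChEMBL.
focused_pairs = [("P1", "L1"), ("P1", "L2"), ("P2", "L3"), ("P3", "L4")]
broad_pairs = [("P1", "L1"), ("P2", "L9"), ("P5", "L4")]

fraction = overlap_fraction(focused_pairs, broad_pairs)
print(f"{fraction:.0%} of the focused pairs have a counterpart")
```

A pair counts as shared only when both the protein and the ligand match, which is why a ligand that appears in both collections against different proteins does not contribute to the overlap.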