ADME Properties Evaluation in Drug Discovery: Prediction of Plasma Protein Binding Using NSGA-II Combining PLS and Consensus Modeling
Ning-Ning Wang, Zhen-Ke Deng, Chen Huang, Jie Dong, Min-Feng Zhu, Zhi-Jiang Yao, Alex F. Chen, Ai-Ping Lu, Qi Mi, Dong-Sheng Cao
DOI: https://doi.org/10.1016/j.chemolab.2017.09.005
IF: 4.175
2017-01-01
Chemometrics and Intelligent Laboratory Systems
Abstract: The plasma protein binding (PPB) affinity of a drug compound strongly influences its pharmacodynamic behavior because it affects drug uptake and distribution. In this study, we collected a sizeable dataset of 1830 drug compounds from several accessible sources. A descriptor pool composed of four types of descriptors (2-D, 3-D, E-state and MACCS) was first built, and the non-dominated sorting genetic algorithm II (NSGA-II) combined with partial least squares (PLS) regression was applied to select important descriptors for model building. We then obtained a consensus model (five-fold cross-validation: Q² = 0.750, RMSE = 16.151) based on five predictive models built using random forest (RF), support vector machine (SVM), Cubist, Gaussian process (GP), and Boosting. A test set and two external validation datasets were further used to validate its robustness and practicality: for the test set, R² = 0.787 and RMSE = 14.154; for the two external datasets, R² = 0.704 and 0.703 and RMSE = 18.194 and 17.233, respectively. Additionally, following the OECD principles, y-randomization, the Williams plot and scaffold analyses were performed to assess the reliability and applicability domain of our predictive model. Overall, our consensus model shows good predictive performance and generalization ability for PPB. Analysis of the important descriptors selected by NSGA-II and RF indicates that the PPB of a drug compound is mainly related to its lipophilicity, aromatic rings, and partial charge properties. In summary, this study developed a robust and practical consensus model for PPB prediction that could be used for distribution evaluation and risk assessment in the early stages of drug development.
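The descriptor-selection step couples NSGA-II with PLS regression, but the abstract does not state the objectives being optimized. As a minimal sketch, assume two hypothetical objectives, both minimized: the cross-validated RMSE of a PLS model and the number of selected descriptors. The Python below shows only NSGA-II's fast non-dominated sorting, which ranks candidate descriptor subsets into Pareto fronts; the PLS fit, crowding distance, and genetic operators are omitted, and the candidate scores are invented for illustration.

```python
from typing import List, Sequence, Tuple

Objectives = Tuple[float, ...]  # one value per objective; all objectives are minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: Sequence[Objectives]) -> List[List[int]]:
    """Group point indices into Pareto fronts, best front first (NSGA-II ranking)."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]  # for each p: indices that p dominates
    domination_count = [0] * n             # for each p: how many points dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)  # nobody dominates p: first front
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:  # q only dominated by earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical candidate descriptor subsets scored as (PLS CV RMSE, number of descriptors);
# these objectives and values are assumptions, not taken from the paper.
candidates = [(17.0, 40), (16.5, 60), (18.2, 25), (17.5, 45), (19.0, 60)]
print(fast_non_dominated_sort(candidates))  # → [[0, 1, 2], [3], [4]]
```

Subsets in the first front are mutually non-dominated trade-offs between accuracy and parsimony; in a full NSGA-II loop, crowding distance would then be used to choose among them when filling the next population.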
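The consensus model and its cross-validated Q² and RMSE can be illustrated with a short Python sketch. It assumes that the consensus prediction is the unweighted mean of the five models' predictions (the abstract does not state the combination rule) and uses one common definition of cross-validated Q², namely 1 - PRESS/SS_tot computed on out-of-fold predictions. The model names serve only as dictionary labels, no model is trained here, and all numbers are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def consensus(predictions: Dict[str, Sequence[float]]) -> List[float]:
    """Unweighted mean of several models' predictions, compound by compound (assumed rule)."""
    per_compound = zip(*predictions.values())
    return [sum(values) / len(values) for values in per_compound]


def q2_and_rmse(y_obs: Sequence[float], y_cv: Sequence[float]) -> Tuple[float, float]:
    """Q^2 = 1 - PRESS / SS_tot and RMSE from out-of-fold (cross-validated) predictions."""
    n = len(y_obs)
    mean_obs = sum(y_obs) / n
    press = sum((y - p) ** 2 for y, p in zip(y_obs, y_cv))  # predictive residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in y_obs)        # total sum of squares
    return 1.0 - press / ss_tot, math.sqrt(press / n)


# Invented observed values and out-of-fold predictions for four compounds.
y_obs = [95.0, 80.0, 60.0, 99.0]
cv_predictions = {
    "RF": [90.0, 82.0, 65.0, 97.0],
    "SVM": [93.0, 78.0, 58.0, 96.0],
    "Cubist": [96.0, 75.0, 62.0, 98.0],
    "GP": [92.0, 84.0, 61.0, 95.0],
    "Boosting": [94.0, 81.0, 54.0, 99.0],
}
y_cons = consensus(cv_predictions)
q2, rmse = q2_and_rmse(y_obs, y_cons)
print(y_cons, round(q2, 3), round(rmse, 3))  # → [93.0, 80.0, 60.0, 97.0] 0.991 1.414
```

Averaging tends to cancel errors that individual models make in different directions, which is one common motivation for consensus modeling; a weighted mean would be a straightforward variant.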
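The Williams plot places each compound's leverage against its standardized residual. A minimal NumPy sketch, assuming the usual conventions (leverage h = xᵀ(XᵀX)⁻¹x with an intercept column, warning leverage h* = 3(p + 1)/n for p descriptors and n training compounds, and a ±3 cutoff on standardized residuals), flags compounds outside the applicability domain or with outlying residuals. The function names and the choice to scale residuals by a training residual standard deviation are assumptions, not details from the paper.

```python
import numpy as np


def with_intercept(M: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the leverage accounts for the model intercept."""
    return np.hstack([np.ones((M.shape[0], 1)), M])


def leverages(X_train: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """h_i = x_i^T (X^T X)^{-1} x_i for each query row, relative to the training set."""
    Xt, Xq = with_intercept(X_train), with_intercept(X_query)
    # Pseudo-inverse tolerates (near-)collinear descriptors.
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)


def williams_flags(X_train, X_query, residuals, train_residual_sd, k=3.0):
    """Return (outside-domain mask, residual-outlier mask, warning leverage h*)."""
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n  # conventional warning leverage
    h = leverages(X_train, X_query)
    std_res = np.asarray(residuals) / train_residual_sd
    return h > h_star, np.abs(std_res) > k, h_star


# Toy example: one descriptor, five training compounds, two query compounds (invented data).
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
X_query = np.array([[2.0], [10.0]])
out_of_domain, outlier, h_star = williams_flags(X_train, X_query, [1.0, 5.0], train_residual_sd=1.0)
print(leverages(X_train, X_query), h_star, out_of_domain, outlier)
```

With the intercept column included, the training leverages sum to p + 1, so h* is three times the mean training leverage; the query compound at descriptor value 10 lies far outside the training range and is flagged.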