In Silico ADME Modeling 3: Computational Models to Predict Human Intestinal Absorption Using Sphere Exclusion and kNN QSAR Methods

Sitarama B. Gunturi, Ramamurthi Narayanan
DOI: https://doi.org/10.1002/qsar.200630094
2007-05-01
Abstract: Modeling of human intestinal absorption (HIA) data of 175 diverse drugs and 336 calculated descriptors is performed to develop global predictive models that are applicable to the whole medicinal chemistry space. With this aim, we employed two automated procedures: (a) the Sphere Exclusion Algorithm (SEA) to select members of the training and test sets based on structural dissimilarity and (b) the k-Nearest Neighbors (kNN) method along with Genetic Algorithms (kNN-QSAR-GA) to select significant and independent descriptors. This methodology helped us to derive optimal Quantitative Structure–Property Relationship (QSPR) models based on three and four descriptors. The best three-descriptor model is based on the Delta Chi Index of order 3 (Cluster), the Hydrogen type E-State index ShsOH, and AlogP99 ($q^2_{\mathrm{LOO}} = 0.7401$ and $q^2_{\mathrm{ext}} = 0.7989$); the best four-descriptor model is based on the auto-correlation descriptor (Moran) weighted by atomic weights (order 7), AI-State_Indices_AISssssC, the number of hydrogen bond acceptors, and AlogP99 ($q^2_{\mathrm{LOO}} = 0.8196$ and $q^2_{\mathrm{ext}} = 0.6999$). Based on extensive validation tests of the models M1–M4 and on comparison of their overall performance and $q^2_{\mathrm{ext}}$ statistics with reported models using other approaches, it is shown that: (a) the models have high stability and are robust and (b) for the first time in HIA modeling, the combination of an automated training set selection (SEA) followed by variable selection (kNN-QSAR-GA) is shown to be a promising methodology to build multiple stable models that are useful in consensus prediction.
Analysis of the physical meaning of the selected descriptors indicates that the HIA of small organic compounds can be accurately predicted from calculated descriptors that encode four fundamental properties: (1) lipophilicity, (2) hydrogen bonding capacity, (3) size, and (4) shape. The analysis also uncovers the role of new calculated descriptors in the HIA profile of small organic compounds. Finally, because the models reported herein are based on computed properties, they appear to be a valuable tool in virtual screening, where selection and prioritization of candidates are required.
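
The two procedures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the sphere radius, the distance metric, the value of k, or the neighbor weighting, so the Euclidean distance, the fixed `radius`, the unweighted neighbor average, and all function names below are assumptions made for illustration. The genetic-algorithm descriptor selection is omitted. The sketch shows one common simplified variant of sphere exclusion (each sphere center goes to the training set, the other compounds inside the sphere go to the test set) and the leave-one-out statistic $q^2_{\mathrm{LOO}} = 1 - \mathrm{PRESS} / \sum_i (y_i - \bar{y})^2$ for a kNN predictor.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sphere_exclusion(X, radius):
    """Split row indices of X into (train, test).

    Simplified variant: the first remaining compound becomes a sphere
    center and joins the training set; every other remaining compound
    within `radius` of it joins the test set; all of them are then
    excluded and the process repeats until nothing remains.
    """
    remaining = list(range(len(X)))
    train, test = [], []
    while remaining:
        center = remaining.pop(0)
        train.append(center)
        inside = [i for i in remaining if euclidean(X[center], X[i]) <= radius]
        test.extend(inside)
        remaining = [i for i in remaining if i not in inside]
    return train, test


def knn_predict(train_X, train_y, query, k):
    """Predict a property as the unweighted mean over the k nearest neighbors."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def q2_loo(X, y, k):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    mean_y = sum(y) / len(y)
    press = 0.0
    for i in range(len(X)):
        # Hold out compound i and predict it from all the others.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        press += (y[i] - knn_predict(rest_X, rest_y, X[i], k)) ** 2
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total
```

In a workflow of this kind, `sphere_exclusion` would be applied once to the descriptor matrix, `q2_loo` would score each candidate descriptor subset on the training compounds, and the held-out test compounds would be used only for the external statistic.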