Abstract: Quantitative structure-activity relationship (QSAR) methods have been widely applied in drug discovery, lead optimization, toxicity prediction, and regulatory decisions. Despite major advances in algorithms and software, QSAR models have inherent limitations associated with the size and chemical-structure diversity of the training set, experimental error, and many characteristics of structure representation and correlation algorithms. Whereas an excellent fit to the training data may be readily attainable, models often fail to accurately predict chemicals that are outside their domain of applicability. A QSAR's utility and, in the case of regulatory decisions, its justification for use increasingly depend on the ability to quantify a model's potential for predicting unknown chemicals with some known degree of certainty. It is never possible to predict an unknown chemical with absolute certainty. Here we report on two QSAR models, based on different data sets, for classifying chemicals according to their ability to bind to the estrogen receptor. The models were developed by using a novel QSAR method, Decision Forest, which combines the results of multiple heterogeneous but comparable Decision Tree models to produce a consensus prediction. We used an extensive cross-validation process to define an applicability domain for model predictions based on two quantitative measures: prediction confidence and domain extrapolation. Together, these measures quantify the accuracy of each prediction within and outside of the training domain. Despite being based on large and diverse training sets, both QSAR models had poor accuracy for chemicals within the domain of low confidence, whereas good accuracy was obtained for those within the domain of high confidence. For prediction in the high confidence domain, accuracy was inversely proportional to the degree of domain extrapolation. The model with the larger training set (1,092 chemicals, compared with 232 for the other) was more accurate in predicting chemicals at larger domain extrapolation, and could be particularly useful for rapidly prioritizing potential endocrine disruptors from a large chemical universe.
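The consensus prediction and the two applicability-domain measures described above can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: the consensus is taken as the plain mean of per-tree probabilities, prediction confidence as the scaled distance of that mean from a 0.5 decision boundary, and domain extrapolation as the largest fractional distance of any descriptor outside its training range. The function names are hypothetical and do not come from the Decision Forest software.

```python
from statistics import mean


def consensus_probability(tree_probs):
    """Average the per-tree probabilities of the 'active' class (assumed rule)."""
    return mean(tree_probs)


def prediction_confidence(p_active):
    """Scaled distance from the 0.5 boundary, in [0, 1] (assumed definition)."""
    return abs(p_active - 0.5) / 0.5


def domain_extrapolation(x, train_min, train_max):
    """Largest fractional distance outside the training range (assumed definition).

    Returns 0.0 when every descriptor lies inside the training range.
    """
    worst = 0.0
    for value, lo, hi in zip(x, train_min, train_max):
        span = hi - lo
        if span <= 0:
            continue  # constant descriptor: no range to extrapolate from
        if value < lo:
            worst = max(worst, (lo - value) / span)
        elif value > hi:
            worst = max(worst, (value - hi) / span)
    return worst


def classify(tree_probs, threshold=0.5):
    """Return (label, consensus probability, confidence) for one chemical."""
    p = consensus_probability(tree_probs)
    label = "active" if p >= threshold else "inactive"
    return label, p, prediction_confidence(p)
```

Under these assumptions, a chemical whose trees agree strongly gets a confidence near 1, while one whose consensus probability sits near 0.5 gets a confidence near 0. A descriptor vector that falls 20% of the training range beyond the training maximum gets a domain extrapolation of 0.2.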

Assessment of Prediction Confidence and Domain Extrapolation of Two Structure-Activity Relationship Models for Predicting Estrogen Receptor Binding Activity
