Abstract: Quantitative structure-activity relationship (QSAR) methods have been widely applied in drug discovery, lead optimization, toxicity prediction, and regulatory decisions. Despite major advances in algorithms and software, QSAR models have inherent limitations associated with the size and chemical-structure diversity of the training set, experimental error, and many characteristics of structure representation and correlation algorithms. Whereas an excellent fit to the training data may be readily attainable, models often fail to accurately predict chemicals that lie outside their domain of applicability. A QSAR's utility and, in the case of regulatory decisions, the justification for its use increasingly depend on the ability to quantify a model's potential for predicting unknown chemicals with a known degree of certainty. It is never possible to predict an unknown chemical with absolute certainty. Here we report on two QSAR models, based on different data sets, for classifying chemicals according to their ability to bind to the estrogen receptor. The models were developed using a novel QSAR method, Decision Forest, which combines the results of multiple heterogeneous but comparable Decision Tree models to produce a consensus prediction. We used an extensive cross-validation process to define an applicability domain for model predictions based on two quantitative measures: prediction confidence and domain extrapolation. Together, these measures quantify the accuracy of each prediction within and outside the training domain. Despite being based on large and diverse training sets, both QSAR models had poor accuracy for chemicals within the low-confidence domain, whereas good accuracy was obtained for those within the high-confidence domain. For predictions in the high-confidence domain, accuracy was inversely proportional to the degree of domain extrapolation. The model with the larger training set (1,092 chemicals, compared with 232 for the other) was more accurate in predicting chemicals at larger domain extrapolation, and could be particularly useful for rapidly prioritizing potential endocrine disruptors from a large chemical universe.
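
The two measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their formulas, so the definitions below are assumptions made for illustration only: prediction confidence is taken as the scaled distance of the mean per-tree probability from 0.5, and domain extrapolation as the largest fractional distance by which any descriptor of a query chemical falls outside the training range. The function names `consensus_prediction` and `domain_extrapolation` are hypothetical and are not taken from the Decision Forest software.

```python
from statistics import mean


def consensus_prediction(tree_probs):
    """Combine per-tree probabilities of the 'active' class into one call.

    Assumption: the consensus is the plain mean of the per-tree
    probabilities, and confidence is |p - 0.5| / 0.5, so 0 means the
    trees are evenly split and 1 means they all agree with certainty.
    """
    p = mean(tree_probs)
    label = "active" if p > 0.5 else "inactive"
    confidence = abs(p - 0.5) / 0.5
    return label, p, confidence


def domain_extrapolation(query, training_rows):
    """Return how far a query lies outside the training descriptor ranges.

    Assumption: for each descriptor, the excess beyond the training
    minimum or maximum is divided by the training range; the largest
    such fraction over all descriptors is returned (0.0 = inside).
    """
    worst = 0.0
    for j, x in enumerate(query):
        column = [row[j] for row in training_rows]
        lo, hi = min(column), max(column)
        span = hi - lo
        if span == 0:
            continue  # constant descriptor: no range to scale against
        if x < lo:
            excess = (lo - x) / span
        elif x > hi:
            excess = (x - hi) / span
        else:
            excess = 0.0
        worst = max(worst, excess)
    return worst


if __name__ == "__main__":
    # Hypothetical probabilities from four trees for one query chemical.
    print(consensus_prediction([0.75, 0.75, 1.0, 0.5]))
    # Hypothetical two-descriptor training set and a query outside it.
    print(domain_extrapolation([12.0, 1.0], [[0.0, 0.0], [10.0, 2.0]]))
```

Under these assumed definitions, a prediction would be treated as most reliable when its confidence is high and its extrapolation is near zero, which matches the qualitative trend reported in the abstract.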

Why QSAR fails: an empirical evaluation using conventional computational approach.

Assessing QSAR Limitations - A Regulatory Perspective

Quantitative Structure–activity Relationship: Promising Advances in Drug Discovery Platforms

Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets

Prediction reliability of QSAR models: an overview of various validation tools

Rethinking the applicability domain analysis in QSAR models

Using Support Vector Regression Coupled with the Genetic Algorithm for Predicting Acute Toxicity to the Fathead Minnow

Consensus ranking approach to understanding the underlying mechanism with QSAR.

Prediction of the Aquatic Toxicity of Aromatic Compounds to Tetrahymena Pyriformis Through Support Vector Regression

A hybrid framework for improving uncertainty quantification in deep learning-based QSAR regression modeling

How Precise Are Our Quantitative Structure-Activity Relationship Derived Predictions for New Query Chemicals?

The relationship between model predictive ability and validated method in QSAR/QSPR study

Nano(Q)SAR: Challenges, Pitfalls and Perspectives

Reliably assessing prediction reliability for high dimensional QSAR data

QSAR Modeling: Where Have You Been? Where Are You Going To?

Enhanced QSAR model performance by integrating structural and gene expression information.

Development and Evaluation of Conformal Prediction Methods for QSAR

Assessment of Prediction Confidence and Domain Extrapolation of Two Structure-Activity Relationship Models for Predicting Estrogen Receptor Binding Activity

Comprehensive ensemble in QSAR prediction for drug discovery

Predictive QSAR Models for Polyspecific Drug Targets: the Importance of Feature Selection

One size does not fit all: revising traditional paradigms for QSAR-based virtual screenings.