Abstract: Wider acceptance of quantitative structure-activity relationship (QSAR) models would bring a range of benefits and savings to both the private and public sectors. For this to happen, particularly in regulatory applications, a model's limitations need to be identified. We define a model's limitations as encompassing its overall prediction accuracy, its applicability domain and its degree of chance correlation. This review presents a general guideline for assessing a model's limitations, with emphasis on, and examples drawn from, consensus modeling methods. More specifically, we discuss the commonalities and differences between external validation and cross-validation for assessing a model's limitations. We illustrate two common ways of assessing overall prediction accuracy, depending on whether or not the intended application domain is predefined. Because even a high-quality model predicts different chemicals with different confidence, we further use the novel Decision Forest consensus modeling method to demonstrate a means of determining prediction confidence (i.e., the certainty of an individual chemical's prediction) and domain extrapolation (i.e., the prediction accuracy for a chemical outside the chemistry space defined by the training chemicals). We show that prediction confidence and domain extrapolation are related measures that together determine a model's applicability domain, and that prediction confidence is the more important of the two. Lastly, we emphasize the importance of assessing chance correlation and illustrate it with several examples of models that show a high degree of chance correlation even though cross-validation indicates high prediction accuracy.
Generally, a dataset with a skewed distribution, a small size and/or a low signal-to-noise ratio tends to produce a model with high chance correlation. We conclude that it is imperative to assess all three aspects of a model (i.e., overall accuracy, applicability domain and chance correlation) for the regulatory acceptance of QSAR models.
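The abstract names its checks without spelling out procedures, so the following is only a minimal Python sketch, under stated assumptions, of three of them: overall accuracy estimated by k-fold cross-validation, a y-randomization (label-permutation) test for chance correlation, and a consensus-based prediction confidence for a single chemical. Everything in it is hypothetical and illustrative: the nearest-centroid classifier standing in for a QSAR model, the synthetic two-descriptor data, the function names, and the bootstrap ensemble, which is not the Decision Forest method. The confidence formula |p - 0.5| / 0.5, where p is the fraction of ensemble members voting "active", is an assumed measure of member agreement, not one taken from the source.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    # Hypothetical stand-in for a QSAR classifier: one centroid per activity class.
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    # Assign x to the class whose centroid is closest (squared Euclidean distance).
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def kfold_accuracy(X, y, k=5, seed=0):
    # Overall prediction accuracy estimated by k-fold cross-validation.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


def y_randomization(X, y, n_perm=50, k=5, seed=0):
    # Chance-correlation check: repeat cross-validation with shuffled activity labels.
    rng = random.Random(seed)
    observed = kfold_accuracy(X, y, k, seed)
    null = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        null.append(kfold_accuracy(X, y_perm, k, seed))
    # Empirical p-value: share of permuted runs at least as accurate as the real one.
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, null, p_value


def bagged_votes(X, y, x_query, n_models=25, seed=0):
    # Hypothetical consensus: stand-in models trained on bootstrap resamples
    # (NOT the Decision Forest algorithm) each cast a 0/1 vote for x_query.
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]
        model = nearest_centroid_fit([X[i] for i in boot], [y[i] for i in boot])
        votes.append(nearest_centroid_predict(model, x_query))
    return votes


def consensus_confidence(votes):
    # Assumed confidence measure: 0 when members split evenly, 1 when unanimous.
    # An exact 50/50 split is (arbitrarily) labelled 1.
    p = sum(votes) / len(votes)
    label = 1 if p >= 0.5 else 0
    return label, abs(p - 0.5) / 0.5


def make_data(n=40, seed=1):
    # Synthetic two-class data: only the first descriptor carries signal.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    acc, null, p = y_randomization(X, y)
    print(f"CV accuracy {acc:.2f}, mean permuted accuracy {statistics.fmean(null):.2f}, p = {p:.3f}")
    label, conf = consensus_confidence(bagged_votes(X, y, [1.0, 0.0]))
    print(f"consensus label {label}, confidence {conf:.2f}")
```

In this sketch, a permuted-label accuracy close to the unpermuted one (a large p-value) would indicate chance correlation even when the cross-validated accuracy looks high, which is the situation the abstract associates with small, skewed or noisy datasets. The per-chemical confidence is reported separately from the overall accuracy because, as the abstract notes, even a good model is not equally certain about every chemical.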