Abstract: Clinical prediction models estimate an individual's risk of a particular health outcome. A developed model is a consequence of the development dataset and the model-building strategy, including the sample size, the number of predictors, and the analysis method (e.g., regression or machine learning). We raise the concern that many models are developed using small datasets, which leads to instability in the model and its predictions (estimated risks). We define four levels of model stability in estimated risks, moving from the overall mean to the individual level. Through simulation and case studies of statistical and machine learning approaches, we show that instability in a model's estimated risks is often considerable and ultimately manifests itself as miscalibration of predictions in new data. Therefore, we recommend that researchers always examine instability at the model development stage, and we propose instability plots and measures to do so. This entails repeating the model-building steps (those used to develop the original prediction model) in each of multiple (e.g., 1000) bootstrap samples to produce multiple bootstrap models, and then deriving (i) a prediction instability plot of bootstrap model versus original model predictions; (ii) the mean absolute prediction error (the mean absolute difference between individuals' original and bootstrap model predictions); and (iii) calibration, classification, and decision curve instability plots of bootstrap models applied in the original sample. A case study illustrates how these instability assessments help show whether model predictions are likely to be reliable, while informing a model's critical appraisal (risk of bias rating), fairness, and further validation requirements.
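
The bootstrap instability procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the model-building strategy is assumed to be an unpenalized logistic regression with a fixed set of predictors (no variable selection or tuning), fitted by Newton-Raphson; the function names (`fit_logistic`, `bootstrap_instability`, `binned_calibration`) and the simulated data are hypothetical. Each bootstrap model is applied back to the original sample, and the mean absolute prediction error averages |bootstrap prediction - original prediction| over individuals and bootstrap samples.

```python
import numpy as np


def predict(X, beta):
    """Predicted risks from a logistic model; the linear predictor is clipped to avoid overflow."""
    eta = np.clip(X @ beta, -30.0, 30.0)
    return 1.0 / (1.0 + np.exp(-eta))


def fit_logistic(X, y, n_iter=25):
    """Unpenalized logistic regression via Newton-Raphson (IRLS).

    Assumption: X already contains an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = predict(X, beta)
        w = p * (1.0 - p)
        # Information matrix; the tiny diagonal term only guards against singularity.
        H = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta


def bootstrap_instability(X, y, n_boot=200, seed=0):
    """Repeat the (assumed) model-building step in bootstrap samples and compare predictions."""
    rng = np.random.default_rng(seed)
    n = len(y)
    original = predict(X, fit_logistic(X, y))
    boot_preds = np.empty((n_boot, n))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample n rows with replacement
        beta_b = fit_logistic(X[idx], y[idx])  # rebuild the model in the bootstrap sample
        boot_preds[b] = predict(X, beta_b)     # apply it to the ORIGINAL sample
    # Mean absolute difference per individual, then averaged over individuals (MAPE)
    per_individual = np.mean(np.abs(boot_preds - original), axis=0)
    return original, boot_preds, per_individual, per_individual.mean()


def binned_calibration(pred, y, n_bins=10):
    """Mean predicted risk vs observed event proportion within quantile bins of pred."""
    edges = np.quantile(pred, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, n_bins - 1)
    expected = np.array([pred[bins == k].mean() for k in range(n_bins)])
    observed = np.array([y[bins == k].mean() for k in range(n_bins)])
    return expected, observed


# Hypothetical simulated development data: intercept plus three predictors.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = rng.binomial(1, predict(X, np.array([-1.0, 0.8, -0.5, 0.3]))).astype(float)

original, boot_preds, per_individual, mape = bootstrap_instability(X, y, n_boot=200)
print(f"Mean absolute prediction error: {mape:.3f}")
print(f"Largest individual-level instability: {per_individual.max():.3f}")
expected, observed = binned_calibration(boot_preds[0], y)  # one bootstrap model's calibration
```

Plotting each row of `boot_preds` against `original` gives the data for a prediction instability plot, and calling `binned_calibration` for every bootstrap model gives the curves for a calibration instability plot. The demo uses 200 bootstrap samples only to keep it fast; the abstract suggests around 1000. If the real model-building strategy includes variable selection or hyperparameter tuning, those steps would also need to be repeated inside the loop.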

Model stability: a key factor in determining whether an algorithm produces an optimal model from a matching distribution

Minimax Optimal Estimation of Stability Under Distribution Shift

Stability of clinical prediction models developed using statistical or machine learning methods

Matching Model Versus Single Model: A Study Of The Requirement To Match Class Distribution Using Decision Trees

Stability Evaluation Through Distributional Perturbation Analysis

Stability for the training of deep neural networks and other classifiers

A Study on the Effect of Class Distribution Using Cost-Sensitive Learning

Stability Evaluation via Distributional Perturbation Analysis

Stability of decision trees and logistic regression

Stacking and stability

Local Models: The Key to Boosting Stable Learners Successfully

Matchings under Preferences: Strength of Stability and Trade-offs

Instability in Stable Marriage Problem: Matching Unequally Numbered Men and Women

Improving Stability in Decision Tree Models

Computational complexity of $k$-stable matchings

Stability in Large Markets

Algorithmic stability implies training-conditional coverage for distribution-free prediction methods

The Bayesian Stability Zoo

Stable Matching with Uncertain Linear Preferences

Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences

A Review of Stability in Topic Modeling: Metrics for Assessing and Techniques for Improving Stability