The path toward generalizable clinical prediction models

Fredrik Hieronymus, Magnus Hieronymus, Axel Sjöstedt, Staffan Nilsson, Jakob Näslund, Alexander Lisinski, Søren Dinesen Østergaard
DOI: https://doi.org/10.1101/2024.04.16.24305902
2024-04-19
Abstract: The peaking phenomenon refers to the observation that, beyond a certain point, the performance of prediction models starts to decrease as the number of predictors (p) increases. This issue is commonly encountered in small datasets (colloquially known as “small n, large p” datasets, or high-dimensional data). A recent analysis of data from five placebo-controlled trials reported that clinical prediction models in schizophrenia performed poorly (average balanced accuracy, BAC, 0.54), which was interpreted to suggest that prediction models in schizophrenia have poor generalizability. In this paper, we demonstrate that this outcome more likely reflects the peaking phenomenon in a small n, large p dataset (n=1513 participants, p=217), and we generalize this finding to a set of illustrative cases using simulated data. We then demonstrate that an ensemble of supervised learning models trained on more data (18 placebo-controlled trials, n=4634 participants) but fewer predictors (p=33) achieves better prediction (average BAC = 0.64), and that this performance generalizes to out-of-sample studies as well as to data from active-controlled trials (n=1463, average BAC = 0.67). Based on these findings, we argue that the achievable prediction accuracy for treatment response in schizophrenia, and likely for many other medical conditions, is highly dependent on sample size and the number of included predictors, and hence remains unknown until more data have been analyzed. Finally, we provide recommendations for how researchers and data holders might work to improve future data analysis efforts in clinical prediction.
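The peaking phenomenon can be illustrated with a small simulation. The sketch below is not the authors' simulation or model; it assumes two Gaussian classes in which only the first five predictors carry signal, a simple nearest-centroid classifier, and 20 training participants per class (all hypothetical choices made for illustration). Balanced accuracy is computed as the mean of per-class recall, which for two classes is the average of sensitivity and specificity.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall (sensitivity and specificity for two classes)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


def simulate(n_per_class, p, n_informative, shift, rng):
    """Draw a two-class Gaussian sample (hypothetical data-generating model).

    Only the first n_informative predictors differ in mean between classes;
    all remaining predictors are pure noise.
    """
    mean1 = np.zeros(p)
    mean1[:min(n_informative, p)] = shift
    x0 = rng.standard_normal((n_per_class, p))
    x1 = rng.standard_normal((n_per_class, p)) + mean1
    X = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return X, y


def nearest_centroid_bac(p, n_train_per_class=20, n_test_per_class=500,
                         n_informative=5, shift=1.0, n_reps=20, seed=0):
    """Average test BAC of a nearest-centroid classifier using p predictors."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_reps):
        X_tr, y_tr = simulate(n_train_per_class, p, n_informative, shift, rng)
        X_te, y_te = simulate(n_test_per_class, p, n_informative, shift, rng)
        # Class centroids are estimated from the small training sample only.
        c0 = X_tr[y_tr == 0].mean(axis=0)
        c1 = X_tr[y_tr == 1].mean(axis=0)
        # Assign each test point to the class with the nearer centroid.
        d0 = ((X_te - c0) ** 2).sum(axis=1)
        d1 = ((X_te - c1) ** 2).sum(axis=1)
        y_hat = (d1 < d0).astype(int)
        scores.append(balanced_accuracy(y_te, y_hat))
    return float(np.mean(scores))


if __name__ == "__main__":
    for p in [1, 2, 5, 20, 100, 1000]:
        print(f"p={p:5d}  BAC={nearest_centroid_bac(p):.3f}")
```

Under these assumptions one would expect BAC to rise while informative predictors are added (up to p=5) and then decline as pure-noise predictors accumulate, because each extra noise predictor adds estimation error to the class centroids without adding signal. The exact values depend on the seed and on the hypothetical parameters.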
Psychiatry and Clinical Psychology