Changes in prediction modelling in biomedicine – do systematic reviews indicate whether there is any trend towards larger data sets and machine learning methods?

Lara Lusa, Franziska Kappenberg, Gary S. Collins, Matthias Schmid, Willi Sauerbrei, Joerg Rahnenfuehrer
DOI: https://doi.org/10.1101/2024.08.09.24311759
2024-08-10
Abstract: The number of prediction models proposed in the biomedical literature has been growing year on year. In the last few years there has been increasing attention to the changes occurring in the prediction modeling landscape. It is suggested that machine learning techniques are becoming more popular for developing prediction models that exploit complex data structures, higher-dimensional predictor spaces, very large numbers of participants, and heterogeneous subgroups, with the ability to capture higher-order interactions. We examine these changes in modelling practices by investigating a selection of systematic reviews on prediction models published in the biomedical literature. We selected systematic reviews published since 2020 which included at least 50 prediction models. Information was extracted guided by the CHARMS checklist. Time trends were explored using the models published since 2005. We identified 8 reviews, which included 1448 prediction models published in 887 papers. The average number of study participants and outcome events increased considerably between 2015 and 2019, but remained stable afterwards. The number of candidate and final predictors did not noticeably increase over the study period, with a few recent studies using very large numbers of predictors. Internal validation and reporting of discrimination measures became more common, but assessing calibration and carrying out external validation were less common. Information about missing values was not reported in about half of the papers; however, the use of imputation methods increased. There was no sign of an increase in the use of machine learning methods. Overall, most of the findings were heterogeneous across reviews.
Our findings indicate that changes in the prediction modeling landscape in biomedicine are less dramatic than expected and that poor reporting is still common; adherence to well-established best practice recommendations from the traditional biostatistics literature is still needed. For machine learning, best practice recommendations are still missing; such recommendations are available in the traditional biostatistics literature, but adherence remains inadequate.
Health Informatics
What problem does this paper attempt to address?
The paper investigates changes in the prediction modeling landscape within biomedicine, specifically looking at trends towards larger datasets and the adoption of machine learning methods. The authors aimed to understand whether there has been a substantial shift towards machine learning techniques for developing prediction models in biomedical research. To achieve this, they examined systematic reviews of prediction models in the biomedical literature, selecting reviews published since 2020 that included at least 50 prediction models. After applying inclusion and exclusion criteria, eight systematic reviews were chosen, encompassing 1,448 prediction models published in 887 papers.

Key findings include:
- The average number of study participants and outcome events increased considerably between 2015 and 2019 but remained stable thereafter.
- The number of candidate and final predictors did not show a noticeable increase over the study period, although a few recent studies used a very large number of predictors.
- Internal validation and reporting of discrimination measures became more common, but assessing calibration and performing external validation were less frequent.
- Information about missing values was not reported in about half of the papers, although the use of imputation methods increased.
- There was no evidence of an increase in the use of machine learning methods.
- Adherence to best practice recommendations from traditional biostatistics literature w