A Novel Machine Learning Based Framework for Developing Composite Digital Biomarkers of Disease Progression

Song Zhai, Andy Liaw, Judong Shen, Yuting Xu, Vladimir Svetnik, James J. FitzGerald, Chrystalina A. Antoniades, Dan Holder, Marissa F. Dockendorf, Jie Ren, Richard Baumgartner
DOI: https://doi.org/10.1101/2024.09.23.24313737
2024-09-24
Abstract:
Background: Current methods of measuring disease progression of neurodegenerative disorders, including Parkinson's disease (PD), largely rely on composite clinical rating scales, which are prone to subjective biases and lack the sensitivity to detect progression signals in a timely manner. Digital health technology (DHT)-derived measures offer potential solutions to provide objective, precise, and sensitive measures that address these limitations. However, the complexity of DHT datasets and the potential to derive numerous digital features that were not previously possible to measure pose challenges, including in selection of the most important digital features and construction of composite digital biomarkers.
Methods: We present a comprehensive machine learning based framework to construct composite digital biomarkers for progression tracking. This framework consists of a marginal (univariate) digital feature screening, a univariate association test, digital feature selection, and subsequent construction of composite (multivariate) digital disease progression biomarkers using Penalized Generalized Estimating Equations (PGEE). As an illustrative example, we applied this framework to data collected from a PD longitudinal observational study. The data consisted of Opal™ sensor-based movement measurements and MDS-UPDRS Part III scores collected at 3-month intervals for 2 years in 30 PD and 10 healthy control participants.
Results: In our illustrative example, 77 out of 235 digital features from the study passed univariate feature screening, with 11 features selected by PGEE to include in construction of the composite digital measure. Compared to MDS-UPDRS Part III, the composite digital measure exhibited a smoother and more significant increasing trend over time in PD groups with less variability, indicating improved ability for tracking disease progression. This digital composite measure also demonstrated the ability to classify between de novo PD and healthy control groups.
Conclusion: Measures from DHTs show promise in tracking neurodegenerative disease progression with increased sensitivity and reduced variability as compared to traditional clinical scores. Herein, we present a novel framework and methodology to construct a composite digital measure of disease progression from high-dimensional DHT datasets, which may have utility in accelerating the development and application of composite digital biomarkers in drug development.
What problem does this paper attempt to address?
The paper addresses two weaknesses in current methods for evaluating the progression of neurodegenerative diseases such as Parkinson's disease (PD): they are prone to subjective bias, and they lack the sensitivity to detect progression signals in a timely manner. Traditional assessment relies mainly on composite clinical rating scales, such as the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Measures derived from digital health technology (DHT) offer potentially more objective, precise, and sensitive alternatives, but DHT datasets are complex and can yield a large number of digital features. This creates new challenges, especially in selecting the most important digital features and in constructing composite digital biomarkers.
To address these challenges, the paper proposes a machine-learning-based framework for constructing composite digital biomarkers that track disease progression from high-dimensional DHT datasets. The framework includes the following steps:
1. **Univariate digital feature screening**: Each digital feature is tested individually with a linear mixed-effects model (LMM) to identify the features that can detect disease progression during the study period.
2. **Univariate association test**: The associations between these candidate digital features and standard clinical measurements (such as MDS-UPDRS Part III) are then examined.
3. **Multivariate analysis**: The penalized generalized estimating equations (PGEE) method selects and combines a subset of the features that passed univariate screening to construct a composite digital biomarker.
4. **Performance evaluation**: A cross-validation strategy determines the optimal number of digital features to include in the final multivariate prediction model and is used to evaluate the model's performance.
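For readers who want the models behind steps 1 and 3 in symbols, one possible way to write them is sketched below. The summary gives no equations, so everything here is an assumption: a random-intercept LMM for the screening step, and the general penalized estimating equation found in the PGEE literature for the selection step. All symbols are introduced for illustration and may not match the paper's notation.

```latex
% Step 1 (assumed random-intercept form): feature k, participant i, visit j
x^{(k)}_{ij} = \beta^{(k)}_0 + \beta^{(k)}_1\, t_{ij} + b^{(k)}_i + \varepsilon^{(k)}_{ij},
\qquad b^{(k)}_i \sim N\!\left(0, \sigma^2_{b,k}\right),
\quad \varepsilon^{(k)}_{ij} \sim N\!\left(0, \sigma^2_{k}\right)

% Screening hypothesis for feature k: no change over time
H_0 : \beta^{(k)}_1 = 0

% Step 3 (general PGEE form): solve U(\beta) = 0 over the screened features
U(\beta) = \frac{1}{n} \sum_{i=1}^{n} X_i^{\top} A_i^{1/2} \hat{R}^{-1} A_i^{-1/2}
           \bigl( Y_i - \mu_i(\beta) \bigr)
           \;-\; q_{\lambda}\!\left(|\beta|\right) \circ \operatorname{sign}(\beta)
```

Here t_ij is the visit time, b_i is a participant-level random intercept, X_i and Y_i are the screened feature matrix and response vector for participant i, A_i is the diagonal matrix of marginal variances, R-hat is the working correlation matrix, and q_lambda is the derivative of a sparsity-inducing penalty (SCAD is a common choice). Features whose coefficients are shrunk to zero drop out, and the composite measure is the fitted linear combination of the remaining features. Which response Y_i the authors used is not stated in this summary.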
The paper applies this framework to data from a longitudinal observational study of Parkinson's disease to demonstrate its potential for tracking disease progression. Compared with the traditional MDS-UPDRS Part III score, the resulting composite digital measure shows a smoother and more statistically significant increasing trend over time in the PD groups, with less variability, and it can also distinguish newly diagnosed (de novo) PD patients from healthy controls. In this illustrative example, the composite digital biomarker built from DHT data therefore tracked progression with higher sensitivity and lower variability than the clinical score.
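To make the link between "less variability" and "a more significant trend" concrete, the following minimal Python sketch compares two synthetic measures that share the same underlying rate of change but differ in visit-to-visit noise. It is not the paper's method: instead of fitting an LMM, it uses a simpler two-stage stand-in (one least-squares slope per participant, then a one-sample t-statistic on those slopes). The function names, the simulated slope, and the noise levels are all hypothetical; only the visit schedule (every 3 months for 2 years) and the group size (30) are taken from the abstract.

```python
import math
import random
import statistics


def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def slope_t_statistic(times, trajectories):
    """Two-stage trend test (a simplified stand-in for an LMM slope test).

    Stage 1: one slope per participant.
    Stage 2: one-sample t-statistic of those slopes against zero.
    """
    slopes = [ols_slope(times, traj) for traj in trajectories]
    mean_slope = statistics.fmean(slopes)
    std_err = statistics.stdev(slopes) / math.sqrt(len(slopes))
    return mean_slope / std_err


def simulate(n_participants, times, slope, noise_sd, rng):
    """Synthetic data: random baseline + linear trend + per-visit noise."""
    trajectories = []
    for _ in range(n_participants):
        baseline = rng.gauss(0.0, 1.0)
        trajectories.append(
            [baseline + slope * t + rng.gauss(0.0, noise_sd) for t in times]
        )
    return trajectories


rng = random.Random(0)
visit_months = list(range(0, 25, 3))  # visits every 3 months over 2 years

# Same true slope, different measurement noise (both values are made up).
noisy = simulate(30, visit_months, slope=0.05, noise_sd=2.0, rng=rng)
smooth = simulate(30, visit_months, slope=0.05, noise_sd=0.5, rng=rng)

t_noisy = slope_t_statistic(visit_months, noisy)
t_smooth = slope_t_statistic(visit_months, smooth)
print(f"t-statistic, noisier measure:  {t_noisy:.2f}")
print(f"t-statistic, smoother measure: {t_smooth:.2f}")
```

Because both measures progress at the same simulated rate, any difference in the t-statistics comes from the noise level alone, which is the sense in which a lower-variability measure can be more sensitive to progression. A full analysis would replace the two-stage test with a mixed model that also handles missed visits.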