Predicting dimensions of depression from smartphone data

Vincent L. Holstein, Samir Akre, Ramona Leenings, Yoonho Chung, Tim Hahn, Justin T. Baker
DOI: https://doi.org/10.1101/2024.01.08.23300679
2024-01-09
Abstract: Depressive disorders are highly prevalent but demand nuanced, personalized treatment that traditional approaches in psychiatry cannot address. This gap has prompted a surge of interest in leveraging digital technology, particularly smartphones, for remote monitoring to enhance outpatient care. This study utilizes the BRIGHTEN dataset to construct interpretable prediction models for overall depression severity, measured by PHQ-9, and various depression dimensions using a factor modelling approach. Our factor model unveils a three-factor solution encompassing mood, somatic, and concentration/psychomotor-related factors. Machine learning models effectively predict both the PHQ-9 scores and individual factors, with feature importance analyses underscoring the influence of the PHQ-2 scale and communication-related features. These findings are corroborated by models trained on data subsets. Through nested multi-level models, we identify between-subject effects for the PHQ-2 and selected communication-related features, along with within-subject effects for these features. In summary, this study underscores the robust predictive capacity of ecological momentary assessments and highlights features of potential relevance for future investigations, such as communication-related features. We advocate for future studies to assess the cost-effectiveness and intervention potential of these models.
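As background on the two instruments named in the abstract: the PHQ-9 is scored by summing nine items, each rated 0 to 3, for a total from 0 to 27, and the PHQ-2 consists of the first two PHQ-9 items (total 0 to 6). The minimal Python sketch below illustrates this scoring. The function names and example responses are hypothetical, and the severity bands are the conventional PHQ-9 cutoffs (5, 10, 15, 20), not values reported in this paper.

```python
from typing import Sequence

# Conventional PHQ-9 severity bands, checked from the highest lower bound down.
SEVERITY_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
]


def _validate(items: Sequence[int], n_items: int) -> None:
    """Check the item count and that each item is an integer score from 0 to 3."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} item scores, got {len(items)}")
    for score in items:
        if not isinstance(score, int) or not 0 <= score <= 3:
            raise ValueError(f"item scores must be integers 0-3, got {score!r}")


def phq9_total(items: Sequence[int]) -> int:
    """Sum the nine PHQ-9 item scores (range 0-27)."""
    _validate(items, 9)
    return sum(items)


def phq2_total(items: Sequence[int]) -> int:
    """Sum the two PHQ-2 items, i.e. PHQ-9 items 1 and 2 (range 0-6)."""
    _validate(items, 2)
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total score to its conventional severity label."""
    if not 0 <= total <= 27:
        raise ValueError(f"PHQ-9 total must be 0-27, got {total}")
    for lower_bound, label in SEVERITY_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


# Hypothetical responses from one participant to the nine PHQ-9 items.
responses = [2, 1, 3, 2, 1, 0, 2, 1, 0]
total = phq9_total(responses)
print(total, phq9_severity(total))  # 12 moderate
print(phq2_total(responses[:2]))    # 3
```

Because the PHQ-2 items are themselves part of the PHQ-9, daily PHQ-2 responses overlap directly with the target score, which may partly explain the strong influence of the PHQ-2 in the feature importance analyses.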
Psychiatry and Clinical Psychology
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper aims to predict different dimensions of depression from smartphone data in order to improve remote monitoring and outpatient care for patients with depression. Specifically, the research objectives are:

1. **Constructing Interpretable Predictive Models**: Using the BRIGHTEN dataset to build models that predict overall depression severity (measured by the PHQ-9 scale) as well as individual dimensions of depression.
2. **Exploring Factor Structure**: Identifying distinct domains of depressive symptoms through factor analysis; the resulting three-factor solution comprises mood, somatic, and concentration/psychomotor-related factors.
3. **Evaluating Feature Importance**: Assessing which features drive the predictions, using several feature importance methods, to guide feature selection in future research.
4. **Multilevel Model Analysis**: Using nested multilevel models to separate between-individual from within-individual effects and to assess how useful different features are for modeling.

### Background and Motivation

Depression is a common mental illness affecting millions of people worldwide and is projected to become the leading source of global disease burden by 2030. Traditional psychiatric approaches struggle to provide personalized treatment, which has increased interest in remote monitoring with digital technologies, especially smartphones. These technologies allow data to be collected and analyzed in real time, which can help clinicians better understand patients' conditions and intervene promptly.

### Research Methods

1. **Data Source**: Using the BRIGHTEN dataset, which comprises two 12-week studies (BRIGHTEN-V1 and BRIGHTEN-V2). Participants completed daily PHQ-2 surveys, contributed passive smartphone data, and completed the PHQ-9 weekly for the first 4 weeks and biweekly thereafter.
2. **Factor Analysis**: Conducting factor analysis on the baseline PHQ-9 assessments to determine the dimensions of depressive symptoms.
3. **Predictive Modeling**: Using several machine learning models (including linear regression, autoregressive integrated moving average (ARIMA), support vector regression, random forest regression, and gradient boosting regression) to predict PHQ-9 total scores and individual factor scores.
4. **Feature Importance Analysis**: Calculating SHAP (SHapley Additive exPlanations) values to identify the features with the greatest impact on model predictions.
5. **Subset Models**: Training predictive models on individual data types (such as ecological momentary assessment (EMA), communication, or activity data) to evaluate their performance and to check whether the feature importance findings of the full models hold.
6. **Inference Modeling**: Using multilevel regression models to analyze the statistical relationship between participants' symptoms and various digital measures, separating between-individual from within-individual effects.

### Expected Contributions

Through this research, the authors aim to:

1. **Demonstrate Predictive Capacity**: Show that EMA and other smartphone data can effectively predict depression severity and its dimensions.
2. **Guide Future Research**: Highlight features of potential relevance for future digital phenotyping research, such as communication-related features.
3. **Motivate Clinical Application**: Encourage future studies to evaluate the cost-effectiveness and intervention potential of these models, as a basis for eventual clinical use.

In summary, this paper combines factor analysis, machine learning, and multilevel modeling to predict different dimensions of depression from smartphone data, providing new tools for the personalized treatment and remote monitoring of depression.
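The distinction between between-individual and within-individual effects can be illustrated with person-mean centering, a common way of preparing time-varying predictors for multilevel models. Each observation of a feature is split into the participant's own mean across observations (the between-subject component) and the deviation from that mean (the within-subject component). The Python sketch below shows only this decomposition step. The feature, participant IDs, and values are hypothetical, and this summary does not state that the authors used exactly this centering scheme.

```python
from collections import defaultdict
from statistics import fmean
from typing import Hashable, Iterable, List, Tuple


def person_mean_center(
    records: Iterable[Tuple[Hashable, float]],
) -> List[Tuple[Hashable, float, float]]:
    """Split a time-varying predictor into between- and within-subject parts.

    records: (subject_id, value) pairs, one per observation.
    Returns (subject_id, between, within) tuples in input order, where
    `between` is the subject's mean value and `within` = value - between.
    """
    records = list(records)

    # Group observations by participant to compute each person's mean.
    values_by_subject = defaultdict(list)
    for subject_id, value in records:
        values_by_subject[subject_id].append(value)
    subject_means = {s: fmean(vals) for s, vals in values_by_subject.items()}

    # Each observation = stable person-level mean + momentary deviation.
    return [
        (subject_id, subject_means[subject_id], value - subject_means[subject_id])
        for subject_id, value in records
    ]


# Hypothetical daily values of a communication-related feature.
observations = [
    ("p1", 4.0), ("p1", 6.0), ("p1", 8.0),
    ("p2", 1.0), ("p2", 3.0),
]
for subject_id, between, within in person_mean_center(observations):
    print(subject_id, between, within)
# p1 6.0 -2.0
# p1 6.0 0.0
# p1 6.0 2.0
# p2 2.0 -1.0
# p2 2.0 1.0
```

Both components would then enter a multilevel model as separate predictors, for example `symptom_score ~ between + within + (1 | subject)` in lme4-style formula notation. The coefficient on `between` describes whether participants who are higher on the feature on average also tend to report different symptom levels, whereas the coefficient on `within` describes whether a participant's deviations from their own average track changes in that participant's symptoms.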