A Machine Learning Analysis of Big Metabolomics Data for Classifying Depression: Model Development and Validation

Simeng Ma, Xinhui Xie, Zipeng Deng, Wei Wang, Dan Xiang, Lihua Yao, Lijun Kang, Shuxian Xu, Huiling Wang, Gaohua Wang, Jun Yang, Zhongchun Liu
DOI: https://doi.org/10.1016/j.biopsych.2023.12.015
2024-07-01
Abstract:

Background: Many metabolomics studies of depression have been performed, but they have been limited in scale. A comprehensive in silico analysis of global metabolite levels in large populations could provide robust insights into the pathological mechanisms underlying depression and into candidate clinical biomarkers.

Methods: Depression-associated metabolomics was studied in 2 datasets from the UK Biobank database: participants with lifetime depression (N = 123,459) and participants with current depression (N = 94,921). The Whitehall II cohort (N = 4744) was used for external validation. CatBoost machine learning was used for modeling, and Shapley additive explanations were used to interpret the model. Fivefold cross-validation was used to validate model performance: the model was trained on 3 of the 5 sets, with the remaining 2 sets used for validation and testing, respectively. Diagnostic performance was assessed using the area under the receiver operating characteristic curve.

Results: In the lifetime depression and current depression datasets and in sex-specific analyses, 24 significantly associated metabolic biomarkers were identified, 12 of which overlapped between the 2 datasets. Adding metabolic features slightly improved the performance of a diagnostic model based on traditional (nonmetabolomics) risk factors alone (lifetime depression: area under the curve 0.655 vs. 0.658 with metabolomics; current depression: area under the curve 0.711 vs. 0.716 with metabolomics).

Conclusions: The machine learning model identified 24 metabolic biomarkers associated with depression. If validated, metabolic biomarkers may have future clinical applications as supplementary information to guide early and population-based depression detection.
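The Methods describe a fivefold scheme in which each round trains on 3 folds and holds out 1 fold for validation and 1 for testing, with diagnostic performance measured as the area under the ROC curve. The following is a minimal plain-Python sketch of that split and of an AUC calculation, offered as an illustration under stated assumptions rather than the authors' pipeline. The rotation rule (fold k for testing, fold k + 1 for validation), the shuffling seed, and the helper names `five_fold_rotations` and `roc_auc` are hypothetical; the CatBoost model and the Shapley additive explanations step are omitted.

```python
import random
from typing import List, Sequence, Tuple


def five_fold_rotations(
    n_samples: int, seed: int = 0
) -> List[Tuple[List[int], List[int], List[int]]]:
    """Hypothetical helper: shuffle sample indices into 5 folds and, for each
    round k, use fold k as the test set, fold (k + 1) % 5 as the validation
    set, and the other 3 folds as the training set (assumed rotation rule)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]  # 5 near-equal folds
    rounds = []
    for k in range(5):
        test = folds[k]
        val = folds[(k + 1) % 5]
        train = [
            i
            for j in range(5)
            if j not in (k, (k + 1) % 5)
            for i in folds[j]
        ]
        rounds.append((train, val, test))
    return rounds


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Hypothetical helper: AUC as the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as 0.5). Quadratic in the number of samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for train, val, test in five_fold_rotations(10):
        print(len(train), len(val), len(test))  # 6 2 2 in every round
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Each sample lands in the test set exactly once across the 5 rounds, which is what lets per-round AUCs be averaged into one cross-validated estimate. The validation fold would typically serve for early stopping or tuning, though the abstract does not say how it was used. A real pipeline would likely also stratify folds by case status, which this sketch does not do. At UK Biobank scale, the pairwise AUC loop is too slow; a rank-based formula or `sklearn.metrics.roc_auc_score` would be the practical choice.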