Multitask Learning of Longitudinal Circulating Biomarkers and Clinical Outcomes: Identification of Optimal Machine-Learning and Deep-Learning Models

Min Yuan, Shixin Su, Haolun Ding, Yaning Yang, Manish Gupta, Xu Steven Xu
DOI: https://doi.org/10.1101/2023.08.19.553991
2023-01-01
Abstract: Many circulating biomarkers are assessed at different time intervals during clinical studies. Despite the success of standard joint models in predicting clinical outcomes from low-dimensional longitudinal data, these techniques encounter significant computational challenges when applied to high-dimensional biomarker datasets. Modern machine-learning and deep-learning models show potential for handling multiple biomarker processes, but systematic evaluations and applications to high-dimensional data in clinical settings have yet to be reported. We aimed to enhance the scalability of joint modeling and provide guidance on optimal approaches for high-dimensional biomarker data and outcomes. We evaluated multiple deep-learning and machine-learning models using 24 clinical biomarkers and survival data from the SQUIRE trial, a phase 3 randomized clinical trial investigating necitumumab and standard gemcitabine/cisplatin treatment in patients with squamous non-small cell lung cancer (NSCLC). Overall, we confirmed that longitudinal models predicted patients' survival more accurately than models based solely on baseline information. Coupling multivariate functional principal component analysis (MFPCA) with Cox regression (MFPCA-Cox) provided the highest predictive discrimination and accuracy for the NSCLC patients, with AUC values ranging from 0.7 to above 0.8 at various landmark time points and prediction timeframes, outperforming recent Transformer and convolutional neural network deep-learning algorithms (TransformerJM and Match-Net, respectively). In conclusion, MFPCA-Cox is a robust and versatile joint modeling algorithm for high-dimensional longitudinal biomarker data with irregular and missing observations: it captures complex relationships within the data, yields accurate predictions for both longitudinal biomarkers and survival outcomes, and provides insight into the underlying dynamics.
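To make the phrase "AUC at various landmark time points and prediction timeframes" concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes a simplified definition: among patients still at risk at the landmark time, cases are those with an observed event inside the prediction window, and controls are those known to be event-free at the end of the window. Patients censored inside the window are simply dropped, whereas a full analysis would typically reweight for censoring. The function name `landmark_auc` and all argument names are hypothetical.

```python
from typing import Sequence


def landmark_auc(
    risk: Sequence[float],
    event_time: Sequence[float],
    event_observed: Sequence[int],
    landmark: float,
    horizon: float,
) -> float:
    """Simplified landmark AUC for the window (landmark, landmark + horizon].

    risk           -- predicted risk score per patient (higher = worse prognosis)
    event_time     -- observed event or censoring time per patient
    event_observed -- 1 if the event was observed, 0 if censored
    """
    cases, controls = [], []
    window_end = landmark + horizon
    for r, t, d in zip(risk, event_time, event_observed):
        if t <= landmark:
            continue  # no longer at risk at the landmark time
        if d and t <= window_end:
            cases.append(r)  # event observed inside the window
        elif t > window_end:
            controls.append(r)  # known to be event-free at window end
        # Censored inside the window: status unknown, dropped in this sketch.
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    # AUC = probability that a random case outranks a random control,
    # counting ties as one half.
    score = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                score += 1.0
            elif c == k:
                score += 0.5
    return score / (len(cases) * len(controls))
```

Calling the function with the same risk scores at several `landmark` and `horizon` values would give one AUC per combination, which is the kind of grid the abstract summarizes.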
### Competing Interest Statement

The authors have declared no competing interest.