Integrated Models of Blood Protein and Metabolite Enhance the Diagnostic Accuracy for Non-Small Cell Lung Cancer
Runhao Xu,Jiongran Wang,Qingqing Zhu,Chen Zou,Zehao Wei,Hao Wang,Zian Ding,Minjie Meng,Huimin Wei,Shijin Xia,Dongqing Wei,Li Deng,Shulin Zhang
DOI: https://doi.org/10.1186/s40364-023-00497-2
2023-01-01
Biomarker Research
Abstract:Background For early screening and diagnosis of non-small cell lung cancer (NSCLC), a robust model based on plasma proteomics and metabolomics is required for accurate and accessible non-invasive detection. Here we aim to combine TMT-LC-MS/MS and machine-learning algorithms to establish models with high specificity and sensitivity, and summarize a generalized model building scheme. Methods TMT-LC-MS/MS was used to discover the differentially expressed proteins (DEPs) in the plasma of NSCLC patients. Plasma proteomics-guided metabolites were selected for clinical evaluation in 110 NSCLC patients who were going to receive therapies, 108 benign pulmonary diseases (BPD) patients, and 100 healthy controls (HC). The data were randomly split into training set and test set in a ratio of 80:20. Three supervised learning algorithms were applied to the training set for models fitting. The best performance models were evaluated with the test data set. Results Differential plasma proteomics and metabolic pathways analyses revealed that the majority of DEPs in NSCLC were enriched in the pathways of complement and coagulation cascades, cholesterol and bile acids metabolism. Moreover, 10 DEPs, 14 amino acids, 15 bile acids, as well as 6 classic tumor biomarkers in blood were quantified using clinically validated assays. Finally, we obtained a high-performance screening model using logistic regression algorithm with AUC of 0.96, sensitivity of 92%, and specificity of 89%, and a diagnostic model with AUC of 0.871, sensitivity of 86%, and specificity of 78%. In the test set, the screening model achieved accuracy of 90%, sensitivity of 91%, and specificity of 90%, and the diagnostic model achieved accuracy of 82%, sensitivity of 77%, and specificity of 86%. Conclusions Integrated analysis of DEPs, amino acid, and bile acid features based on plasma proteomics-guided metabolite profiling, together with classical tumor biomarkers, provided a much more accurate detection model for screening and differential diagnosis of NSCLC. In addition, this new mathematical modeling based on plasma proteomics-guided metabolite profiling will be used for evaluation of therapeutic efficacy and long-term recurrence prediction of NSCLC.