Abstract:

Background: Low-dose CT (LDCT) is recommended for screening people at high risk for lung cancer, but it misses the large number of people classed as low risk, such as never-smokers. Liquid biopsy for detecting early cancer and cancer recurrence has been studied for a long time, yet clinically meaningful laboratory data from peripheral blood remain underappreciated and underutilized.

Methods: Using machine learning methods, we trained a lung cancer prediction model on 24 peripheral blood laboratory markers; patient age was recorded at the time of lung cancer diagnosis. We assembled 7060 lung cancer cases and 3368 contemporaneous benign cases to train and test the model, and measured performance by the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The benign disease cases were divided by disease type into three subpopulations, one infectious and two non-infectious, which were used to train or test the model. The models were prospectively validated on an internal dataset of 77 lung cancer cases involving hospitalization for benign diseases.

Findings: With the infectious subpopulation (group 1) as the test dataset, the AUC for stage I-II lung cancer patients was 0.70. With the benign vertebral disease dataset as the test dataset, the AUC for stage I-II lung cancer patients was 0.74, and with the cerebrovascular disease dataset it was 0.75. With cerebrovascular disease as the test dataset, the diagnostic sensitivity was 67.2% at the predefined specificity of 95%. The model was validated on an internally traceable dataset of 77 lung cancer patients who had hospitalization records in our hospital for non-neoplastic diseases before their lung cancer diagnosis. Analysis of the previous hospitalization data showed that, at 95% specificity, our model would have predicted lung cancer in eight patients at the time of their previous hospitalization, five of whom had negative chest radiographs. For these five patients, the model would have predicted lung cancer 25.10 ± 16.48 months before the actual diagnosis. Two of them had stage I lung cancer at the time of diagnosis. Our lung cancer prediction model is publicly available at https://www.mtaibt.com.

Interpretation: The pTablab (pan-tumor associated peripheral blood laboratory markers) lung cancer prediction model performs better in people who do not require treatment for infectious diseases, and it can predict lung cancer before lung nodules appear on imaging.

Funding: Integrated innovation and application of key technologies for precise prevention and treatment of primary lung cancer, Chongqing, China (No. 2019ZX002).

Declaration of Interest: The authors declare that they have no conflicts of interest.

Ethical Approval: This study was approved by the Institutional Review Board of Chongqing University Cancer Hospital.
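The Findings report AUC and sensitivity at a predefined specificity of 95%. The sketch below shows one common way to compute such figures from model scores; it is not the authors' code. The function names and the score values are hypothetical, and the scores are made-up illustration data, not the study's results. It assumes a higher score means lung cancer is more likely and that a case is called positive when its score is strictly above the threshold.

```python
import math


def sensitivity_at_specificity(benign_scores, cancer_scores, target_specificity=0.95):
    """Pick a threshold from the benign scores, then measure sensitivity.

    Hypothetical helper for illustration. The threshold is the smallest
    benign score such that at least `target_specificity` of the benign
    cases score at or below it (and are therefore called negative).
    Returns (threshold, sensitivity, achieved_specificity).
    """
    if not benign_scores or not cancer_scores:
        raise ValueError("both score lists must be non-empty")
    ordered = sorted(benign_scores)
    n = len(ordered)
    # Smallest count k with k / n >= target; the epsilon guards against
    # floating-point error in target_specificity * n.
    k = max(1, math.ceil(target_specificity * n - 1e-9))
    threshold = ordered[k - 1]
    specificity = sum(s <= threshold for s in benign_scores) / n
    sensitivity = sum(s > threshold for s in cancer_scores) / len(cancer_scores)
    return threshold, sensitivity, specificity


def auc(benign_scores, cancer_scores):
    """AUC as the probability that a cancer case outscores a benign case.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for c in cancer_scores:
        for b in benign_scores:
            if c > b:
                wins += 1.0
            elif c == b:
                wins += 0.5
    return wins / (len(cancer_scores) * len(benign_scores))


# Made-up scores for illustration only; NOT data from the study.
benign = [i / 20 for i in range(20)]   # 0.00, 0.05, ..., 0.95
cancer = [0.22, 0.91, 0.93, 0.99]

threshold, sens, spec = sensitivity_at_specificity(benign, cancer, 0.95)
print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
print(f"AUC={auc(benign, cancer):.4f}")
```

Fixing the threshold on benign cases alone, as done here, is what makes the specificity "predefined": the sensitivity is then whatever the cancer cases yield at that cut-off.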

Machine Learning Computational Model to predict Lung Cancer Using Electronic Medical Records

Machine Learning and Real-World Data to Predict Lung Cancer Risk in Routine Care

Validation of a Deep Learning-Based Model to Predict Lung Cancer Risk Using Chest Radiographs and Electronic Medical Record Data

Machine Learning for Lung Cancer Prediction Using Pan-Tumor Associated Peripheral Blood Laboratory Markers

Lung Cancer Risk Prediction with Machine Learning Models

Construction of a risk prediction model for lung infection after chemotherapy in lung cancer patients based on the machine learning algorithm

Development of Lung Cancer Risk Prediction Machine Learning Models for Equitable Learning Health System: Retrospective Study

Pulmonologists-Level lung cancer detection based on standard blood test results and smoking status using an explainable machine learning approach

Predicting the risk of lung cancer using machine learning: A large study based on UK Biobank

Machine Learning for Early Discrimination Between Lung Cancer and Benign Nodules Using Routine Clinical and Laboratory Data

Application of machine learning for lung cancer survival prognostication—A systematic review and meta-analysis

A risk model for prediction of lung cancer.

Machine learning for predicting liver and/or lung metastasis in colorectal cancer: a retrospective study based on the SEER database

Machine Learning Models for Pancreatic Cancer Risk Prediction Using Electronic Health Record Data - A Systematic Review and Assessment

Development and Validation of a Risk Prediction Model for Venous Thromboembolism in Lung Cancer Patients Using Machine Learning

Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Identify and Estimate Survival in a Longitudinal Cohort of Patients With Lung Cancer

Dynamic Predictive Models with Visualized Machine Learning for Assessing the Risk of Lung Metastasis in Kidney Cancer Patients

Predicting early-onset COPD risk in adults aged 20–50 using electronic health records and machine learning

Development and Validation of a Multivariable Lung Cancer Risk Prediction Model That Includes Low-Dose Computed Tomography Screening Results

Early symptoms and sensations as predictors of lung cancer: a machine learning multivariate model