Machine Learning Computational Model to predict Lung Cancer Using Electronic Medical Records

Shlomi, D., Levi, M., Kushnir, S., Yossef, N., Hoogi, A., Lazebnik, T.
DOI: https://doi.org/10.1183/13993003.congress-2024.oa4682
IF: 24.3
2024-11-01
European Respiratory Journal
Abstract:
Background: Lung cancer (LC) screening using low-dose computed tomography (CT) is recommended according to standard risk criteria or individual risk calculators. Machine learning (ML) models that predict disease risk are an emerging approach in medicine for finding hidden, patient-specific associations.
Methods: As part of a larger trial of ML prediction from electronic medical records and chest CT, we developed an ML model based on known risk factors for LC. We used data from patients with LC and controls (1:2 ratio), all aged ≥ 35 years. We developed one model for the entire study population and separate models for patients with and without a smoking history. The included features were age, sex, body mass index (BMI), smoking history, history of chronic obstructive pulmonary disease (COPD)/emphysema/chronic bronchitis (CB), history of interstitial lung disease (ILD)/pulmonary fibrosis (PF), and family history of LC.
Results: Of the 4,076 patients, 1,428 (35%) were in the LC group and 2,648 (65%) were in the control group. For the entire study population, our model achieved an accuracy of 0.712, with a sensitivity of 69% and a positive predictive value (PPV) of 74%. Accuracy was higher in both subgroups: 0.748 (sensitivity 72%, PPV 76%) in the smoking cohort and 0.73 (sensitivity 76%, PPV 72%) in the never-smoking cohort.
Conclusion: Known risk factors for LC can be used in ML models to predict LC with modest accuracy. Further studies are needed to confirm these results in new patients and to improve them.
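The three metrics reported in the Results (accuracy, sensitivity, PPV) all derive from a binary confusion matrix. The sketch below shows the standard definitions only; the `ConfusionCounts` class, the function names, and the toy counts are hypothetical illustrations and are not taken from the study, whose evaluation procedure the abstract does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # LC patients predicted as LC
    fp: int  # controls predicted as LC
    tn: int  # controls predicted as controls
    fn: int  # LC patients predicted as controls


def accuracy(c: ConfusionCounts) -> float:
    # Fraction of all patients classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of true LC patients that the model flags (recall).
    return c.tp / (c.tp + c.fn)


def ppv(c: ConfusionCounts) -> float:
    # Fraction of flagged patients who truly have LC (precision).
    return c.tp / (c.tp + c.fp)


# Toy counts chosen for illustration only; they are NOT the study's data.
toy = ConfusionCounts(tp=70, fp=25, tn=75, fn=30)

if __name__ == "__main__":
    print(f"accuracy    = {accuracy(toy):.3f}")
    print(f"sensitivity = {sensitivity(toy):.3f}")
    print(f"PPV         = {ppv(toy):.3f}")
```

Sensitivity does not depend on how many controls are in the evaluation set, whereas PPV and accuracy do, so all three should be read together with the case-to-control ratio.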
respiratory system