Constructing a machine learning-based non-invasive liquid biopsy model for lung cancer detection via miRNA sequencing.
Jason Chia-Hsun Hsieh,Tsung-Ting Hsieh,Ko-Han Lee,Yu-Hsuen Tu,Po-Ya Chang,Eva Yi-Hsuan Wu,Yu-Chuan Chang,Yen-Jung Lu
DOI: https://doi.org/10.1200/jco.2024.42.16_suppl.e20014
IF: 45.3
2024-06-01
Journal of Clinical Oncology
Abstract: e20014 Background: Lung cancer is a prevalent and fatal disease, ranking second among common cancers globally. Early detection and prompt treatment significantly improve survival rates for most cancers, including lung cancer. Presently, low-dose computed tomography (LDCT) is a widely accepted tool for early lung cancer screening. However, it exhibits a high false-positive rate of up to 95%, resulting in overdiagnosis and causing anxiety among patients awaiting diagnosis results. Investigating effective alternatives would therefore give physicians additional tools for evaluating lung cancer and making informed decisions. This study applies the concept of liquid biopsy, using next-generation sequencing (NGS) to gather miRNA profiles, with the aim of constructing a machine learning-driven model for the detection of lung cancer. Methods: We assembled case-control cohorts comprising 74 lung cancer cases (stage 0-II: 66; III-IV: 8) and 74 age- and gender-matched healthy control subjects, recruited from Chang Gung Memorial Hospital in Linkuo, Taiwan. Plasma derived from whole blood was collected from each subject before any cancer surgery or treatment. We then extracted miRNA from the plasma and constructed libraries using the QIAseq miRNA Library kit for miRNA sequencing on the Illumina NextSeq550 platform. The miRNA sequencing data were analyzed using the QIAGEN RNA-seq Analysis Portal 5.0. The counts per million (CPM) values derived from the quantification results were used for differential expression analysis and for constructing the machine learning model. To assess the model's performance, we reserved 10% of the subjects for isolated testing, while the remaining 90% underwent 10-fold cross-validation (CV). Results: The final model, built using logistic regression with six significant differentially expressed miRNAs (fold change ≥ 2 and P-value ≤ 0.01), achieved optimal performance.
The 10-fold CV demonstrated accuracy: 97.73%; sensitivity: 98.48%; specificity: 96.97%; positive predictive value: 97.01%; negative predictive value: 98.46%. In isolated testing, all 16 subjects (8 cases and 8 controls) were correctly identified. Conclusions: This study with 148 subjects showcases the potential of NGS and machine learning for early-stage lung cancer detection via liquid biopsy. However, the sample size is relatively small and all subjects came from a single site. Therefore, an additional multi-center clinical trial is underway to gather more subjects, aiming for a deeper investigation and more concrete conclusions in the near future.
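As a reading aid, the reported 10-fold CV percentages can be checked against a single confusion matrix. The sketch below is not the authors' code. It assumes that the 132 subjects left after removing the 16 test subjects split into 66 cases and 66 controls, and that the CV results correspond to 65 true positives, 1 false negative, 64 true negatives, and 2 false positives. These counts are inferred from the percentages and are not stated in the abstract; the function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return standard diagnostic metrics as percentages rounded to 2 decimals."""
    def pct(numerator, denominator):
        return round(100 * numerator / denominator, 2)

    return {
        "accuracy": pct(tp + tn, tp + fn + tn + fp),
        "sensitivity": pct(tp, tp + fn),   # true positive rate
        "specificity": pct(tn, tn + fp),   # true negative rate
        "ppv": pct(tp, tp + fp),           # positive predictive value
        "npv": pct(tn, tn + fn),           # negative predictive value
    }


# ASSUMPTION: counts inferred from the reported percentages, not given in the abstract.
cv_metrics = diagnostic_metrics(tp=65, fn=1, tn=64, fp=2)
print(cv_metrics)
```

Under these assumed counts, the five values reproduce the figures reported for the 10-fold CV, which suggests the percentages are pooled over all 132 cross-validated subjects rather than averaged per fold.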
oncology