Identification of the differences in molecular networks between idiopathic pulmonary fibrosis and lung squamous cell carcinoma using machine learning

Yosui Nojima, Kenji Mizuguchi
DOI: https://doi.org/10.1101/2024.11.25.625120
2024-11-26
Abstract: Idiopathic pulmonary fibrosis (IPF), a form of idiopathic interstitial pneumonia, is an independent risk factor for lung cancer. The prognosis of IPF patients with lung cancer is poorer than that of IPF patients without lung cancer, and preventive measures for lung cancer remain elusive in patients with IPF. To mitigate lung cancer onset in patients with IPF, understanding the distinct mechanisms that induce both diseases is crucial. We developed highly accurate machine learning (ML) models to distinguish patients with IPF from patients with lung cancer using public RNA sequencing data. To construct the ML models, the random restart technique was applied to five algorithms, namely, k-nearest neighbors, support vector machines (SVM) with radial basis function kernel, SVM with linear kernel, eXtreme gradient boosting, and random forest. To identify differentially expressed genes between IPF and lung cancer, feature importance was calculated in the classification models. Furthermore, we detected somatic mutations impacting gene expression using lung cancer data. The ML models identified 11 differentially expressed genes. Somatic mutations were detected in four transcription factors that regulate the expression of these 11 genes. Furthermore, a molecular network was discovered, comprising the four transcription factors and 11 downstream genes. The newly identified molecular network enhances our understanding of the distinct mechanisms underlying IPF and lung cancer onset, providing new insights into the prevention of lung cancer complications in patients with IPF.
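The abstract does not spell out how the random restart technique or the feature importance calculation was implemented. As a rough illustration only, the sketch below assumes that "random restart" means repeating the train/test split under different random seeds and keeping the best-scoring run. It substitutes a simple nearest-centroid classifier for the five algorithms named above and uses permutation importance as a stand-in feature importance measure. The data are synthetic, and every name (`make_toy_data`, `random_restart`, and so on) is hypothetical rather than taken from the paper.

```python
import random
import statistics


def make_toy_data(n_per_class=40, n_genes=6, seed=0):
    """Synthetic 'expression' matrix; only gene 0 and gene 1 differ by class."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, shift in (("IPF", 0.0), ("LUSC", 2.0)):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            row[0] += shift  # higher in the second class
            row[1] -= shift  # lower in the second class
            rows.append(row)
            labels.append(label)
    return rows, labels


def fit_centroids(X, y):
    """Nearest-centroid stand-in classifier: per-class mean of each feature."""
    centroids = {}
    for label in sorted(set(y)):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return centroids


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)


def random_restart(X, y, n_restarts=20, test_fraction=0.25):
    """Assumed meaning of 'random restart': re-split per seed, keep the best run."""
    best = None
    for seed in range(n_restarts):
        idx = list(range(len(y)))
        random.Random(seed).shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test, train = idx[:n_test], idx[n_test:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        score = accuracy(model, [X[i] for i in test], [y[i] for i in test])
        if best is None or score > best["score"]:
            best = {"seed": seed, "score": score, "model": model}
    return best


def permutation_importance(model, X, y, seed=0):
    """Accuracy drop after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        column = [x[j] for x in X]
        rng.shuffle(column)
        X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return drops


X, y = make_toy_data()
best = random_restart(X, y)
# For simplicity the importance is computed on all samples, not a held-out set.
importances = permutation_importance(best["model"], X, y)
ranked = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print("best seed:", best["seed"], "held-out accuracy:", best["score"])
print("features ranked by importance:", ranked)
```

In this toy setting the two features that were shifted between classes should rank highest, which mirrors the idea of reading candidate differentially expressed genes off a classifier's feature importances. Selecting the best of many splits by held-out score is optimistic, so a real analysis would confirm the chosen model on data that played no part in the selection.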
Cancer Biology