Feature Selection and Assessment of Lung Cancer Sub-types by Applying Predictive Models

Sara González, Daniel Castillo, Juan Manuel Galvez, Ignacio Rojas, Luis Javier Herrera
DOI: https://doi.org/10.1007/978-3-030-20518-8_73
2019-01-01
Advances in Computational Intelligence
Abstract: The main goal of this study is to identify a robust set of genes capable of discriminating among the sub-types of lung cancer: Small Cell Lung Carcinoma (SCLC), Adenocarcinoma (ACC), Squamous Cell Carcinoma (SCC) and Large Cell Lung Carcinoma (LCLC). To achieve this goal, a global differential expression analysis was performed using gene expression microarray data publicly available on the NCBI/GEO platform. From this analysis, a total of 60 Differentially Expressed Genes (DEGs) were selected and then used to develop predictive models that combine supervised machine learning with feature selection algorithms. This yielded a reduced, specific gene signature that allows the lung cancer sub-type of new samples to be identified. The predictive models are assessed in terms of accuracy, F1-score, sensitivity and specificity. Finally, a set of public web platforms providing biological information on genes was used to determine the relationship between the final subset of genes and the lung cancer sub-types addressed.
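The four assessment measures named in the abstract can be illustrated with a minimal Python sketch. It assumes a one-vs-rest treatment of each sub-type, which the abstract does not specify, and the function name `per_class_metrics` and the toy labels are hypothetical, made up purely for illustration; they are not data or code from the study.

```python
from collections import Counter

# Sub-type abbreviations as used in the abstract.
CLASSES = ["SCLC", "ACC", "SCC", "LCLC"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and one-vs-rest metrics for each class.

    Assumption: each sub-type is scored against all others (one-vs-rest).
    """
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    report = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        report[c] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "f1": f1,
        }
    accuracy = sum(pairs[(c, c)] for c in classes) / total if total else 0.0
    return accuracy, report


# Toy labels, invented for illustration only (not results from the paper).
y_true = ["SCLC", "SCLC", "ACC", "ACC", "SCC", "SCC", "LCLC", "LCLC"]
y_pred = ["SCLC", "ACC", "ACC", "ACC", "SCC", "SCC", "LCLC", "SCC"]

accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for name, scores in report.items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Reporting sensitivity and specificity per sub-type, rather than accuracy alone, shows whether a model confuses particular sub-types even when the overall hit rate looks high.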