The machine learning algorithm identified COL7A1 as a diagnostic marker for LUSC and HNSC

Chenyu Wang, Yongxin Ma, Jiaojiao Qi, Xianglai Jiang
DOI: https://doi.org/10.1101/2023.07.19.23292914
2023-07-24
MedRxiv
Abstract: Squamous cell carcinomas (SCCs) arise at different anatomical sites but may share tumorigenic signaling pathways and metabolic features, and different SCCs have similar mutation landscapes and squamous differentiation expression patterns. Studying the expression profiles of common SCCs can help identify biomarkers with diagnostic and prognostic value across multiple SCC types. Lung squamous cell carcinoma (LUSC), head and neck squamous cell carcinoma (HNSC), the "squamous cell cancer" cases in esophageal carcinoma (ESCA), and cervical squamous cell carcinoma and endocervical adenocarcinoma (CESC) from The Cancer Genome Atlas (TCGA) database were used as training sets, and relevant data sets from the Gene Expression Omnibus (GEO) database were selected as validation sets. Machine learning algorithms were used to screen for genes that diagnose SCCs with high accuracy; these were treated as core genes, and their effects on patient prognosis and immunotherapy were explored. COL7A1 (Collagen Type VII Alpha 1 Chain) showed high accuracy in diagnosing LUSC and HNSC in both the training set (LUSC AUC: 0.987; HNSC AUC: 0.933) and the validation set (LUSC AUC: 1.000; HNSC AUC: 0.967). Moreover, COL7A1 expression was significantly correlated with shorter overall survival (OS) and disease-specific survival (DSS) in HNSC and LUSC patients, and was significantly negatively correlated with the immunophenoscore (IPS) in LUSC patients treated with CTLA4 (-) PD1 (+), CTLA4 (+) PD1 (-), and CTLA4 (+) PD1 (+). COL7A1 has the potential to serve as a diagnostic and prognostic marker for LUSC and HNSC and to predict the efficacy of immunotherapy in LUSC.
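The AUC values quoted in the abstract summarize how well a single gene's expression separates tumor from normal samples. The abstract does not say how the authors computed them, so the following is only a minimal sketch of one standard way to obtain an AUC for a single marker: the probability that a randomly chosen tumor sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. The function name `auc_single_marker` and the expression values are hypothetical and are not taken from the paper.

```python
def auc_single_marker(tumor_expr, normal_expr):
    """Return the ROC AUC of one marker as a pairwise win rate.

    AUC = P(tumor value > normal value) + 0.5 * P(tumor value == normal value).
    Assumes higher expression indicates tumor (an illustrative assumption).
    """
    wins = 0.0
    for t in tumor_expr:
        for n in normal_expr:
            if t > n:
                wins += 1.0   # tumor sample ranked above normal sample
            elif t == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(tumor_expr) * len(normal_expr))


# Made-up expression values for illustration only (not data from the paper).
tumor = [8.1, 7.4, 9.0, 6.9, 8.6]
normal = [3.2, 4.1, 7.0, 2.8, 3.9]

auc = auc_single_marker(tumor, normal)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.000, as reported for LUSC in the validation set, would correspond to every tumor sample having a higher value than every normal sample under this definition; an AUC of 0.5 would mean the marker does no better than chance.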