Role of hypoxia-related genes in breast cancer based on a comprehensive analysis of scRNA-seq and bulk RNA-seq

Rui Sun, Wenming Yin, Yilin Zha, Dan Xi, Yingjie Shao, Wendong Gu, Jingting Jiang
DOI: https://doi.org/10.21203/rs.3.rs-1558418/v1
2022-01-01
Abstract:
Background: The hypoxic state of the tumor microenvironment in breast cancer favors the proliferation and metastasis of tumor cells, thereby affecting patient survival. In this study, we aimed to combine single-cell and bulk sequencing data to construct a hypoxia-related prognostic signature for breast cancer patients.
Methods: Single-cell RNA transcriptome data of MCF-7 cells exposed to a hypoxic microenvironment, bulk tumor transcriptome data, and clinical data were downloaded from the Gene Expression Omnibus (GEO) database and The Cancer Genome Atlas (TCGA) database. Differentially expressed genes (DEGs) in MCF-7 cells under hypoxic conditions were screened. Functional enrichment analysis of these DEGs was performed using Gene Ontology (GO) annotation, the Kyoto Encyclopedia of Genes and Genomes (KEGG), and Gene Set Variation Analysis (GSVA). Univariate Cox regression, the least absolute shrinkage and selection operator (LASSO) regression algorithm, and multivariate Cox regression were applied to determine the prognostic gene signature. Risk score and clinical features were used to construct a nomogram model. The infiltration of 22 immune cell types was estimated using the CIBERSORT algorithm, and correlations between risk score and the tumor microenvironment were analyzed.
Results: A total of 329 DEGs between the hypoxic and normoxic states were identified from the scRNA-seq data. GO, KEGG, and GSVA analyses revealed that most of the enriched gene sets were related to hypoxia. Combining these with data from 1099 breast cancer samples in the TCGA database, we identified four genes (ERRFI1, HSPB8, PGK1, STC2) as independent prognostic genes, and a risk score based on their expression was calculated for each patient. Kaplan-Meier survival analysis showed that risk score was negatively correlated with patient survival time. Validation using the GSE20685 dataset yielded similar results. Risk score and clinical features were treated as independent prognostic factors to construct a nomogram model, which showed good prognostic power. The CIBERSORT algorithm was used to estimate the infiltration of 22 immune cell types in breast cancer patients, and the analysis showed that risk score was positively associated with immunosuppression.
Conclusions: In summary, we used single-cell sequencing data from MCF-7 cells under hypoxic conditions to identify prognosis-related genes, and the resulting prognostic model displayed good predictive performance.
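To make the risk-score step concrete, the following is a minimal Python sketch of how a four-gene score of this kind is commonly computed: a weighted sum of expression values, with patients then split into high- and low-risk groups. The abstract reports neither the Cox coefficients nor the grouping cutoff, so the weights in `EXAMPLE_COEFS`, the median split, and the patient values below are placeholders for illustration only, not the authors' values or method.

```python
from statistics import median

# Placeholder weights for illustration only: the abstract does not report
# the multivariate Cox coefficients for these four genes.
EXAMPLE_COEFS = {"ERRFI1": 0.2, "HSPB8": 0.1, "PGK1": 0.3, "STC2": -0.15}


def risk_score(expression, coefs=EXAMPLE_COEFS):
    """Linear risk score: sum of coefficient * expression over signature genes."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def stratify(scores):
    """Label each patient 'high' or 'low' risk using a median cutoff.

    The median split is an assumption; the abstract does not state the cutoff.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Made-up expression values for three hypothetical patients.
    patients = {
        "P1": {"ERRFI1": 2.0, "HSPB8": 1.0, "PGK1": 5.0, "STC2": 3.0},
        "P2": {"ERRFI1": 1.0, "HSPB8": 0.5, "PGK1": 2.0, "STC2": 4.0},
        "P3": {"ERRFI1": 3.0, "HSPB8": 2.0, "PGK1": 6.0, "STC2": 1.0},
    }
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    print(stratify(scores))
```

In a real analysis the weights would come from the fitted multivariate Cox model, and the resulting groups would be compared with Kaplan-Meier curves, as the abstract describes.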