Bioinformatics Analysis of Pivotal Module and Biomarkers Related to the Prognosis of Breast Cancer Based on Single-cell Transcriptome Data

Rui Liu, Xin Yang, Yuhang Quan, Yiyin Tang, Yafang Lai, Chunxiang Li, Dechun Yang, Maohua Wang, Anhao Wu
DOI: https://doi.org/10.21203/rs.3.rs-424509/v1
2021-04-20
Abstract

Background: The effect of breast cancer heterogeneity on patient prognosis remains unclear, especially the role of immune cells. Discovering new markers that capture how this heterogeneity affects patient prognosis is therefore crucial for improving outcomes and survival.

Methods: Single-cell transcriptome sequencing data of breast cancer were downloaded from the GEO database. PCA and UMAP were used for dimensionality reduction and cell clustering. The FindAllMarkers function was used to identify differentially expressed genes in each cluster, and the DoHeatmap function was used to plot their distribution across clusters. CellPhoneDB was used to analyze ligand-receptor interactions. The TRRUST database combined with Cytoscape was used to construct a receptor-ligand-transcription factor interaction network. WGCNA was used to identify pivotal modules associated with breast cancer prognosis. Univariate Cox regression analysis and Kaplan-Meier (KM) survival analysis were used to identify prognostic genes within the prognostic modules. Multivariate regression analysis combined with risk scoring was used to construct a breast cancer prognostic model, which was validated with TCGA and ICGC sample data.

Results: In this study, 14 cell clusters were identified in two single-cell datasets (GSE75688 and GSE118389). The ligand-receptor interaction network revealed that macrophages and dendritic cells (DCs) interacted most frequently with other cells in breast cancer. WGCNA suggested that the MEblue module was the most relevant to overall survival in triple-negative breast cancer. Twenty-four prognostic genes in the blue module were identified by univariate Cox regression and KM survival analysis. These 24 genes were then analyzed by multivariate regression combined with risk analysis to construct a prognostic model.
Validation of the prognostic model showed significant differences in the expression of PCDH12, SLIT3, ACVRL1, and DLL4 between the high-risk and low-risk groups.

Conclusion: PCDH12, SLIT3, ACVRL1, and DLL4 are prognostic biomarkers and are related to the type and proportion of immune cells in breast cancer.
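The abstract does not give the exact risk-score formula. A common construction in Cox-based prognostic models, assumed here rather than taken from the paper, scores each patient as the sum over model genes of the regression coefficient times that gene's expression, splits patients into high- and low-risk groups at the median score, and compares the groups with the Kaplan-Meier product-limit estimator S(t) = Π over event times t_i ≤ t of (1 − d_i / n_i), where d_i is the number of deaths and n_i the number still at risk at t_i. The sketch below follows that recipe in plain Python. The gene names GENE_A and GENE_B, their coefficients, and the toy cohort are hypothetical placeholders, not the paper's fitted model or data, and fitting the regression model itself is not shown.

```python
from statistics import median

# Hypothetical coefficients for two placeholder genes; NOT the paper's fitted model.
COEFS = {"GENE_A": 0.8, "GENE_B": -0.5}


def risk_score(expr, coefs):
    """Linear risk score: sum over model genes of coefficient * expression."""
    return sum(beta * expr[gene] for gene, beta in coefs.items())


def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    events[i] is 1 if death was observed at times[i], 0 if censored.
    Returns a list of (time, survival) pairs.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, seen = data[i][0], 0, 0
        # Pool every subject whose follow-up ends at this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            seen += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= seen  # deaths and censored cases both leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical toy cohort: (expression profile, follow-up in months, event flag).
    cohort = [
        ({"GENE_A": 2.0, "GENE_B": 1.0}, 10, 1),
        ({"GENE_A": 1.5, "GENE_B": 0.5}, 15, 1),
        ({"GENE_A": 1.0, "GENE_B": 1.0}, 30, 0),
        ({"GENE_A": 0.5, "GENE_B": 1.5}, 40, 1),
        ({"GENE_A": 0.2, "GENE_B": 2.0}, 50, 0),
        ({"GENE_A": 1.8, "GENE_B": 0.2}, 8, 1),
    ]
    scores = [risk_score(expr, COEFS) for expr, _, _ in cohort]
    groups = split_by_median(scores)
    for label in ("high", "low"):
        members = [c for c, g in zip(cohort, groups) if g == label]
        curve = kaplan_meier([c[1] for c in members], [c[2] for c in members])
        print(f"{label}-risk:", curve)
```

In this toy cohort the high-risk group loses all three patients by month 15, while the low-risk group's estimated survival stays at 0.5 from month 40 onward. In a real analysis a log-rank test would normally be used to judge whether such a gap between groups is significant; it is omitted here for brevity.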