Toward the precision breast cancer survival prediction utilizing combined whole genome-wide expression and somatic mutation analysis

Yifan Zhang, William Yang, Dan Li, Jack Y Yang, Renchu Guan, Mary Qu Yang
DOI: https://doi.org/10.1186/s12920-018-0419-x
2018-01-01
BMC Medical Genomics
Abstract:
Background Breast cancer is the most common type of invasive cancer in women. It accounts for approximately 18% of all cancer deaths worldwide. Somatic mutations are well known to play an essential role in cancer development. We therefore propose that a prognostic prediction model integrating somatic mutations with gene expression can improve survival prediction for cancer patients and also reveal the genetic mutations associated with survival.
Methods Differential expression analysis was used to identify breast cancer related genes. A genetic algorithm (GA) and univariate Cox regression analysis were applied to select survival-related genes. DAVID was used for enrichment analysis of the set of somatically mutated genes. The performance of the survival predictors was assessed with a Cox regression model and the concordance index (C-index).
Results We investigated the genome-wide gene expression profiles and somatic mutations of 1091 breast invasive carcinoma cases from The Cancer Genome Atlas (TCGA). We identified 118 genes with high hazard ratios as candidate breast cancer survival risk genes (log-rank p < 0.0001, C-index = 0.636). Several breast cancer survival-related genes were found in this gene set, including FOXR2, FOXD1, MTNR1B and SDC1. Further optimization with the GA yielded an optimal gene set of 88 genes with a higher C-index (log-rank p < 0.0001, C-index = 0.656). We validated this gene set on an independent breast cancer data set and achieved similar performance (log-rank p < 0.0001, C-index = 0.614). Moreover, we identified 25 functional annotations, 15 Gene Ontology terms and 14 pathways that were significantly enriched among the genes showing distinct mutation patterns between the survival risk groups. These functional gene sets were used as new features for the survival prediction model. In particular, our results suggested that the Fanconi anemia pathway plays an important role in breast cancer prognosis.
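The C-index values above summarize how well a risk score orders patients by survival. The following minimal sketch computes Harrell's C-index from observed times, event indicators (1 = death observed, 0 = censored) and predicted risks. It assumes two choices that the abstract does not specify. First, the risk score is a Cox-style prognostic index, meaning a weighted sum of gene expression values using per-gene coefficients. Second, patients are split at the median score into high- and low-risk groups. The function names are hypothetical. For simplicity, pairs with tied survival times are skipped, so results can differ slightly from library implementations.

```python
from itertools import combinations
from statistics import median
from typing import List, Sequence


def prognostic_index(expression: Sequence[float], coefficients: Sequence[float]) -> float:
    """Cox-style linear predictor: sum of coefficient * expression over genes (assumed form)."""
    if len(expression) != len(coefficients):
        raise ValueError("expression and coefficients must have the same length")
    return sum(beta * x for beta, x in zip(coefficients, expression))


def harrell_c_index(times: Sequence[float], events: Sequence[int], risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter observed time had
    the event. It is concordant when that patient also has the higher
    predicted risk; tied risks count as half-concordant. Pairs with tied
    observed times are skipped in this simplified sketch.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a = patient with the shorter observed time, b = the other patient
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored, so the true ordering is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def median_risk_groups(risks: Sequence[float]) -> List[str]:
    """Label a patient 'high' if their risk exceeds the median, else 'low' (assumed cutoff)."""
    cutoff = median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


if __name__ == "__main__":
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.3, 0.4, 0.5]
    print(harrell_c_index(times, events, risks))
    print(median_risk_groups(risks))
```

The usage example has five comparable pairs, of which three are concordant, so the C-index is 0.6. A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. This range puts the reported values of 0.614 to 0.656 in context.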
Conclusions Our study indicated that the expression levels of gene signatures remain effective indicators for breast cancer survival prediction. Combining gene expression information with other types of features derived from somatic mutations can further improve survival prediction performance. The survival-associated pathways identified in our study merit further investigation toward improving cancer patient survival.
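The abstract reports that gene sets derived from somatic mutations were used as additional features alongside the expression signature. The sketch below shows one simple way to encode such features. Its encoding is an assumption, not taken from the paper: for each pathway, a binary indicator is set to 1 when a patient carries a somatic mutation in any member gene. That indicator is then concatenated with an expression-based risk score. The Fanconi anemia gene list is a small, incomplete subset of real pathway members chosen for illustration. `PATHWAY_B` and all function names are hypothetical.

```python
from typing import Dict, List, Set


def pathway_mutation_features(
    mutated_genes: Dict[str, Set[str]],
    pathways: Dict[str, Set[str]],
) -> Dict[str, Dict[str, int]]:
    """Per sample, 1 if any gene of a pathway is somatically mutated, else 0."""
    return {
        sample: {name: int(bool(genes & members)) for name, members in pathways.items()}
        for sample, genes in mutated_genes.items()
    }


def combine_features(
    expression_risk: Dict[str, float],
    mutation_features: Dict[str, Dict[str, int]],
    pathway_order: List[str],
) -> Dict[str, List[float]]:
    """Concatenate the expression-based risk score with pathway mutation indicators."""
    return {
        sample: [expression_risk[sample]]
        + [float(mutation_features[sample][name]) for name in pathway_order]
        for sample in expression_risk
    }


if __name__ == "__main__":
    pathways = {
        # Partial member list, for illustration only
        "FANCONI_ANEMIA": {"FANCA", "FANCD2", "BRCA2"},
        # Hypothetical pathway
        "PATHWAY_B": {"PIK3CA", "AKT1"},
    }
    mutations = {"S1": {"BRCA2", "TP53"}, "S2": {"PIK3CA"}}
    feats = pathway_mutation_features(mutations, pathways)
    rows = combine_features({"S1": 1.2, "S2": -0.4}, feats, ["FANCONI_ANEMIA", "PATHWAY_B"])
    print(rows)
```

Each row could then serve as the covariate vector for a multivariable Cox model. The any-mutation indicator is the simplest encoding. Other encodings, such as per-pathway mutation counts, are equally plausible.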