COADREADx: A comprehensive algorithmic dissection unravels salient biomarkers and actionable insights into the discrete progression of colorectal cancer
Ashok Palaniappan,Sangeetha Muthamilselvan,Arjun Sarathi
DOI: https://doi.org/10.1101/2022.08.16.22278877
2024-03-15
Abstract:Colorectal cancer is a common condition with an uncommon burden of disease, heterogeneity in manifestation, and no definitive treatment in the advanced stages. Against this backdrop, renewed efforts to unravel the genetic drivers of colorectal cancer progression are paramount. Early-stage detection contributes to the success of cancer therapy and increases the likelihood of a favorable prognosis. Here, we have executed a comprehensive computational workflow aimed at uncovering the discrete stagewise genomic drivers of colorectal cancer progression. Using the TCGA COADREAD expression data and clinical metadata, we constructed stage-specific linear models as well as contrast models to identify stage-salient differentially expressed genes. Stage-salient differentially expressed genes with a significant monotone trend of expression across the stages were identified as progression-significant biomarkers. Among the biomarkers identified are: CRLF1, CALB2, STAC2, UCHL1, KCNG1 (stage-I salient), KLHL34, LPHN3, GREM2, ADCY5, PLAC2, DMRT3 (stage-II salient), PIGR, HABP2, SLC26A9 (stage-III salient), GABRD, DKK1, DLX3, CST6, HOTAIR (stage-IV salient), and CDH3, KRT80, AADACL2, OTOP2, FAM135B, HSP90AB1 (top linear model genes). In particular the study yielded 31 genes that are progression-significant such as ESM1, DKK1, SPDYC, IGFBP1, BIRC7, NKD1, CXCL13, VGLL1, PLAC1, SPERT, UPK2, and interestingly three members of the LY6G6 family. Significant monotonic linear model genes included HIGD1A, ACADS, PEX26, and SPIB. The stage-salient genes were benchmarked using normals-augmented dataset, and cross-referenced with existing knowledge. In addition, the signature of a multicellular immuno-cyte community specific to colorectal cancer relative to normal tissue was identified. The candidate biomarkers were used to construct the feature space for learning an optimal model for the digital screening of early-stage colorectal cancers. A feature space of just seven biomarkers, namely ESM1, DHRS7C, OTOP3, AADACL2, LPHN3, GABRD, and LPAR1, was sufficient to optimize a RandomForest model that achieved >98% balanced accuracy (and performant recall) on blind validation with external datasets. Survival analysis yielded a panel of three stage-IV salient genes, namely HOTAIR, GABRD, and DKK1, for the design of an optimal multivariate model for patient risk stratification. Integrating the above results, we have developed COADREADx, a web-server for assisting the screening and prognosis of colorectal cancers. COADREADx has been deployed at: for academic research and further refinement.
Oncology