Identifying gastric cancer molecular subtypes by integrating DNA-based hierarchical classification strategy and clinical stratification

Binyu Yang, Siying Liu, Jiemin Xie, Xi Tang, Pan Guan, Yifan Zhu, Li C. Xia
DOI: https://doi.org/10.1101/2023.06.09.544302
2023-06-11
Abstract:
Background: Molecular subtyping has been introduced to better understand the genetic landscape of gastric cancer (GC), but current subtyping methods have had only limited success because of the mixed use of molecular features, a lack of strategy optimization, and the limited availability of GC samples. There is an urgent need for a precise and easily adoptable subtyping method to enable DNA-based early screening and treatment.
Methods: Based on the TCGA subtypes, we developed a novel classifier, termed Hierarchical DNA-based Classifier for Gastric Cancer Molecular Subtyping (HCG), which uses all DNA-level alterations as predictors, including gene mutations, copy number aberrations and methylation. By adding the closely related esophageal adenocarcinoma (EA) dataset, we expanded the TCGA GC dataset used for training and testing HCG (n=453). We optimized HCG over three hierarchical strategies, evaluating each by its overall accuracy using Lasso-Logistic regression and by its clinical stratification capacity using multivariate survival analysis. We used difference tests to identify subtype-specific DNA alteration biomarkers based on the HCG-defined subtypes.
Results: The HCG classifier achieved an overall AUC score of 0.95 and significantly improved the clinical stratification of patients (overall p-value=0.032). Difference tests identified 25 subtype-specific DNA alterations, including high mutation levels in the SYNE1, ITGB4 and COL22A1 genes for the MSI subtype, and high methylation levels of the ALS2CL, KIAA0406 and RPRD1B genes for the EBV subtype.
Conclusions: HCG is an accurate and robust classifier for DNA-based GC molecular subtyping with high-performing clinical stratification capacity. The training and test datasets and analysis programs of HCG are available at https://github.com/labxscut/HCG.
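To make the idea of a hierarchical Lasso-Logistic classifier concrete, the following is a minimal sketch and not the HCG implementation (which is in the repository above). It assumes a hypothetical two-step hierarchy (EBV versus the rest, then MSI versus "other"), purely synthetic features, and a plain proximal-gradient solver for the L1-penalized logistic loss in place of whatever solver the authors used. All function names, the split order, and the hyperparameters are illustrative assumptions.

```python
import math
import random


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept b is left unpenalized. lam, lr and n_iter are
    illustrative values, not tuned settings from the paper.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_hierarchy(X, labels, order):
    """Fit one binary model per step; each step splits one subtype off
    from the samples that remain after the previous steps."""
    models, rows = [], list(range(len(X)))
    for subtype in order:
        Xs = [X[i] for i in rows]
        ys = [1.0 if labels[i] == subtype else 0.0 for i in rows]
        models.append((subtype, fit_lasso_logistic(Xs, ys)))
        rows = [i for i in rows if labels[i] != subtype]
    return models


def predict_hierarchy(models, x, fallback):
    """Walk the hierarchy top-down; return the first subtype that fires."""
    for subtype, model in models:
        if predict_proba(model, x) >= 0.5:
            return subtype
    return fallback


# Synthetic toy data (an assumption, not TCGA data): feature 0 stands in for
# an EBV-associated alteration, feature 1 for an MSI-associated one, and
# features 2-3 are uninformative noise.
rng = random.Random(0)


def make_sample(label):
    return [
        (1.0 if label == "EBV" else 0.0) + rng.gauss(0, 0.1),
        (1.0 if label == "MSI" else 0.0) + rng.gauss(0, 0.1),
        rng.gauss(0, 0.1),
        rng.gauss(0, 0.1),
    ]


labels = ["EBV", "MSI", "other"] * 20
X = [make_sample(label) for label in labels]

models = fit_hierarchy(X, labels, order=["EBV", "MSI"])
predictions = [predict_hierarchy(models, x, fallback="other") for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(f"training accuracy on toy data: {accuracy:.2f}")
```

Comparing hierarchical strategies, as the Methods describe, would amount to calling `fit_hierarchy` with different `order` arguments (or different tree shapes) and scoring each on held-out data; the sketch reports only training accuracy on toy data and says nothing about the performance of HCG itself.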