Heterogeneity Analysis Via Integrating Multi-Sources High-Dimensional Data with Applications to Cancer Studies

Tingyan Zhong, Qingzhao Zhang, Jian Huang, Mengyun Wu, Shuangge Ma
DOI: https://doi.org/10.5705/ss.202021.0002
Impact factor (IF): 1.4
2021-01-01
Statistica Sinica
Abstract: This study is motivated by cancer research, in which heterogeneity analysis plays an important role; such analysis can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively; under FMR, the covariates affect the response differently in different subgroups. High-dimensional molecular features and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. In simpler analyses, these two types of features have been shown to contain overlapping but also independent information. In this article, our goal is to conduct the first FMR-based cancer heterogeneity analysis that integrates high-dimensional molecular and histopathological imaging features, and to make it more effective than analyzing either data type separately. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established, and an effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.
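The abstract relies on the finite mixture of regression (FMR) model with penalized estimation but does not spell out the mechanics. Under a Gaussian FMR, each response is modeled as y | x ~ sum_k pi_k N(x^T beta_k, sigma_k^2), so each latent subgroup k has its own coefficient vector beta_k. The following is a minimal, generic sketch of penalized FMR fitted by EM, not the authors' method: it uses a plain lasso (L1) penalty on each component's coefficients, solved by coordinate descent, and it neither integrates multiple data sources nor includes the paper's penalty for promoting independent information. The function names (`fit_penalized_fmr`, `weighted_lasso_cd`), the tuning parameter `lam`, the random initialization, and the omission of an intercept are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def weighted_lasso_cd(X, y, w, lam, beta=None, n_sweeps=100):
    """Coordinate descent for the weighted lasso problem
    min_b 0.5 * sum_i w_i * (y_i - x_i^T b)^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p) if beta is None else np.array(beta, dtype=float)
    r = y - X @ b  # current residual vector
    for _ in range(n_sweeps):
        for j in range(p):
            xj = X[:, j]
            r += xj * b[j]  # partial residual that excludes feature j
            z = np.sum(w * xj * r)
            a = max(np.sum(w * xj * xj), 1e-12)  # guard against empty components
            b[j] = soft_threshold(z, lam) / a
            r -= xj * b[j]
    return b


def fit_penalized_fmr(X, y, K=2, lam=1.0, n_iter=200, seed=0):
    """Hypothetical EM sketch for a K-component Gaussian mixture of
    regressions with an L1 penalty on each component's coefficients."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(K, 1.0 / K)                      # mixing proportions
    beta = rng.normal(scale=0.1, size=(K, p))     # random start breaks symmetry
    sigma2 = np.full(K, np.var(y) + 1e-8)         # component error variances
    tau = np.full((n, K), 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior subgroup memberships, computed stably on the log scale
        resid = y[:, None] - X @ beta.T           # shape (n, K)
        log_dens = (np.log(pi) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * resid ** 2 / sigma2)
        log_dens -= log_dens.max(axis=1, keepdims=True)
        tau = np.exp(log_dens)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: proportions, then a few lasso sweeps and a variance update per component
        pi = np.clip(tau.mean(axis=0), 1e-12, None)
        pi = pi / pi.sum()
        for k in range(K):
            wk = tau[:, k]
            beta[k] = weighted_lasso_cd(X, y, wk, lam, beta[k], n_sweeps=5)
            r = y - X @ beta[k]
            sigma2[k] = max(np.sum(wk * r ** 2) / max(wk.sum(), 1e-12), 1e-8)
    return pi, beta, sigma2, tau


if __name__ == "__main__":
    # Hypothetical simulated data: two latent subgroups with different sparse effects
    rng = np.random.default_rng(0)
    n, p = 300, 10
    X = rng.normal(size=(n, p))
    group = rng.random(n) < 0.5
    b1 = np.r_[2.0, 2.0, np.zeros(p - 2)]
    b2 = np.r_[-2.0, 0.0, 2.0, np.zeros(p - 3)]
    y = np.where(group, X @ b1, X @ b2) + 0.5 * rng.normal(size=n)
    pi, beta, sigma2, tau = fit_penalized_fmr(X, y, K=2, lam=5.0)
    print("mixing proportions:", np.round(pi, 2))
    print("component coefficients:\n", np.round(beta, 2))
```

The M-step runs only a few coordinate-descent sweeps per EM iteration and does not scale the penalty by the component variances, so it is a simplified, generalized-EM-style update rather than an exact maximizer of a penalized likelihood. Because EM can stop at local optima, results may depend on the random start; in practice one would try several seeds and tune `lam`, for example by an information criterion.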