Integrate Any Omics: Towards genome-wide data integration for patient stratification

Shihao Ma, Andy G.X. Zeng, Benjamin Haibe-Kains, Anna Goldenberg, John E Dick, Bo Wang
2024-01-16
Abstract: High-throughput omics profiling advancements have greatly enhanced cancer patient stratification. However, incomplete data in multi-omics integration presents a significant challenge, as traditional methods like sample exclusion or imputation often compromise biological diversity and dependencies. Furthermore, the critical task of accurately classifying new patients with partial omics data into existing subtypes is commonly overlooked. To address these issues, we introduce IntegrAO (Integrate Any Omics), an unsupervised framework for integrating incomplete multi-omics data and classifying new samples. IntegrAO first combines partially overlapping patient graphs from diverse omics sources and utilizes graph neural networks to produce unified patient embeddings. Our systematic evaluation across five cancer cohorts involving six omics modalities demonstrates IntegrAO's robustness to missing data and its accuracy in classifying new samples with partial profiles. An acute myeloid leukemia case study further validates its capability to uncover biological and clinical heterogeneity in incomplete datasets. IntegrAO's ability to handle heterogeneous and incomplete data makes it an essential tool for precision oncology, offering a holistic approach to patient characterization.
Genomics, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper addresses two challenges in multi-omics data integration and patient stratification, both of which stem from incomplete datasets. First, most current integration methods require every sample to have complete data across all omics types. This is a significant limitation in practice, because experimental or financial constraints often leave some modalities unmeasured for some patients. Excluding incomplete samples reduces the number of samples available for analysis, while imputing the missing values can introduce bias and uncertainty. Second, existing methods perform poorly when assigning new patients to previously defined subtypes, especially when those patients have only partial omics profiles. To address both problems, the paper proposes IntegrAO (Integrate Any Omics), an unsupervised framework designed to integrate incomplete multi-omics data and to classify new samples with partial data. IntegrAO does this by fusing partially overlapping patient graphs from different omics sources and then using graph neural networks to generate unified patient embeddings. According to the authors' evaluation, the framework is robust to missing data and classifies new samples with partial profiles accurately, and a case study in acute myeloid leukemia (AML) shows that it can reveal biological and clinical heterogeneity in an incomplete dataset. In summary, the paper aims to overcome the limitations of existing methods on incomplete multi-omics data and to provide a tool for comprehensive patient characterization in precision oncology.
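To make the idea of fusing partially overlapping patient graphs concrete, the following is a minimal sketch and not IntegrAO's actual implementation. It assumes a Gaussian-kernel similarity per omics type and a plain average over the modalities in which a pair of patients was jointly measured; the paper's own fusion procedure and its graph neural network embedding step are not reproduced here. The function names `similarity_graph` and `fuse_partial_graphs`, the modality names, and the toy data are all hypothetical.

```python
import numpy as np


def similarity_graph(features):
    """Gaussian-kernel similarity between patients (rows) of one omics matrix."""
    sq = np.sum(features ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0)
    # Assumption: use the median positive distance as the kernel bandwidth.
    sigma2 = np.median(d2[d2 > 0]) if np.any(d2 > 0) else 1.0
    return np.exp(-d2 / sigma2)


def fuse_partial_graphs(omics):
    """Average per-omics similarity graphs over the union of all patients.

    omics maps a modality name to (patient_ids, feature_matrix). Patient ids
    must be unique within a modality. A pair of patients that never shares a
    modality gets a fused weight of 0 (no edge).
    """
    all_ids = sorted({pid for ids, _ in omics.values() for pid in ids})
    index = {pid: i for i, pid in enumerate(all_ids)}
    n = len(all_ids)
    total = np.zeros((n, n))
    counts = np.zeros((n, n))
    for ids, feats in omics.values():
        pos = np.array([index[p] for p in ids])
        sim = similarity_graph(np.asarray(feats, dtype=float))
        total[np.ix_(pos, pos)] += sim   # accumulate similarities
        counts[np.ix_(pos, pos)] += 1    # how many modalities cover each pair
    fused = np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)
    return all_ids, fused, counts


# Toy data: p1 lacks methylation, p4 lacks mRNA; p2 and p3 have both.
rng = np.random.default_rng(0)
omics = {
    "mrna": (["p1", "p2", "p3"], rng.normal(size=(3, 5))),
    "methylation": (["p2", "p3", "p4"], rng.normal(size=(3, 4))),
}
all_ids, fused, counts = fuse_partial_graphs(omics)
```

In this toy setting no patient is discarded and no value is imputed: `p1` and `p4` both appear in the fused graph even though they are never directly compared, and they are linked only indirectly through `p2` and `p3`. A learned embedding model, such as the graph neural networks the paper describes, would then operate on a fused graph of this kind.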