Using Local Principal Components to Explore Relationships Between Heterogeneous Omics Datasets

Noor Alaydie, Farshad Fotouhi
DOI: https://doi.org/10.1007/978-3-7091-1538-1_11
2013-01-01
Abstract: In the post-genomic era, high-throughput technologies generate large amounts of ‘omics’ data, such as transcriptomics, proteomics or metabolomics, that are measured on the same set of samples. Developing methods that can jointly analyze multiple datasets from different technology platforms, in order to unravel the relationships between different biological functional levels, has become crucial. A common way to analyze the relationships between a pair of data sources based on their correlation is canonical correlation analysis (CCA). CCA seeks linear combinations of all the variables from each dataset that maximize the correlation between them. However, in high-dimensional datasets, where the number of variables exceeds the number of experimental units, CCA may not yield meaningful information. Moreover, when collinearity exists in one or both of the datasets, CCA may not be applicable. Here, we present a novel method, LPC-KR, that extracts common features from a pair of data sources using Local Principal Components and Kendall’s Ranking. The results show that the proposed algorithm outperforms CCA in many scenarios and is more robust to noisy data. Moreover, the proposed algorithm produces meaningful results when the number of variables exceeds the number of experimental units.
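To make the CCA limitation concrete, here is a minimal NumPy sketch of classical CCA. It is not the LPC-KR method, which the abstract does not specify. The function name `canonical_correlations`, the simulated data, and all sizes are assumptions chosen for illustration. The sketch computes canonical correlations as the singular values of the product of orthonormal bases of the two centered data matrices.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between X and Y (rows are samples).

    Hypothetical helper: orthonormalize each centered matrix with QR,
    then take the singular values of Qx^T Qy.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)  # orthonormal columns, reduced mode
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against rounding just above 1

rng = np.random.default_rng(0)
n = 50  # number of samples (experimental units)

# Case 1: fewer variables than samples, with one shared latent factor.
latent = rng.normal(size=(n, 1))
X = latent @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(n, 5))
Y = latent @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(n, 4))
rho = canonical_correlations(X, Y)

# Case 2: more variables than samples, pure independent noise.
X_wide = rng.normal(size=(n, 80))
Y_wide = rng.normal(size=(n, 80))
rho_wide = canonical_correlations(X_wide, Y_wide)

print("n > p canonical correlations:", np.round(rho, 3))
print("n < p smallest canonical correlation:", rho_wide.min())
```

In the second case every canonical correlation is numerically 1 even though the two matrices are independent noise, because each orthonormal basis already spans the whole sample space. This is the degeneracy the abstract refers to when the number of variables exceeds the number of experimental units.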