Abstract

Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach to understanding the relationships between the collected datasets and the complex trait of interest is to analyse each OMICS dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets, instead of analysing them separately, improves both our understanding of the relationships between them and the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p≫n) data, such as OMICS. Sparse canonical correlation analysis (sCCA) is a promising one: it penalizes the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sCCA have been proposed. They differ in their objective functions, in the iterative algorithm used to obtain the sparse latent variables, and in the assumptions they make about the original datasets.

Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al., the penalized matrix decomposition CCA proposed by Witten and Tibshirani, and its extension proposed by Suo et al. The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the relationships between datasets, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were also extended to allow for multiple (more than two) datasets, where the trait was included as one of the input datasets. Both approaches have shown improvement over conventional predictive models that include one or multiple datasets.

Availability and implementation: https://github.com/theorod93/sCCA.

Supplementary information: Supplementary data are available at Bioinformatics online.
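
The idea of penalizing the canonical vectors to obtain sparse latent variables can be illustrated with a minimal sketch. It is not the code from the repository above, and it does not reproduce any of the compared methods exactly. It assumes standardized data, treats each within-dataset covariance as the identity, and alternates soft-thresholding updates on the cross-product matrix. The function names and the `frac1`/`frac2` parameters (thresholds expressed as a fraction of the largest absolute entry) are hypothetical choices made for illustration only.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: shrink towards zero by lam."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def l2_normalize(v):
    """Scale v to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def sparse_update(a, frac):
    """Soft-threshold a at frac * max|a| (illustrative rule), then normalize."""
    lam = frac * np.max(np.abs(a))
    return l2_normalize(soft_threshold(a, lam))


def sparse_cca(X1, X2, frac1=0.3, frac2=0.3, n_iter=100, tol=1e-8):
    """Return one pair of sparse canonical vectors (w1, w2).

    Assumption: columns of X1 and X2 are centred and scaled, and the
    within-dataset covariances are approximated by the identity.
    """
    K = X1.T @ X2                                   # p1 x p2 cross-product
    # Start from the leading right singular vector of K.
    _, _, vt = np.linalg.svd(K, full_matrices=False)
    w2 = vt[0]
    w1 = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        w1_new = sparse_update(K @ w2, frac1)       # update w1 with w2 fixed
        w2_new = sparse_update(K.T @ w1_new, frac2)  # update w2 with w1 fixed
        converged = (np.linalg.norm(w1_new - w1) < tol
                     and np.linalg.norm(w2_new - w2) < tol)
        w1, w2 = w1_new, w2_new
        if converged:
            break
    return w1, w2
```

The latent variables are then `X1 @ w1` and `X2 @ w2`. In a supervised setting of the kind described in the Results, they could be passed as predictors to a separate model for the trait. Larger values of `frac1` and `frac2` give sparser vectors, and in practice such tuning parameters would need to be selected, for example by cross-validation.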

On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations

On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations

Stability test of canonical correlation analysis for studying brain‐behavior relationships: The effects of subject‐to‐variable ratios and correlation strengths

Comparing the stability and reproducibility of brain-behaviour relationships found using Canonical Correlation Analysis and Partial Least Squares within the ABCD Sample

Balancing the Stability and Predictive Performance for Multivariate Voxel Selection in fMRI Study

Multivariate brain-behaviour associations in psychiatric disorders

A technical review of canonical correlation analysis for neuroscience applications

Comparison of variants of canonical correlation analysis and partial least squares for combined analysis of MRI and genetic data

Correlated Components Analysis - Extracting Reliable Dimensions in Multivariate Data

Mining High-order Multimodal Brain Image Associations via Sparse Tensor Canonical Correlation Analysis

Canonical Correlation Analysis of Imaging Genetics Data Based on Statistical Independence and Structural Sparsity

Comparison of Canonical Correlation and Partial Least Squares analyses of simulated and empirical data

Identifying associations in dense connectomes using structured kernel principal component regression

Identifying Associations Between Brain Imaging Phenotypes and Genetic Factors Via a Novel Structured SCCA Approach

Capturing functional connectomics using Riemannian partial least squares

Identifying Associations among Genomic, Proteomic and Imaging Biomarkers Via Adaptive Sparse Multi-View Canonical Correlation Analysis

A powerful and efficient multivariate approach for voxel-level connectome-wide association studies

Sparse multiway canonical correlation analysis for multimodal stroke recovery data

A Powerful and Efficient Multivariate Approach for Voxel-Level Connectome-Wide Association Studies

A framework for interpretation and testing of sparse canonical correlations

Integrating multi-OMICS data through sparse canonical correlation analysis for the prediction of complex traits: a comparison study