Abstract: The highly polygenic and pleiotropic nature of behavioural traits, psychiatric disorders and structural and functional brain phenotypes complicates mechanistic interpretation of related genome‐wide association study (GWAS) signals, thereby obscuring underlying causal biological processes. We propose genomic principal and independent component analysis (PCA, ICA) to decompose a large set of univariate GWAS statistics of multimodal brain traits into more interpretable latent genomic components. Here we introduce this novel method and evaluate its various analytic parameters and its reproducibility across independent samples. Two UK Biobank GWAS summary statistic releases of 2240 imaging‐derived phenotypes (IDPs) were retrieved. Genome‐wide beta‐values and their corresponding standard‐error scaled z‐values were decomposed using genomic PCA/ICA. We evaluated variance explained at multiple dimensions up to 200. We tested the inter‐sample reproducibility of output of dimensions 5, 10, 25 and 50. Reproducibility statistics of the respective univariate GWAS served as benchmarks. 
Reproducibility of 10‐dimensional PCs and ICs showed the best trade‐off between model complexity on the one hand and robustness and variance explained on the other (PCs: |r_{z‐max}| = 0.33, |r_{raw‐max}| = 0.30; ICs: |r_{z‐max}| = 0.23, |r_{raw‐max}| = 0.19). Genomic PC and IC reproducibility improved substantially relative to mean univariate GWAS reproducibility up to dimension 10. Genomic components clustered along neuroimaging modalities. Our results indicate that genomic PCA and ICA decompose genetic effects on IDPs from GWAS statistics with high reproducibility by taking advantage of the inherent pleiotropic patterns. These findings encourage further applications of genomic PCA and ICA as fully data‐driven methods to effectively reduce the dimensionality, enhance the signal‐to‐noise ratio and improve interpretability of high‐dimensional multitrait genome‐wide analyses.
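
The workflow described in the abstract (decompose a SNP-by-trait matrix of GWAS statistics, then compare components across two independent releases against a univariate benchmark) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic z-values in place of the UK Biobank summary statistics, covers PCA only (ICA is omitted), and reads the reproducibility statistic as the largest absolute Pearson correlation between a component in one release and any component in the other, which is one plausible interpretation. The function names `genomic_pca` and `max_abs_correlation`, the matrix sizes and the noise level are all hypothetical.

```python
import numpy as np


def genomic_pca(z, n_components):
    """PCA of a SNP-by-trait matrix via SVD (hypothetical helper).

    Returns SNP-level component scores, trait loadings and the
    fraction of variance explained by each retained component.
    """
    zc = z - z.mean(axis=0, keepdims=True)  # centre each trait column
    u, s, vt = np.linalg.svd(zc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components], explained[:n_components]


def max_abs_correlation(a, b):
    """For each column of a, the largest |Pearson r| with any column of b."""
    k = a.shape[1]
    r = np.corrcoef(a.T, b.T)[:k, k:]  # cross-correlation block only
    return np.abs(r).max(axis=1)


# Synthetic stand-in for two GWAS releases: shared latent genetic
# effects (pleiotropy) plus independent sampling noise per release.
rng = np.random.default_rng(0)
n_snps, n_traits, n_latent = 5000, 60, 5  # assumed toy sizes
latent = rng.standard_normal((n_snps, n_latent))
mixing = rng.standard_normal((n_latent, n_traits))
signal = latent @ mixing
z_release1 = signal + 3.0 * rng.standard_normal((n_snps, n_traits))
z_release2 = signal + 3.0 * rng.standard_normal((n_snps, n_traits))

scores1, loadings1, var1 = genomic_pca(z_release1, n_latent)
scores2, loadings2, var2 = genomic_pca(z_release2, n_latent)

# Reproducibility of components across releases.
component_repro = max_abs_correlation(scores1, scores2)

# Benchmark: per-trait correlation of the univariate statistics.
univariate_repro = np.array([
    np.corrcoef(z_release1[:, j], z_release2[:, j])[0, 1]
    for j in range(n_traits)
])

print("variance explained per PC:", np.round(var1, 3))
print("mean component reproducibility:", round(component_repro.mean(), 3))
print("mean univariate reproducibility:", round(univariate_repro.mean(), 3))
```

In this toy setting the components pool signal shared across traits, so their cross-release correlation tends to exceed the univariate benchmark. Real GWAS statistics have linkage disequilibrium and trait-specific noise structure that this sketch does not model.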

TeraPCA: a fast and scalable software package to study genetic variation in tera-scale genotypes

Power Analysis of Principal Components Regression in Genetic Association Studies.

A high-performance computing toolset for relatedness and principal component analysis of SNP data

PCA-Plus: Enhanced principal component analysis with illustrative applications to batch effects and their quantitation

Tropical principal component analysis on the space of phylogenetic trees

Principal component analysis revisited: fast multi-trait genetic evaluations with smooth convergence

Probabilistic PCA of Censored Data: Accounting for Uncertainties in the Visualization of High-Throughput Single-Cell qPCR Data.

Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data

A joint framework for studying population structure using principal component analysis and F-statistics

A spectral graph approach to discovering genetic ancestry

GRAF-pop: A Fast Distance-Based Method To Infer Subject Ancestry from Multiple Genotype Datasets Without Principal Components Analysis

Parallel GPU Implementation of Iterative PCA Algorithms

Principal Component Analyses in Anthropological Genetics

PCA Outperforms Popular Hidden Variable Inference Methods for Molecular QTL Mapping

Principal and independent genomic components of brain structure and function

Fast Randomized PCA for Sparse Data

Deterministic parallel analysis: An improved method for selecting factors and principal components

PRSice-2: Polygenic Risk Score software for biobank-scale data

$e$PCA: High dimensional exponential family PCA

Sparse Principal Component Analysis