FPLS-DC: Functional partial least squares through distance covariance for imaging genetics

Wenliang Pan, Yue Shan, Chuang Li, Shuai Huang, Tengfei Li, Yun Li, Hongtu Zhu
DOI: https://doi.org/10.1093/bioinformatics/btae173
IF: 5.8
2024-03-29
Bioinformatics
Abstract:

Motivation: Imaging genetics integrates imaging and genetic techniques to examine how genetic variations influence the function and structure of organs like the brain or heart, providing insights into their impact on behavior and disease phenotypes. Organ-wide imaging endophenotypes are increasingly used to identify potential genes associated with complex disorders. However, analyzing organ-wide imaging data alongside genetic data presents two significant challenges: high dimensionality and complex relationships. To partially mitigate these challenges, we propose a novel nonlinear inference framework.

Results: We propose a functional partial least squares through distance covariance (FPLS-DC) framework for efficient genome-wide analyses of imaging phenotypes. It consists of two components. The first component uses FPLS-derived basis functions to reduce image dimensionality while screening genetic markers. The second component uses a simulated annealing algorithm to maximize the distance correlation between genetic markers and the projected imaging data, which is a linear combination of the FPLS basis functions. In addition, we propose an iterative FPLS-DC (I-FPLS-DC) method based on the FPLS-DC framework, which effectively overcomes the influence of inter-gene correlation on inference. We efficiently approximate the null distribution of the test statistics using a gamma approximation. Compared to existing methods, FPLS-DC offers computational and statistical efficiency for handling large-scale imaging genetics. In real-world applications, our method successfully detected genetic variants associated with the hippocampus, demonstrating its value as a statistical toolbox for imaging genetic studies.

Availability and implementation: The FPLS-DC method we propose opens up new research avenues and offers valuable insights for analyzing functional and high-dimensional data. Additionally, it serves as a useful tool for scientific analysis in practical applications within the field of imaging genetics research. The R package FPLS-DC is available on GitHub: https://github.com/BIG-S2/FPLSDC.

Supplementary information: Supplementary data are available at Bioinformatics online.
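To make the abstract's central quantity concrete, the following is a minimal sketch of the sample distance correlation for two scalar variables, written with the Python standard library only. It is not taken from the FPLS-DC R package; the function names (`distance_correlation`, `_centered_distances`) and the toy data are assumptions made for illustration, and the estimator shown is the usual double-centering form.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distances with row, column and grand means removed."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length numeric lists."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j]
                   for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)    # squared sample distance covariance
    dvar_x = mean_product(a, a)   # squared sample distance variance of x
    dvar_y = mean_product(b, b)   # squared sample distance variance of y
    if dvar_x <= 0.0 or dvar_y <= 0.0:
        return 0.0                # a constant variable carries no dependence
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


# Toy data (hypothetical): a linear and a purely nonlinear relationship.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_linear = [2.0 * v + 1.0 for v in x]
y_quadratic = [v * v for v in x]

dcor_linear = distance_correlation(x, y_linear)
dcor_quadratic = distance_correlation(x, y_quadratic)
print(dcor_linear, dcor_quadratic)
```

In this toy example the Pearson correlation between `x` and `y_quadratic` is exactly zero, while the distance correlation is clearly positive, which is the kind of nonlinear dependence the framework is designed to pick up.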
biochemical research methods, biotechnology & applied microbiology, mathematical & computational biology
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to solve two major challenges that arise when analyzing whole-organ imaging data together with genetic data in imaging genetics: high dimensionality and complex relationships. Specifically, the research objectives are as follows:

1. **High-dimensionality problem**: Genetic data and imaging data usually have extremely high dimensions, which makes them difficult for traditional statistical methods to handle effectively. The paper proposes a new nonlinear inference framework, Functional Partial Least Squares through Distance Covariance (FPLS-DC), to partially alleviate this problem.
2. **Complex relationship problem**: The influence of genetic variation on organ function and structure is usually complex and nonlinear. The FPLS-DC framework captures these relationships by maximizing the distance correlation between genetic markers and projected imaging data.

### Specific methods

The FPLS-DC framework proposed in the paper contains two main components:

1. **Dimensionality reduction and screening of genetic markers**:
   - Use FPLS-derived basis functions to reduce the dimension of the imaging data.
   - Screen genetic markers for use in subsequent analysis.
2. **Maximizing distance correlation**:
   - Use a simulated annealing algorithm to maximize the distance correlation between genetic markers and the projected imaging data, which is a linear combination of the FPLS basis functions.

In addition, the paper proposes an iterative FPLS-DC (I-FPLS-DC) method based on the FPLS-DC framework, which effectively overcomes the influence of inter-gene correlation on inference. To approximate the null distribution of the test statistic efficiently, the paper uses a gamma approximation.

### Experimental verification

The paper verifies the effectiveness of the FPLS-DC method through Monte Carlo simulation and a real-data application. The results show that FPLS-DC has computational and statistical efficiency advantages when processing large-scale imaging genetics data. In the real-data application, the method detected genetic variants associated with the hippocampus, demonstrating its value in imaging genetics research.

### Conclusion

The FPLS-DC method provides a new research approach for processing functional and high-dimensional data and performs well in practical applications. It helps to clarify how genetic variation influences organ function and structure, and may also prove useful for clinical diagnosis and prognosis. The paper also provides the R package FPLS-DC for researchers to use in practical work.
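### Illustrative sketch

The second component described under "Specific methods" can be sketched roughly as follows: given a genetic marker `x` and a matrix of basis scores (one row per subject, one column per basis function), search for unit-norm weights `w` whose projection has maximal distance correlation with `x`. This is only a sketch under stated assumptions, not the algorithm implemented in the FPLS-DC R package: the function names, the Gaussian proposal, the geometric cooling schedule, all parameter defaults, and the simulated data are hypothetical, and the basis scores are random numbers rather than real FPLS output.

```python
import math
import random


def _centered(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcor(x, y):
    """Sample distance correlation between two numeric lists."""
    n = len(x)
    a, b = _centered(x), _centered(y)

    def mp(p, q):
        return sum(p[i][j] * q[i][j]
                   for i in range(n) for j in range(n)) / (n * n)

    vx, vy = mp(a, a), mp(b, b)
    if vx <= 0.0 or vy <= 0.0:
        return 0.0
    return math.sqrt(max(mp(a, b), 0.0) / math.sqrt(vx * vy))


def project(scores, w):
    """Linear combination of the basis-score columns with weights w."""
    return [sum(s * c for s, c in zip(row, w)) for row in scores]


def normalize(w):
    """Rescale w to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w] if norm > 0.0 else list(w)


def anneal_direction(x, scores, n_iter=300, t0=0.1, cooling=0.98,
                     step=0.3, seed=0):
    """Simulated annealing over unit-norm weights, maximizing dcor."""
    rng = random.Random(seed)
    k = len(scores[0])
    current = normalize([1.0] * k)
    cur_val = dcor(x, project(scores, current))
    best, best_val = current, cur_val
    temp = t0
    for _ in range(n_iter):
        # Propose a nearby direction by jittering each weight.
        cand = normalize([c + rng.gauss(0.0, step) for c in current])
        val = dcor(x, project(scores, cand))
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if val >= cur_val or rng.random() < math.exp((val - cur_val) / temp):
            current, cur_val = cand, val
            if val > best_val:
                best, best_val = cand, val
        temp *= cooling
    return best, best_val


# Simulated toy data (hypothetical): genotype dosages 0/1/2 and three
# basis scores, only the third of which depends (nonlinearly) on x.
data_rng = random.Random(1)
x = [data_rng.choice([0, 1, 2]) for _ in range(30)]
scores = [[data_rng.gauss(0.0, 1.0),
           data_rng.gauss(0.0, 1.0),
           (xi - 1) ** 2 + 0.1 * data_rng.gauss(0.0, 1.0)] for xi in x]

start_val = dcor(x, project(scores, normalize([1.0, 1.0, 1.0])))
w_best, val_best = anneal_direction(x, scores)
print(start_val, val_best, w_best)
```

Because distance correlation is unchanged when the projection is rescaled, constraining `w` to unit norm only removes a redundant degree of freedom. A real analysis would replace the simulated scores with the FPLS basis scores and tune the cooling schedule to the problem size.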