Network assisted analysis to reveal the genetic basis of autism

Li Liu, Jing Lei, Kathryn Roeder
DOI: https://doi.org/10.1214/15-AOAS844
2015-11-17
Abstract: While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find more potential autism risk genes by combining the genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, where node-specific properties are incorporated into network estimation for improved statistical and computational efficiency. Then we adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk.
Methodology, Applications
What problem does this paper attempt to address?
The problem this paper attempts to address is the genetic basis of Autism Spectrum Disorder (ASD). Although research indicates that autism is highly heritable, its specific genetic mechanisms remain unclear. The paper points out that current research methods face difficulties in identifying genes associated with autism risk, mainly due to:

1. **Weak genetic signals**: The genetic signals associated with autism are usually weak and distributed across a large number of genes.
2. **High-dimensional networks**: These genetic signals aggregate in gene networks, but the networks themselves are very complex and high-dimensional.

To overcome these challenges, the authors developed a new statistical tool that combines gene association scores with gene co-expression data from specific brain regions and developmental periods to identify more potential autism risk genes. Specifically, the method includes the following key steps:

- **Gene dependency network estimation**: Using a new Partial Neighborhood Selection (PNS) algorithm to estimate the gene dependency network, which can incorporate node-specific information to improve statistical and computational efficiency.
- **Hidden Markov Random Field (HMRF) model**: Systematically combining the estimated network and gene association scores to discover potential risk genes.
- **Extended model**: This model can naturally extend to incorporate more structural information about the dependencies between genes.

Using this method, the authors successfully identified 333 genes that may influence autism risk. Overall, the paper aims to advance the discovery of autism risk genes by integrating multiple biological data sources and powerful statistical tests.
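
To make the HMRF step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-state hidden label per gene (risk or not), an Ising-style prior over a given 0/1 network, Gaussian association z-scores, and a simple iterated conditional modes (ICM) update. The function name `icm_hmrf`, the parameters `b`, `c`, `mu`, `sigma`, and the toy data are all hypothetical and chosen only for illustration; the summary above does not state how the paper sets or estimates them.

```python
import numpy as np


def icm_hmrf(z, adj, b=-2.0, c=1.0, mu=2.0, sigma=1.0, n_iter=20):
    """Label genes as risk (1) or non-risk (0) with iterated conditional modes.

    Assumptions (illustrative, not taken from the paper):
      - z[i] ~ N(0, 1) for non-risk genes and N(mu, sigma^2) for risk genes
      - Ising-style prior: log-odds of gene i being a risk gene is
        b + c * (number of neighbours currently labelled as risk)
    """
    z = np.asarray(z, dtype=float)
    adj = np.asarray(adj, dtype=float)

    # Log-likelihood ratio of the risk model against the null model.
    llr = (-0.5 * ((z - mu) / sigma) ** 2 - np.log(sigma)) + 0.5 * z ** 2

    # Start from a simple threshold on the association scores.
    labels = (z > mu).astype(float)

    for _ in range(n_iter):
        changed = False
        for i in range(len(z)):
            # Conditional log-odds given the current labels of the neighbours.
            log_odds = b + c * (adj[i] @ labels) + llr[i]
            new_label = 1.0 if log_odds > 0 else 0.0
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels.astype(int)


# Toy network: genes 0, 1, 2 form a triangle; genes 3 and 4 share one edge.
adj = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
# Genes 2 and 3 have the same moderate association score.
z = np.array([3.5, 3.0, 1.4, 1.4, 0.2])

risk = icm_hmrf(z, adj)
print(risk)  # [1 1 1 0 0]
```

In this toy example genes 2 and 3 carry the same moderate score, but only gene 2 is labelled as a risk gene because its two neighbours have strong signals. This is the intuition behind combining weak association scores with a gene network; setting `c=0` removes the network effect and reduces the procedure to thresholding each gene on its own score.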