CLUSTERING OF RARE VARIANTS FOR CAUSAL VARIANTS IDENTIFICATION AND EFFECT DIRECTION CLASSIFICATION

Xianbang Sun, Xue Liu, Chunyu Liu
DOI: https://doi.org/10.1101/2024.02.22.24303151
2024-02-23
Abstract: Several gene-based tests, e.g., the sequence kernel association test, have been developed to test the association of rare single nucleotide variants (SNVs) in genomic regions with disease traits. A common limitation of these aggregate methods is their inability to discriminate potentially causal variants from null variants within the tested regions. We propose a novel clustering method, based on a Gaussian mixture model (GMM), that uses summary statistics from the gene-based tests to classify rare variants into null and signal variant groups. We further classify the signal variants into potentially risk and protective subgroups of different effect sizes. We evaluate the performance of the proposed method in a simulation study, considering several statistics such as the adjusted Rand index (ARI), mean square error (MSE), and accuracy in specifying the number of clusters. We apply the proposed clustering method to identify possibly risk and protective rare variants in six genes that are significantly associated with blood pressure (BP) traits in the most recent large genome-wide association study (GWAS) and meta-analysis. The proposed method may facilitate the identification of potentially causal rare variant clusters in genomic regions and ultimately help elucidate the genetic architecture underlying complex human traits, supporting the discovery of drug targets and the design of gene therapies.
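To make the idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each variant is summarized by a single z-score, fixes the number of components at three (protective, null, risk), and fits a one-dimensional Gaussian mixture by plain EM. The function names, the simulated effect sizes, and the starting means are all hypothetical, and the abstract does not say which summary statistics or model-selection rule the paper actually uses. The adjusted Rand index is then computed against the simulated truth, mirroring the kind of evaluation the abstract mentions.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, init_means, n_iter=100):
    """Fit a 1-D Gaussian mixture by EM; return (means, hard labels)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-8:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-3)  # floor to avoid collapse
    # Hard assignment from the responsibilities of the final E-step.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    n = len(a)
    sum_ij = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(a).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Simulated z-scores (hypothetical effect sizes, chosen to be well separated):
# label 0 = protective, 1 = null, 2 = risk.
random.seed(0)
truth = [1] * 60 + [2] * 20 + [0] * 20
centers = {0: (-6.0, 0.5), 1: (0.0, 1.0), 2: (6.0, 0.5)}
z_scores = [random.gauss(*centers[t]) for t in truth]

# Starting means are ordered protective, null, risk so labels line up.
means, labels = fit_gmm_1d(z_scores, init_means=[-3.0, 0.0, 3.0])
ari = adjusted_rand_index(truth, labels)
print("estimated means:", [round(m, 2) for m in means])
print("ARI vs. simulated truth:", round(ari, 3))
```

In practice the number of clusters would itself have to be chosen, for example with an information criterion; this sketch fixes it at three only to keep the example short.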
Genetic and Genomic Medicine