Impact of sample size and tissue relevance on T2D gene identification

David Davtian, Theo Dupuis, Dina Mansour Aly, Naeimeh Atabaki Pasdar, Mark Walker, Paul W Franks, Femke Rutters, Hae Kyung W Im, Ewan R Pearson, Martijn van de Bunt, Ana Vinuela, Andrew Brown
DOI: https://doi.org/10.1101/2024.10.31.24316435
2024-11-02
Abstract: Identification of genes and proteins mediating the activity of GWAS variants requires molecular data from disease-relevant tissues, but these may be difficult to collect. Using multiple gene expression reference datasets and GWAS summary statistics for T2D, we identified 1,818 unique genes associated with T2D. Comparing the performance of different reference datasets, we found that sample size, and not the relevance of the tissue to the disease, was the critical factor in identifying relevant genes. Genes implicated using a well-powered expression dataset were also more likely to have multiple lines of genetic evidence. A targeted proteomics reference dataset from plasma samples showed power to identify T2D-related proteins similar to that of a gene expression dataset with the same sample size. Accounting for BMI reduces power across all tissues and phenotypes by ~30%, suggesting that many GWAS links to T2D are mediated by BMI, potentially implicating effects related to insulin resistance. Finally, using data from smaller GWAS studies with precisely defined T2D subtypes uncovers genes directly relevant to that subtype, such as , an immune response gene for Severe Autoimmune Diabetes and , involved in beta-cell apoptosis, for Severe Insulin Deficient Diabetes. Our work demonstrates the benefits of well-powered reference datasets in accessible tissues and well-defined disease subtypes when studying complex diseases involving multiple tissues.