Selection-adjusted inference: an application to confidence intervals for cis-eQTL effect sizes

Snigdha Panigrahi, Junjie Zhu, Chiara Sabatti
DOI: https://doi.org/10.48550/arXiv.1801.08686
2018-06-07
Abstract: The goal of eQTL studies is to identify the genetic variants that influence the expression levels of the genes in an organism. High-throughput technology has made such studies possible: in a given tissue sample, it enables us to quantify the expression levels of approximately 20,000 genes and to record the alleles present at millions of genetic polymorphisms. While obtaining these data is relatively cheap once a specimen is at hand, obtaining human tissue remains a costly endeavor. Thus, eQTL studies continue to be based on relatively small sample sizes, a limitation that is particularly serious for the tissues of most immediate medical relevance. Given the high-dimensional nature of these datasets and the large number of hypotheses tested, the scientific community adopted multiplicity adjustment procedures early on; these primarily control the false discovery rate for the identification of genetic variants that influence expression levels. In contrast, a problem that has not received much attention to date is that of providing estimates of the effect sizes associated with these variants in a way that accounts for the considerable amount of selection. We illustrate how the recently developed conditional inference approach can be deployed to obtain confidence intervals for the eQTL effect sizes with reliable coverage. The procedure we propose is based on a randomized hierarchical strategy that both reflects the steps typically adopted in state-of-the-art investigations and introduces the use of randomness instead of data splitting to maximize the use of available data. Analysis of the GTEx Liver dataset (v6) suggests that naively obtained confidence intervals would likely not cover the true values of effect sizes and that the number of local genetic polymorphisms influencing the expression level of genes might be underestimated.
Applications