SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures

Hong-Qiang Wang, Lindsey K Tuominen, Chung-Jui Tsai
DOI: https://doi.org/10.1093/bioinformatics/btq650
IF: 5.8
2011-01-15
Bioinformatics
Abstract: Motivation: A prior estimate of the proportion of true null hypotheses (π₀) plays a critical role in controlling the false discovery rate (FDR) in multiple hypothesis testing. However, hidden complex dependence structures in many genomics datasets distort the distribution of p-values, rendering existing π₀ estimators less effective. Results: Starting from the basic non-linear model of the q-value method, we developed a simple linear algorithm to probe local dependence blocks. We uncovered a non-static relationship between tests' p-values and their corresponding q-values that is influenced by data structure and π₀. Using an optimization framework, these findings were exploited to devise a Sliding Linear Model (SLIM) that estimates π₀ more reliably under dependence. When tested on a number of simulated datasets with varying dependence structures and on microarray data, SLIM was found to be robust to dependence in estimating π₀. The accuracy of its π₀ estimates suggests that SLIM can be used as a stand-alone tool for predicting significant tests. Availability: The R code of the proposed method is available at http://aspendb.uga.edu/downloads for academic use.
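To illustrate why the π₀ estimate matters for FDR control, here is a minimal Python sketch of the standard q-value baseline that the abstract refers to. It is not the SLIM algorithm, which is distributed as R code by the authors. The function names `storey_pi0` and `q_values`, the threshold parameter `lam`, and the example p-values are assumptions made for illustration only. The sketch uses the usual λ-threshold estimator, π₀(λ) = #{p > λ} / (m(1 − λ)), and then scales step-up q-values by that estimate.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Threshold-based pi0 estimate: #{p > lam} / (m * (1 - lam)), capped at 1."""
    p = np.asarray(pvalues, dtype=float)
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    return min(1.0, float(np.mean(p > lam) / (1.0 - lam)))

def q_values(pvalues, pi0):
    """Step-up q-values scaled by a supplied pi0 estimate."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # pi0 * m * p_(i) / i for the i-th smallest p-value
    ranked = pi0 * m * p[order] / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

# Hypothetical p-values, chosen only to make the arithmetic easy to follow.
example_p = [0.01, 0.02, 0.03, 0.6, 0.8]
pi0_hat = storey_pi0(example_p, lam=0.5)
q_hat = q_values(example_p, pi0_hat)
```

Because every q-value is multiplied by the π₀ estimate, an estimate inflated or deflated by a dependence-distorted p-value distribution changes the number of tests called significant at a given FDR level. That sensitivity is the motivation the abstract gives for a more robust estimator.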