Overlapping Probabilities of Top Ranking Gene Lists, Hypergeometric Distribution, and Stringency of Gene Selection Criterion

Wen Fury,Franak Batliwalla,Peter K. Gregersen,Wentian Li
DOI: https://doi.org/10.1109/IEMBS.2006.260828
2006-06-15
Abstract:When the same set of genes appear in two top ranking gene lists in two different studies, it is often of interest to estimate the probability for this being a chance event. This overlapping probability is well known to follow the hypergeometric distribution. Usually, the lengths of top-ranking gene lists are assumed to be fixed, by using a pre-set criterion on, e.g., $p$-value for the t-test. We investigate how overlapping probability changes with the gene selection criterion, or simply, with the length of the top-ranking gene lists. It is concluded that overlapping probability is indeed a function of the gene list length, and its statistical significance should be quoted in the context of gene selection criterion.
Quantitative Methods
What problem does this paper attempt to address?