Two-Sample Inference in Highly Dispersed Negative Binomial Models

David Shilane,Derek Bean
DOI: https://doi.org/10.48550/arXiv.1203.0812
2012-03-05
Methodology
Abstract:Two-sample inference for the difference of population means typically relies upon a Central Limit Theorem approximation. When data are drawn from a Negative Binomial distribution, previous work of Shilane et al. (2010) showed that a Normal approximation is often unreliable in one-sample inference and proposed alternative techniques. We seek to extend these methods to the problem of two-sample inference on the difference of sample means in highly dispersed Negative Binomial models. We demonstrate that the Normal approximation is considerably more robust in the two-sample setting and may often be applied even when it is not appropriate for either sample individually. We also provide an intuitive extension of Bernstein's Inequality to the two-sample case. A simple mixture of these two methods may improve coverage in borderline and small sample settings. We subsequently investigate the coverage quality of confidence intervals and sample size considerations in a wide variety of simulation studies. Overall, we demonstrate that the Normal approximation is considerably more robust in approximating the distribution of the mean difference than the corresponding one-sample theory would suggest.
What problem does this paper attempt to address?