Assessing the Lognormal Distribution Assumption For the Crude Odds Ratio: Implications For Point and Interval Estimation

David Douglas Newstein
DOI: https://doi.org/10.21203/rs.3.rs-29245/v1
2020-05-28
Abstract:
Background: The assumption that the sampling distribution of the crude odds ratio (ORcrude) is lognormal with parameters mu and sigma leads to the incorrect conclusion that the expectation of the log of ORcrude is equal to the parameter mu. Here, the standard method of point and interval estimation (I) is compared with a modified method utilizing ORstar, where ln(ORstar) = ln(ORcrude) - sigma^2/2.
Methods: Confidence intervals based on ln(ORstar) are obtained in two ways: by parametric bootstrap simulation with a percentile-derived confidence interval (II), and by a simple calculation that replaces ln(ORcrude) with ln(ORstar) in the standard formula (III). A method proposed by Barendregt (IV), who also noted the bias present in estimating ORtrue by ORcrude, is evaluated as well. Simulations are conducted for a “protective” exposure (ORtrue < 1).
Results: In simulations, methods II and III exhibited the highest level of statistical conclusion validity for their confidence intervals, as indicated by one minus the coverage probability being close to alpha. As demonstrated by the Monte Carlo (MC) simulations, these two methods also exhibited the least biased point estimates and the narrowest confidence intervals of the four estimation approaches.
Conclusions: Monte Carlo simulations prove useful in validating the inferential procedures used in data analysis. In the case of the odds ratio, the standard method of point and interval estimation is based on the assumption that the crude odds ratio has a lognormal sampling distribution. Utilizing this assumption, together with the formula for the expectation of this distribution, an alternative estimation method for ORtrue was obtained (different from the method in the earlier report by Barendregt) that yielded the point and interval estimates that MC simulations indicate are the most statistically valid.
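As a rough illustration of how methods I and III differ, the following Python sketch computes both from a single 2x2 table. It rests on several assumptions that the abstract does not state: that the "standard formula" is the usual Wald interval on the log scale, that sigma is estimated by Woolf's standard error sqrt(1/a + 1/b + 1/c + 1/d), and that the cell labels a, b, c, d and the function names are hypothetical choices made here for illustration. It is not the author's code.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_975 = 1.959963984540054


def woolf_se(a, b, c, d):
    """Woolf standard error of ln(OR); used here as the estimate of sigma (assumption)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def or_crude_ci(a, b, c, d, z=Z_975):
    """Method I (assumed Wald form): point estimate and CI centred on ln(ORcrude).

    a, c = cases among exposed / unexposed; b, d = non-cases among exposed / unexposed.
    """
    log_or = math.log((a * d) / (b * c))
    se = woolf_se(a, b, c, d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def or_star_ci(a, b, c, d, z=Z_975):
    """Method III (sketch): same formula, with ln(ORcrude) replaced by ln(ORstar)."""
    se = woolf_se(a, b, c, d)
    # ln(ORstar) = ln(ORcrude) - sigma^2 / 2, with sigma estimated by the Woolf SE.
    log_star = math.log((a * d) / (b * c)) - se ** 2 / 2
    return math.exp(log_star), math.exp(log_star - z * se), math.exp(log_star + z * se)


if __name__ == "__main__":
    # Hypothetical table for a "protective" exposure (ORcrude = 0.375).
    table = (20, 80, 40, 60)
    print("Method I  :", or_crude_ci(*table))
    print("Method III:", or_star_ci(*table))
```

Under these assumptions the two intervals have the same width on the log scale; method III only shifts the centre downward by sigma^2/2, so ORstar is always smaller than ORcrude.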