Variance estimation for logistic regression in case-cohort studies

Hisashi Noma
Abstract: The logistic regression analysis proposed by Schouten et al. (Stat Med. 1993;12:1733-1745) has been a standard method for the statistical analysis of case-cohort studies, and it enables effective estimation of the risk ratio from selected subsamples. Schouten et al. (1993) also proposed that the standard error of the risk ratio estimator can be calculated with the robust variance estimator. In this article, however, we show that the robust variance estimator does not account for the duplications of case and subcohort samples and is generally biased, so inaccurate confidence intervals and P-values may be obtained. To address this invalid inference problem, we provide an alternative, valid bootstrap-based variance estimator. In simulation studies, the bootstrap method consistently provided more precise confidence intervals than the robust variance method while retaining adequate coverage probabilities. Because the conventional robust variance estimator is biased, it may lead to inadequate conclusions. The bootstrap method is an effective alternative in practice for providing accurate evidence.
What problem does this paper attempt to address?
This paper addresses the bias that arises when the conventional robust variance estimator is used with logistic regression in case-cohort studies without accounting for the duplication between the case and subcohort samples. Although the robust variance approach proposed by Schouten et al. has been widely adopted, it ignores this duplication, which may lead to inaccurate confidence intervals and P-values. To solve this problem, the author proposes a bootstrap-based variance estimation method and verifies its validity and accuracy through simulation studies.

### Background of the paper

- **Background**: The logistic regression analysis proposed by Schouten et al. has become one of the standard methods for analyzing case-cohort studies. It can effectively estimate the risk ratio from selected subsamples and adjust for potential confounding factors. Schouten et al. also proposed that the standard error can be calculated with a robust variance estimator.
- **Problem**: The conventional robust variance estimator does not take into account the duplication between the case and subcohort samples, so the resulting confidence intervals and P-values may be inaccurate.

### Methods and results

- **Methods**: The author shows that the conventional robust variance estimator does not fully account for sample duplication and may therefore be biased. To provide more accurate statistical inference, the author proposes a bootstrap-based variance estimation method.
- **Results**: In simulation studies, the bootstrap method provides more precise confidence intervals while maintaining adequate coverage.

### Conclusions

- **Conclusions**: The conventional robust variance estimator is biased and may lead to inaccurate statistical inference. The proposed bootstrap variance estimation method provides more accurate and precise interval estimates, and its use is recommended in practice to provide accurate evidence.

### Keywords

- Case-cohort design
- Logistic regression
- Risk ratio
- Bias
- Bootstrap

### Formulas

- Logistic regression model: \[ \operatorname{logit}\{\Pr(D = 1)\} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p \] where \(D\) is an indicator variable that equals 1 if the participant is in the case sample and 0 otherwise, and \(x_1, \ldots, x_p\) are explanatory variables.
- Robust variance estimator: \[ \widehat{\mathrm{Var}}_{\text{robust}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{S}\mathbf{X}(\mathbf{X}^T\mathbf{W}\mathbf{X})^{-1}, \qquad \widehat{\text{SE}}_{\text{robust}}(\hat{\beta}_j) = \sqrt{\left[\widehat{\mathrm{Var}}_{\text{robust}}(\hat{\boldsymbol{\beta}})\right]_{jj}} \] where \(\mathbf{X}\) is the design matrix, \(\mathbf{W}\) is the weight matrix, and \(\mathbf{S}\) is the matrix of squared residuals. The standard error of each coefficient is the square root of the corresponding diagonal element.
- Bootstrap variance estimator: \[ \widehat{\text{SE}}_{\text{bootstrap}} = \sqrt{\frac{1}{B - 1}\sum_{b = 1}^{B}(\hat{\beta}_b - \bar{\beta})^2} \] where \(B\) is the number of bootstrap resamples, \(\hat{\beta}_b\) is the regression coefficient estimate from the \(b\)-th bootstrap sample, and \(\bar{\beta}\) is the average of the \(B\) bootstrap estimates.

Through these methods and results, the paper demonstrates the advantages of the bootstrap method for case-cohort studies, especially in providing more accurate confidence intervals.
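
The bootstrap standard error formula above can be illustrated with a minimal Python sketch. It is not the paper's procedure: the resampling scheme the author uses for the case-cohort design is not described in this summary, so the sketch assumes plain resampling of rows with replacement. The names `bootstrap_se` and `log_risk_ratio` and the toy data are hypothetical, and a simple two-group log risk ratio stands in for the logistic regression coefficient.

```python
import math
import random
import statistics


def bootstrap_se(sample, estimator, n_boot=1000, seed=0):
    """Return (SE, replicates) for estimator(sample) via row resampling.

    Assumption: rows are resampled independently with replacement; a
    case-cohort-specific resampling scheme is not reproduced here.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        # Draw n rows with replacement and re-estimate on the resample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    # statistics.stdev divides by B - 1, matching the formula above.
    return statistics.stdev(replicates), replicates


def log_risk_ratio(rows):
    """Log risk ratio from (exposed, event) pairs coded 0/1.

    Hypothetical toy estimator: 0.5 is added to each event count and 1 to
    each group size so an empty cell in a resample cannot cause log(0).
    """
    events_exposed = sum(1 for x, d in rows if x == 1 and d == 1)
    n_exposed = sum(1 for x, d in rows if x == 1)
    events_unexposed = sum(1 for x, d in rows if x == 0 and d == 1)
    n_unexposed = sum(1 for x, d in rows if x == 0)
    risk_exposed = (events_exposed + 0.5) / (n_exposed + 1.0)
    risk_unexposed = (events_unexposed + 0.5) / (n_unexposed + 1.0)
    return math.log(risk_exposed / risk_unexposed)


# Made-up data: 40/200 events among exposed, 20/200 among unexposed.
rows = [(1, 1)] * 40 + [(1, 0)] * 160 + [(0, 1)] * 20 + [(0, 0)] * 180
se, replicates = bootstrap_se(rows, log_risk_ratio, n_boot=500, seed=1)
print(f"bootstrap SE of log risk ratio: {se:.3f}")
```

Passing the estimator as a function keeps the resampling loop independent of the model, so a logistic regression fit could be substituted for `log_risk_ratio` without changing `bootstrap_se`.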