Facilitating heterogeneous effect estimation via statistically efficient categorical modifiers

Daniel R. Kowal
2024-08-01
Abstract: Categorical covariates such as race, sex, or group are ubiquitous in regression analysis. While main-only (or ANCOVA) linear models are predominant, cat-modified linear models that include categorical-continuous or categorical-categorical interactions are increasingly important and allow heterogeneous, group-specific effects. However, with standard approaches, the addition of cat-modifiers fundamentally alters the estimates and interpretations of the main effects, often inflates their standard errors, and introduces significant concerns about group (e.g., racial) biases. We advocate an alternative parametrization and estimation scheme using abundance-based constraints (ABCs). ABCs induce a model parametrization that is both interpretable and equitable. Crucially, we show that with ABCs, the addition of cat-modifiers 1) leaves main effect estimates unchanged and 2) enhances their statistical power, under reasonable conditions. Thus, analysts can, and arguably should, include cat-modifiers in linear regression models to discover potential heterogeneous effects, without compromising estimation, inference, and interpretability for the main effects. Using simulated data, we verify these invariance properties for estimation and inference and showcase the capabilities of ABCs to increase statistical power. We apply these tools to study demographic heterogeneities among the effects of social and environmental factors on STEM educational outcomes for children in North Carolina. An R package lmabc is available.
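As a minimal sketch of what an abundance-based parametrization looks like for a single categorical covariate, the Python code below fits the main-only model y ~ A under the assumption that the ABC takes the form sum_g pi_g * gamma_g = 0, where pi_g is the sample proportion (abundance) of group g. The helper `abc_one_way` is hypothetical and is not part of the lmabc package. Under this assumed constraint, the intercept equals the overall mean of y and each group effect is that group's deviation from it, so no single group acts as the reference.

```python
import numpy as np


def abc_one_way(y, groups):
    """Fit the main-only model y ~ A under an assumed abundance-based constraint.

    Hypothetical helper for illustration. Assumes the constraint
    sum_g pi_g * gamma_g = 0, where pi_g is the sample proportion of group g.
    """
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    pi = counts / counts.sum()  # group abundances (sample proportions)
    # OLS fitted values for y ~ A are the group means
    means = np.array([y[groups == g].mean() for g in labels])
    alpha = pi @ means  # abundance-weighted mean of group means = overall mean of y
    gamma = means - alpha  # group effects: deviations from the overall mean
    return labels, alpha, gamma, pi


labels, alpha, gamma, pi = abc_one_way([1, 2, 3, 10, 14], ["a", "a", "a", "b", "b"])
# alpha is approximately 6.0 (the overall mean), gamma approximately [-4.0, 6.0],
# and pi @ gamma is approximately 0
```

For comparison, reference-group (treatment) coding with group a as the baseline would give an intercept of 2 (the mean of group a) and an effect of 10 for group b, so every estimate is anchored to one particular group.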
Methodology, Statistics Theory, Applications
What problem does this paper attempt to address?
This paper addresses problems that arise when regression models include interactions between a categorical variable and a continuous or categorical variable (i.e., cat-modifiers). Under the standard approach, adding cat-modifiers changes both the estimates and the interpretations of the main effects, often inflates their standard errors, and raises significant concerns about bias with respect to groups, including protected groups defined by race or sex. The author proposes an alternative parametrization and estimation scheme based on abundance-based constraints (ABCs) to overcome these problems. The paper shows that, with ABCs and under reasonable conditions, adding cat-modifiers leaves the main effect estimates unchanged and can improve their statistical power. Researchers can therefore include cat-modifiers to discover potential heterogeneous effects without compromising the estimation, inference, or interpretability of the main effects. In short, the paper shows how to use categorical interactions to reveal heterogeneity in the data without sacrificing the quality of main effect estimation.
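To make the role of the constraints concrete for a categorical-continuous interaction, the sketch below fits y ~ x + A + x:A under assumed ABCs of the form sum_g pi_g * gamma_g = 0 and sum_g pi_g * delta_g = 0, where gamma_g and delta_g are the group-specific intercept and slope deviations and pi_g is the sample proportion of group g. The function name `abc_cat_modified` is hypothetical and does not reproduce the lmabc implementation. Because such constraints only identify the parameters and do not restrict the fit, the fitted lines coincide with separate per-group OLS lines, and the main effect of x becomes the abundance-weighted average of the group-specific slopes.

```python
import numpy as np


def abc_cat_modified(x, y, groups):
    """Fit y ~ x + A + x:A under assumed abundance-based constraints.

    Hypothetical helper for illustration. Assumes the constraints
    sum_g pi_g * gamma_g = 0 and sum_g pi_g * delta_g = 0, with pi_g the
    sample proportion of group g. Each group needs at least two distinct x values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    pi = counts / counts.sum()  # group abundances
    slopes = np.empty(len(labels))
    intercepts = np.empty(len(labels))
    for k, g in enumerate(labels):
        mask = groups == g
        # np.polyfit with deg=1 returns [slope, intercept] for this group's OLS line
        slopes[k], intercepts[k] = np.polyfit(x[mask], y[mask], 1)
    beta = pi @ slopes  # main effect of x: abundance-weighted average slope
    alpha = pi @ intercepts  # intercept: abundance-weighted average intercept
    delta = slopes - beta  # cat-modifier (slope deviation) for each group
    gamma = intercepts - alpha  # intercept deviation for each group
    return labels, alpha, beta, gamma, delta, pi


x = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
g = np.array(["a", "a", "b", "b", "b", "b", "b", "b"])
y = np.where(g == "a", 1 + 2 * x, 3 + 4 * x)
labels, alpha, beta, gamma, delta, pi = abc_cat_modified(x, y, g)
# pi = [0.25, 0.75]; beta is approximately 3.5, alpha approximately 2.5,
# delta approximately [-1.5, 0.5]
```

Here beta keeps a population-level meaning (the average slope across groups, weighted by how common each group is) even after the cat-modifiers are added. The sketch does not check the conditions under which the paper shows that this estimate matches the main-only estimate.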