Regression with race-modifiers: towards equity and interpretability

Daniel Kowal
DOI: https://doi.org/10.1101/2024.01.04.23300033
2024-08-08
Abstract: The pervasive effects of structural racism and racial discrimination are well-established and offer strong evidence that the effects of many important variables on health and life outcomes vary by race. Alarmingly, standard practices for statistical regression analysis introduce racial biases into the estimation and presentation of these race-modified effects. We introduce abundance-based constraints (ABCs) to eliminate these racial biases. ABCs offer a remarkable invariance property: estimates and inference for main effects are nearly unchanged by the inclusion of race-modifiers. Thus, quantitative researchers can estimate race-specific effects "for free"--without sacrificing parameter interpretability, equitability, or statistical efficiency. The benefits extend to prominent statistical learning techniques, especially regularization and selection. We leverage these tools to estimate the joint effects of environmental, social, and other factors on 4th end-of-grade reading scores for students in North Carolina (n=27,638) and identify race-modified effects for racial (residential) isolation, PM2.5 exposure, and mother's age at birth.
Epidemiology
What problem does this paper attempt to address?
This paper addresses the racial bias that arises when race-modifiers are included in statistical regression analysis. Specifically, the author points out that standard regression practice introduces racial bias into the estimation and presentation of race-modified effects, which undermines the interpretability and fairness of the parameters. To eliminate these biases, the author proposes abundance-based constraints (ABCs). ABCs yield fairer and more easily interpretable parameter estimates, and the estimates of the main effects are nearly unchanged when race-modifiers are included.

### Background of the Paper and Problem Description

1. **The Influence of Race on Health and Life Outcomes**
   - The influence of structural racism and racial discrimination on health and life outcomes is widespread, and the effects of many important variables vary by race.
   - Standard statistical regression analysis introduces racial bias when handling these race-modified effects, which makes both the estimation and the interpretation of the parameters inequitable.
2. **Limitations of Existing Methods**
   - **Reference Group Encoding (RGE)**: This is the most commonly used method. A reference group (usually non-Hispanic whites) is selected, and the effects for the other groups are estimated relative to that reference group. This method has the following problems:
     - **Unfairness**: It elevates the status of one racial group, since the effects for all other racial groups are interpreted relative to the reference group.
     - **Lack of clarity**: It does not make explicit that the intercept and some effects are specific to a single racial group.
     - **Misleading**: It may lead to misinterpretation of some effects. For example, the effect of racial isolation (RI) may be underestimated or misjudged as insignificant.
   - **Sum-to-Zero Constraints (STZ)**: Although STZ resolves some of the unfairness of RGE, the parameters are difficult to interpret, and the estimators lack attractive statistical properties.
   - **Overparameterized Estimation**: This approach imposes no identification constraints and relies on regularized regression to produce unique estimates, but the parameters remain difficult to interpret.

### Proposed Method

1. **Abundance-Based Constraints (ABCs)**
   - Use the proportions of the racial groups as weights in the identification constraints, so that the parameter estimates are fair and interpretable.
   - The specific constraint form is:
     \[ \sum_{r} \hat{\pi}_r \beta_r = 0, \quad \sum_{r} \hat{\pi}_r \gamma_{r,j} = 0 \quad \text{for } j = 1, \ldots, p \]
     where \(\hat{\pi}_r\) is the proportion of racial group \(r\) in the data, \(\beta_r\) are the race main effects, and \(\gamma_{r,j}\) are the race-modifier coefficients for covariate \(j\).
   - **Main Effects**: Under ABCs, each main effect can be expressed as the racial average slope, where \(\mu'_{x_j}(r)\) denotes the race-specific slope for covariate \(x_j\):
     \[ \alpha_j = \sum_{r} \hat{\pi}_r \mu'_{x_j}(r) = \mathbb{E}_{\hat{\pi}} \left\{ \mu'_{x_j}(R) \right\} \]
   - **Race-Modified Effects**: Each is expressed as the difference between the race-specific slope and the racial average slope:
     \[ \gamma_{r,j} = \mu'_{x_j}(r) - \alpha_j = \mu'_{x_j}(r) - \mathbb{E}_{\hat{\pi}} \left\{ \mu'_{x_j}(R) \right\} \]
2. **Statistical Properties**
   - **Estimation Invariance**: Under appropriate conditions, the estimates of the main effects are nearly identical in the model that includes race-modifiers and in the model that contains only main effects.
   - **Inferential Properties**: The standard error of main effects in the model including race...
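
The ABC definitions above can be illustrated numerically. The following is a minimal sketch, not the author's implementation: it uses simulated data with three hypothetical groups and a single covariate, and every name in it (`groups`, `pi_hat`, `alpha_abc`, and so on) is an assumption made for illustration. It computes the race-specific slopes, forms the ABC main effect as their abundance-weighted average, and compares it with the coefficient from a main-effects-only fit. The covariate is standardized within each group; this is a choice made here so that the comparison is exact, whereas the summary above only says the invariance holds "under appropriate conditions".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: three groups with unequal abundances.
n = 3000
groups = rng.choice(3, size=n, p=[0.6, 0.3, 0.1])
true_intercepts = np.array([0.0, 1.0, -1.0])
true_slopes = np.array([1.0, 0.5, -0.5])

# One continuous covariate, standardized within each group. This is an
# assumption of the sketch: it makes the within-group variances of x equal.
x = rng.normal(size=n)
for r in range(3):
    m = groups == r
    x[m] = (x[m] - x[m].mean()) / x[m].std()

y = true_intercepts[groups] + true_slopes[groups] * x + rng.normal(scale=0.5, size=n)

# Sample proportions pi_hat_r (the "abundances").
pi_hat = np.bincount(groups, minlength=3) / n

# Race-specific slopes. With group-specific intercepts and slopes, least
# squares decouples into one simple regression per group.
slopes = np.array(
    [np.polyfit(x[groups == r], y[groups == r], deg=1)[0] for r in range(3)]
)

# ABC parametrization: main effect = abundance-weighted average slope,
# race-modifiers = deviations of each group's slope from that average.
alpha_abc = pi_hat @ slopes
gamma_abc = slopes - alpha_abc

# RGE with group 0 as reference: the reported "main effect" of x is
# simply the reference group's slope.
alpha_rge = slopes[0]

# Main-effects-only model: group intercepts plus one common slope for x.
X_main = np.column_stack([np.eye(3)[groups], x])
coef_main, *_ = np.linalg.lstsq(X_main, y, rcond=None)
alpha_main = coef_main[-1]

print("ABC main effect:        ", alpha_abc)
print("Main-only model slope:  ", alpha_main)
print("RGE 'main effect':      ", alpha_rge)
print("ABC constraint residual:", pi_hat @ gamma_abc)
```

In this setup the ABC main effect and the main-only slope agree up to floating-point error, while the RGE coefficient reflects only the reference group. Without the within-group standardization, the two estimates would generally be close rather than identical.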