Abstract: Factorial designs are widely used due to their ability to accommodate multiple factors simultaneously. The factor-based regression with main effects and some interactions is the dominant strategy for downstream data analysis, delivering point estimators and standard errors via one single regression. Justification of these convenient estimators from the design-based perspective requires quantifying their sampling properties under the assignment mechanism conditioning on the potential outcomes. To this end, we derive the sampling properties of the factor-based regression estimators from both saturated and unsaturated models, and demonstrate the appropriateness of the robust standard errors for the Wald-type inference. We then quantify the bias-variance trade-off between the saturated and unsaturated models from the design-based perspective, and establish a novel design-based Gauss-Markov theorem that ensures the latter's gain in efficiency when the nuisance effects omitted indeed do not exist. As a byproduct of the process, we unify the definitions of factorial effects in various literatures and propose a location-shift strategy for their direct estimation from factor-based regressions. Our theory and simulation suggest using factor-based inference for general factorial effects, preferably with parsimonious specifications in accordance with the prior knowledge of zero nuisance effects.
What problem does this paper attempt to address?
The paper addresses causal inference in factorial experiments. Specifically, it studies how to estimate factorial effects with factor-based regressions and characterizes the design-based sampling properties of the resulting estimators. The main contributions can be summarized as follows:
1. **Clarification of Causal Interpretation**:
- The paper clarifies the causal interpretation of the coefficients in factor-based linear regressions and proposes a location-shift strategy under which the regression output reproduces design-based inference for various factorial effects.
- The authors prove that the robust covariance estimator is asymptotically conservative for the true design-based sampling covariance, which justifies its use in large-sample Wald-type inference.
2. **Unified Definition of Factorial Effects**:
- The paper reviews and clarifies the standard definitions of factorial effects in the causal inference, experimental design, epidemiology, and social science literatures, and extends them to allow arbitrary weighting schemes that accommodate external validity concerns.
3. **Design-Based Properties of Unsaturated Regression**:
- The paper derives, for the first time, the design-based properties of estimators from unsaturated factor-based regressions and quantifies the bias-variance trade-off between saturated and unsaturated regressions.
### Specific Problem Analysis
#### 1. Background of Factorial Experiments
Factorial experiments are widely used in fields such as the social sciences, agriculture, industry, and biomedicine because they can accommodate multiple factors simultaneously. Factor-based regression is the dominant strategy for downstream analysis and delivers point estimates and standard errors through a single least-squares fit.
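To make the "single least-squares fit" concrete, here is a minimal sketch of a saturated factor-based regression for a 2x2 design. The data are simulated, the -1/+1 coding is an assumed convention, and the Eicker-Huber-White (HC0) sandwich is used as one common choice of robust covariance; the summary above only says "robust", so the exact variant is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 factorial: factors A and B coded as -1/+1 (assumed coding).
cells = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
n_per_cell = 25
a = np.repeat([c[0] for c in cells], n_per_cell).astype(float)
b = np.repeat([c[1] for c in cells], n_per_cell).astype(float)

# Simulated observed outcomes (made-up data, for illustration only).
y = 1.0 + 0.5 * a - 0.3 * b + 0.2 * a * b + rng.normal(size=a.size)

# Saturated regression: intercept, both main effects, and the interaction.
X = np.column_stack([np.ones_like(a), a, b, a * b])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# HC0 sandwich covariance: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}.
resid = y - X @ beta_hat
meat = X.T @ (X * (resid ** 2)[:, None])
robust_cov = XtX_inv @ meat @ XtX_inv
robust_se = np.sqrt(np.diag(robust_cov))

# In a saturated model the coefficients are contrasts of the sample cell means.
cell_means = np.array([y[(a == ca) & (b == cb)].mean() for ca, cb in cells])
H = np.array([[1, ca, cb, ca * cb] for ca, cb in cells], dtype=float)
contrast_estimates = H.T @ cell_means / len(cells)
```

Under this coding `beta_hat` coincides with `contrast_estimates`, so each coefficient is a fixed contrast of the four sample cell means.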
#### 2. Research Objectives
The main objectives of the paper are:
- To clarify the causal interpretation of the coefficients in factor-based linear regressions.
- To propose a location-shift strategy under which the regression output reproduces design-based inference for various factorial effects.
- To derive the design-based properties of estimators from unsaturated factor-based regressions and quantify their bias-variance trade-off.
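One way to read the location-shift idea is that shifting the factor codes by constants changes which weighted effect a coefficient represents. The sketch below illustrates this algebra on made-up cell means for a 2x2 design with 0/1 levels; the function name `saturated_coefs` and the shift parameters `delta_a`, `delta_b` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

# Hypothetical cell means for a 2x2 design, keyed by 0/1 levels (a, b).
mu = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 6.0}


def saturated_coefs(delta_a, delta_b):
    """Solve the saturated model exactly on the four cell means,
    with factor codes shifted to (a - delta_a) and (b - delta_b)."""
    rows, rhs = [], []
    for (a, b), m in mu.items():
        za, zb = a - delta_a, b - delta_b
        rows.append([1.0, za, zb, za * zb])
        rhs.append(m)
    return np.linalg.solve(np.array(rows), np.array(rhs))


# Conditional effects of factor A at each level of factor B.
effect_a_at_b0 = mu[(1, 0)] - mu[(0, 0)]
effect_a_at_b1 = mu[(1, 1)] - mu[(0, 1)]

# Coefficient of the shifted A code under three different shifts of B.
coef_baseline = saturated_coefs(0.0, 0.0)[1]   # no shift
coef_centered = saturated_coefs(0.5, 0.5)[1]   # codes centered at 1/2
coef_weighted = saturated_coefs(0.5, 0.25)[1]  # unequal weighting of B levels
```

In this sketch the coefficient on the shifted A code equals `(1 - delta_b) * effect_a_at_b0 + delta_b * effect_a_at_b1`, so no shift gives the effect at the baseline level of B, a shift of 1/2 gives the equally weighted main effect, and other shifts give other weightings. The same relation holds for unit-level data because a saturated least-squares fit reproduces the sample cell means.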
#### 3. Main Contributions
- **Clarification of Causal Interpretation**: Through the location-shift strategy, the authors show how least-squares fits reproduce design-based inference for various factorial effects. They prove that the robust covariance estimator is asymptotically conservative for the true design-based sampling covariance, which justifies its use in large-sample Wald-type inference.
- **Unified Definition of Factorial Effects**: The authors review and clarify the standard definitions of factorial effects in different fields and extend them to allow arbitrary weighting schemes that accommodate external validity concerns.
- **Design-Based Properties of Unsaturated Regression**: The authors derive, for the first time, the design-based properties of estimators from unsaturated factor-based regressions and quantify their bias-variance trade-off.
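The bias side of this trade-off can be illustrated with an exact calculation on a hypothetical unbalanced 2x2 design. The sketch assumes complete randomization with fixed cell sizes, so that sample cell means are unbiased for the true average potential outcomes, and uses the fact that least squares on unit-level data equals weighted least squares on cell means with weights equal to the cell sizes. All numbers and names are made up, and the variance side is not computed here.

```python
import numpy as np

# Hypothetical unbalanced 2x2 design; cells ordered (-,-), (-,+), (+,-), (+,+).
cell_sizes = np.array([10.0, 20.0, 30.0, 40.0])
codes = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

X_sat = np.column_stack([np.ones(4), codes, codes[:, 0] * codes[:, 1]])
X_unsat = X_sat[:, :3]  # unsaturated model: the interaction column is dropped


def ols_map(X):
    """Matrix A such that the OLS coefficients equal A @ (sample cell means)."""
    W = np.diag(cell_sizes)
    return np.linalg.solve(X.T @ W @ X, X.T @ W)


def expected_coefs(X, true_means):
    # The map is linear with fixed weights, so the expectation of the
    # coefficients is the map applied to the true average potential outcomes.
    return ols_map(X) @ true_means


target = np.array([1.0, 0.5, -0.3])  # intercept and main-effect coefficients
additive_means = X_sat @ np.array([1.0, 0.5, -0.3, 0.0])     # no interaction
interactive_means = X_sat @ np.array([1.0, 0.5, -0.3, 0.4])  # interaction present

bias_when_additive = expected_coefs(X_unsat, additive_means) - target
bias_when_interactive = expected_coefs(X_unsat, interactive_means) - target
saturated_expectation = expected_coefs(X_sat, interactive_means)
```

In this sketch the unsaturated fit is unbiased when the omitted interaction is truly zero and biased otherwise, while the saturated fit recovers all four coefficients in expectation. With equal cell sizes the omitted column is orthogonal to the retained ones, so the bias in this particular calculation would vanish.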
### Mathematical Symbols and Formulas
- **Potential Outcome Framework**: Let \( Y_i(z) \) be the potential outcome of unit \( i \) (\( i = 1, \ldots, N \)) under treatment combination \( z \in T \), and let \( \bar{Y}(z)=N^{-1}\sum_{i = 1}^N Y_i(z) \) be the average potential outcome.
- **Finite Population Covariance Matrix**: Let \( S=(S(z,z'))_{z,z'\in T} \) be the finite population covariance matrix of the potential outcomes, where \( S(z,z')=(N - 1)^{-1}\sum_{i = 1}^N \{Y_i(z)-\bar{Y}(z)\} \{Y_i(z')-\bar{Y}(z')\} \).
- **Contrast Matrix**: Let \( \tau = G\bar{Y} \) denote the effects of interest, where \( \bar{Y} = (\bar{Y}(z))_{z \in T} \) is the vector of average potential outcomes, \( Q = |T| \) is the number of treatment combinations, and \( G \) is a contrast matrix whose rows are orthogonal to \( \mathbf{1}_Q \).
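The definitions above can be computed directly on a hypothetical table of potential outcomes. The sketch below assumes a 2x2 design with \( Q = 4 \) treatment combinations and made-up values; the scaling of the rows of `G` follows one common convention and is an assumption, since conventions differ across literatures.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical table of potential outcomes: N units, Q = 4 treatment combinations.
N, Q = 8, 4
Y = rng.normal(size=(N, Q))  # Y[i, z] plays the role of Y_i(z)

# Vector of average potential outcomes, one entry per treatment combination.
Y_bar = Y.mean(axis=0)

# Finite population covariance matrix with the (N - 1)^{-1} divisor.
centered = Y - Y_bar
S = centered.T @ centered / (N - 1)

# Contrast matrix; columns ordered (0,0), (0,1), (1,0), (1,1).
G = 0.5 * np.array([
    [-1, -1, 1, 1],   # main effect of factor A
    [-1, 1, -1, 1],   # main effect of factor B
    [1, -1, -1, 1],   # A-by-B interaction (assumed scaling)
], dtype=float)

tau = G @ Y_bar
```

Each row of `G` sums to zero, which is the orthogonality to \( \mathbf{1}_Q \) required above, and `S` agrees with `np.cov(Y, rowvar=False)`, which uses the same divisor by default.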
### Conclusion
By clarifying the causal interpretation of the coefficients in factor-based regressions, proposing the location-shift strategy, and deriving the design-based properties of estimators from unsaturated factor-based regressions, the paper provides a theoretical basis for regression-based causal inference in factorial experiments.