Abstract: Estimating causal effects under interference is pertinent to many real-world settings. Recent work with low-order potential outcomes models uses a rollout design to obtain unbiased estimators that require no interference network information. However, the required extrapolation can lead to prohibitively high variance. To address this, we propose a two-stage experiment that selects a sub-population in the first stage and restricts treatment rollout to this sub-population in the second stage. We explore the role of clustering in the first stage by analyzing the bias and variance of a polynomial interpolation-style estimator under this experimental design. Bias increases with the number of edges cut in the clustering of the interference network, but variance depends on qualities of the clustering that relate to homophily and covariate balance. There is a tension between clustering objectives that minimize the number of cut edges versus those that maximize covariate balance across clusters. Through simulations, we explore a bias-variance trade-off and compare the performance of the estimator under different clustering strategies.
### What problem does this paper attempt to solve?
This paper addresses the high variance that arises in causal inference under low-order interference. Specifically, the polynomial interpolation estimator of the Total Treatment Effect (TTE) must extrapolate from outcomes observed at a small treatment probability to the outcome under full treatment, and this extrapolation can make its variance excessively large. To address this, the paper proposes a two-stage experimental design and studies the role of clustering in the first stage.
#### Background and Motivation
1. **Limitations of the Stable Unit Treatment Value Assumption (SUTVA)**:
- Many classical causal inference methods rely on SUTVA, which is violated in the presence of interference: an individual's outcome may depend on the treatment assignments of other individuals.
- Ignoring interference can bias causal effect estimates and lead to incorrect conclusions. For example, because of herd immunity, a vaccine's public-health effect depends on how many other people are vaccinated; similarly, a user's engagement with a new social media feature depends on the behavior of other users in their social network.
2. **Limitations of Existing Methods**:
- Many existing methods simplify the potential outcomes model, for example to a linear or generalized linear model, and often impose anonymous interference: only the number of treated units (e.g., treated neighbors), not their identities, affects an individual's outcome.
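- Recent work on low-order (\( \beta \)-order) potential outcomes models, in which each outcome depends on interactions among at most \( \beta \) units in its neighborhood, avoids these assumptions: a staggered rollout design combined with a polynomial interpolation estimator is unbiased without any knowledge of the interference network. However, the estimator must extrapolate from a small treatment budget to full treatment, so its variance can become prohibitively high, particularly for larger \( \beta \).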
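For reference, the \( \beta \)-order model and the polynomial interpolation estimator can be written as below. This is a sketch of the usual rollout-design formulation under assumed notation, not copied from the paper: \( \mathcal{N}_i \) is unit \( i \)'s neighborhood (including \( i \)), \( c_{i,S} \) are unknown coefficients, and \( 0 = p_0 < p_1 < \dots < p_\beta = p \) are the rollout stages with nested Bernoulli assignments \( \mathbf{z}^t \).

```latex
\begin{aligned}
Y_i(\mathbf{z}) &= \sum_{S \subseteq \mathcal{N}_i,\ |S| \le \beta} c_{i,S} \prod_{j \in S} z_j,
\qquad
F(p) := \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{\mathbf{z} \sim \mathrm{Bern}(p)}\big[Y_i(\mathbf{z})\big]
      = \frac{1}{n}\sum_{i=1}^{n} \sum_{S} c_{i,S}\, p^{|S|},\\
\mathrm{TTE} &= F(1) - F(0),
\qquad
\widehat{\mathrm{TTE}} = \sum_{t=0}^{\beta} \big(\ell_t(1) - \ell_t(0)\big)\, \frac{1}{n}\sum_{i=1}^{n} Y_i(\mathbf{z}^{t}),
\qquad
\ell_t(x) = \prod_{s \ne t} \frac{x - p_s}{p_t - p_s}.
\end{aligned}
```

Each stage-\( t \) assignment is marginally Bernoulli(\( p_t \)), so the expected average outcome at stage \( t \) is \( F(p_t) \), a polynomial of degree at most \( \beta \). Lagrange interpolation through \( \beta + 1 \) points recovers such a polynomial exactly, which makes the estimator unbiased. The weights \( \ell_t(1) \) extrapolate from \( [0, p] \) to \( 1 \) and grow quickly as \( p \) shrinks or \( \beta \) grows, which is where the high variance comes from.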
#### Proposed Method
To reduce variance and improve estimation accuracy, the paper proposes the following:
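1. **Two-stage experimental design**:
- First stage: Select a subset \( U \) containing a fraction \( \frac{p}{q} \) of the population, where \( p \) is the overall treatment budget and \( p \le q \le 1 \).
- Second stage: Run a staggered rollout inside \( U \), increasing the treatment probability in stages up to \( q \), while all individuals outside \( U \) remain untreated.
- The expected fraction of treated units is still \( \frac{p}{q} \cdot q = p \), so the overall budget is unchanged, but within \( U \) the estimator works with the larger effective treatment probability \( q \). Extrapolating from \( q \) rather than \( p \) to full treatment reduces the variance of the polynomial interpolation estimator.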
2. **Role of clustering**:
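- The paper studies how choosing \( U \) as a union of clusters in the first stage affects the estimator's bias and variance. Units in \( U \) never see their neighbors outside \( U \) treated, so bias grows with the number of edges cut between \( U \) and the rest of the network, and clusterings that cut few edges reduce bias. However, such clusters tend to group similar units (homophily), which can worsen covariate balance across clusters and increase variance. This creates a bias-variance trade-off.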
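To make the two-stage design and the clustering trade-off concrete, here is a minimal NumPy simulation sketch. Everything in it is an illustrative assumption rather than the paper's setup: the stochastic-block-model graph, the hypothetical \( \beta = 2 \) outcome model, the evenly spaced stages \( q_t = tq/\beta \), and all helper names (`block_graph`, `two_stage_estimate`, and so on). In particular, the estimator here simply interpolates the average outcome of the units in \( U \); the paper's exact estimator may differ.

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Lagrange basis values l_t(x): sum_t l_t(x) * f(nodes[t]) == f(x) for polynomials of degree < len(nodes)."""
    nodes = np.asarray(nodes, dtype=float)
    w = np.ones(len(nodes))
    for t in range(len(nodes)):
        for s in range(len(nodes)):
            if s != t:
                w[t] *= (x - nodes[s]) / (nodes[t] - nodes[s])
    return w

def block_graph(n, k, p_in, p_out, rng):
    """Hypothetical interference network: k equal communities, edge prob p_in inside, p_out across."""
    labels = np.repeat(np.arange(k), n // k)
    probs = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return upper | upper.T, labels  # symmetric boolean adjacency, no self-loops

def outcomes(z, A, a, b, c):
    """Hypothetical beta = 2 model: Y_i = a_i + b_i*e_i + c_i*e_i**2, e_i = z_i + treated share of neighbors."""
    deg = np.maximum(A.sum(axis=1), 1.0)
    e = z + (A @ z) / deg
    return a + b * e + c * e ** 2

def select_clusters(labels, frac, rng):
    """Stage 1, clustered: sample whole communities covering about frac of the units."""
    k = int(labels.max()) + 1
    chosen = rng.choice(k, size=max(1, int(round(frac * k))), replace=False)
    return np.isin(labels, chosen)

def select_units(n, frac, rng):
    """Stage 1, unclustered: sample individual units uniformly at random."""
    in_u = np.zeros(n, dtype=bool)
    in_u[rng.choice(n, size=int(round(frac * n)), replace=False)] = True
    return in_u

def cut_edges(adj, in_u):
    """Edges with exactly one endpoint in U (adj is symmetric, so each such edge is counted once)."""
    return int(adj[np.ix_(in_u, ~in_u)].sum())

def two_stage_estimate(A, in_u, q, beta, a, b, c, rng):
    """Stage 2: nested rollout inside U at q_t = t*q/beta, then interpolate the U-average outcome to 1."""
    nodes = np.arange(beta + 1) * q / beta
    weights = lagrange_weights(nodes, 1.0) - lagrange_weights(nodes, 0.0)
    u = rng.random(len(in_u))  # one draw per unit, so treated sets grow monotonically across stages
    est = 0.0
    for w, q_t in zip(weights, nodes):
        z = (in_u & (u < q_t)).astype(float)  # units outside U are never treated
        est += w * outcomes(z, A, a, b, c)[in_u].mean()
    return est

rng = np.random.default_rng(0)
k, size, p, q, beta = 20, 50, 0.1, 0.5, 2
n = k * size
adj, labels = block_graph(n, k, 0.2, 0.002, rng)
A = adj.astype(float)
a = rng.normal(0.0, 1.0, n)
b = 1.0 + rng.normal(0.0, 1.0, k)[labels] + 0.1 * rng.normal(size=n)  # effects shared within communities (homophily)
c = 0.5 + 0.1 * rng.normal(size=n)
tte = (outcomes(np.ones(n), A, a, b, c) - outcomes(np.zeros(n), A, a, b, c)).mean()

strategies = {"clustered": lambda: select_clusters(labels, p / q, rng),
              "unit-level": lambda: select_units(n, p / q, rng)}
for name, select in strategies.items():
    cuts, ests = [], []
    for _ in range(200):
        in_u = select()
        cuts.append(cut_edges(adj, in_u))
        ests.append(two_stage_estimate(A, in_u, q, beta, a, b, c, rng))
    ests = np.array(ests)
    print(f"{name:>10}: mean cut edges {np.mean(cuts):7.1f}  bias {ests.mean() - tte:+.3f}  sd {ests.std():.3f}")
```

Comparing the two printed rows is one way to see the trade-off: in this setup, sampling whole communities cuts far fewer edges (less bias), whereas sampling individual units tends to balance the community-level effects in `b` better across \( U \). The numbers depend on the seed and parameters and are not results from the paper.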
#### Main Contributions
1. **Lower variance for the polynomial interpolation estimator**: The two-stage design substantially reduces variance, especially for larger \( \beta \) values (i.e., higher-order interaction models).
2. **No need for network information**: Unlike most existing methods, the design does not require knowledge of the underlying interference network; when the graph is known, however, choosing a good clustering can further improve the estimator's performance.
3. **Theoretical analysis and simulations**: The paper analyzes the bias and variance of the estimator under the two-stage design and uses simulations to explore the bias-variance trade-off and compare clustering strategies.
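A rough way to see why a larger effective budget helps, especially for larger \( \beta \), is to compute the total absolute interpolation weight \( \sum_t |\ell_t(1) - \ell_t(0)| \): the estimator's standard deviation is at most this sum times the largest stage-level standard deviation. The sketch assumes evenly spaced stages and uses the illustrative budgets \( p = 0.1 \) (rollout over the whole population) and \( q = 0.5 \) (rollout inside \( U \)); these values and the helper `amplification` are assumptions, not taken from the paper.

```python
import numpy as np

def lagrange_weights(nodes, x):
    """Lagrange basis values l_t(x) for the given interpolation nodes."""
    nodes = np.asarray(nodes, dtype=float)
    w = np.ones(len(nodes))
    for t in range(len(nodes)):
        for s in range(len(nodes)):
            if s != t:
                w[t] *= (x - nodes[s]) / (nodes[t] - nodes[s])
    return w

def amplification(budget, beta):
    """Sum of |l_t(1) - l_t(0)| for evenly spaced stages 0, budget/beta, ..., budget."""
    nodes = np.arange(beta + 1) * budget / beta
    return float(np.abs(lagrange_weights(nodes, 1.0) - lagrange_weights(nodes, 0.0)).sum())

for beta in (1, 2, 3):
    print(f"beta={beta}: p=0.1 -> {amplification(0.1, beta):8.1f}   q=0.5 -> {amplification(0.5, beta):6.1f}")
```

For \( \beta = 1 \) the amplification is exactly \( 2/\text{budget} \) (20 versus 4), and for \( \beta = 2 \) it is 720 versus 16, so the benefit of rolling out at \( q \) instead of \( p \) grows quickly with \( \beta \).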
In conclusion, the paper mitigates the high-variance problem of causal inference under low-order interference by combining a two-stage experimental design with clustering. This yields more precise estimates of the Total Treatment Effect, at the cost of a bias that depends on how many edges the clustering cuts.