Abstract: Estimating causal effects under interference is pertinent to many real-world settings. Recent work with low-order potential outcomes models uses a rollout design to obtain unbiased estimators that require no interference network information. However, the required extrapolation can lead to prohibitively high variance. To address this, we propose a two-stage experiment that selects a sub-population in the first stage and restricts treatment rollout to this sub-population in the second stage. We explore the role of clustering in the first stage by analyzing the bias and variance of a polynomial interpolation-style estimator under this experimental design. Bias increases with the number of edges cut in the clustering of the interference network, but variance depends on qualities of the clustering that relate to homophily and covariate balance. There is a tension between clustering objectives that minimize the number of cut edges versus those that maximize covariate balance across clusters. Through simulations, we explore a bias-variance trade-off and compare the performance of the estimator under different clustering strategies.
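The abstract ties bias to the number of cut edges and variance partly to covariate balance across clusters. The sketch below computes both quantities for a toy graph to make the tension concrete. The helper names and the balance proxy (standard deviation of the per-cluster means of a single covariate) are illustrative assumptions, not the paper's definitions.

```python
from collections import defaultdict
from statistics import pstdev


def cut_edges(edges, cluster_of):
    """Number of interference edges whose endpoints fall in different clusters."""
    return sum(1 for u, v in edges if cluster_of[u] != cluster_of[v])


def covariate_imbalance(covariate, cluster_of):
    """Population standard deviation of the per-cluster mean of one covariate.

    Illustrative balance proxy (an assumption, not the paper's measure):
    0 means every cluster has the same average covariate value.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for unit, value in covariate.items():
        sums[cluster_of[unit]] += value
        counts[cluster_of[unit]] += 1
    return pstdev([sums[c] / counts[c] for c in sums])


# Toy path graph 0-1-2-3 with a homophilous covariate (neighbors look alike).
edges = [(0, 1), (1, 2), (2, 3)]
x = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0}

by_block = {0: 0, 1: 0, 2: 1, 3: 1}     # contiguous clusters
interleaved = {0: 0, 1: 1, 2: 0, 3: 1}  # alternating clusters

print(cut_edges(edges, by_block), covariate_imbalance(x, by_block))        # 1 0.5
print(cut_edges(edges, interleaved), covariate_imbalance(x, interleaved))  # 3 0.0
```

The contiguous clustering cuts only one edge (less bias, per the abstract) but, because the covariate is homophilous, leaves the clusters unbalanced; the interleaved clustering balances the covariate at the cost of cutting every edge.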

What problem does this paper attempt to address?

This paper addresses the high variance of causal effect estimators under low-order interference. Specifically, when the polynomial interpolation estimator is used to estimate the Total Treatment Effect (TTE), the required extrapolation can make its variance prohibitively high. To reduce it, the authors propose a two-stage experimental design and study the role of clustering in the first stage.

#### Background and Motivation

1. **Limitations of the Stable Unit Treatment Value Assumption (SUTVA)**:
   - SUTVA is a key assumption in many classical causal inference methods, but it is violated in the presence of interference, that is, when an individual's outcome may depend on the treatment assignments of other individuals.
   - Ignoring interference can bias the estimated causal effect and lead to wrong conclusions. For example, herd immunity affects estimates of the public health effect of a vaccine, and on a social media platform a user's engagement with a new feature depends on the behavior of other users in their social network.
2. **Limitations of Existing Methods**:
   - Many existing methods rely on simplified potential outcomes models, such as linear or generalized linear models, often together with an anonymous interference assumption: only the number of treated units, not their identities, affects an individual's outcome.
   - Recent work on low-order potential outcomes models, in which each outcome depends on interactions among at most \( \beta \) treated neighbors (\( \beta \)-order interactions), uses a staggered rollout design and a polynomial interpolation estimator that is unbiased without knowledge of the interference network. However, the estimator must extrapolate from the treatment budget \( p \) to full treatment, so its variance can become prohibitively high, particularly when \( \beta \) is large or \( p \) is small.
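The extrapolation problem can be made concrete. Assume, as in a rollout design with independent Bernoulli treatment, that the expected average outcome \( F(x) \) is a polynomial of degree \( \beta \) in the treatment probability \( x \), observed at equally spaced steps \( x_t = t p / \beta \) for \( t = 0, \dots, \beta \). Lagrange interpolation then writes the estimate of \( F(1) - F(0) \) as a weighted sum of the step averages. The sketch below computes these weights; the sum of their absolute values is used as a rough proxy (an assumption that ignores correlation between steps) for how much noise is amplified. Helper names are hypothetical.

```python
from fractions import Fraction


def lagrange_weights(nodes, x):
    """Weights w_t such that P(x) == sum_t w_t * P(nodes[t]) for every
    polynomial P of degree <= len(nodes) - 1 (Lagrange basis at x)."""
    weights = []
    for t, xt in enumerate(nodes):
        w = Fraction(1)
        for s, xs in enumerate(nodes):
            if s != t:
                w *= (x - xs) / (xt - xs)
        weights.append(w)
    return weights


def rollout_nodes(budget, beta):
    """Assumed equally spaced rollout steps: t * budget / beta, t = 0..beta."""
    return [Fraction(t) * Fraction(budget) / beta for t in range(beta + 1)]


def tte_weights(budget, beta):
    """Weights on the step averages that estimate F(1) - F(0)."""
    nodes = rollout_nodes(budget, beta)
    at_one = lagrange_weights(nodes, Fraction(1))
    at_zero = lagrange_weights(nodes, Fraction(0))
    return [a - b for a, b in zip(at_one, at_zero)]


def amplification(budget, beta):
    """Sum of absolute weights: a rough proxy for noise amplification."""
    return sum(abs(w) for w in tte_weights(budget, beta))


print(amplification(Fraction(1, 10), 2))  # 720
print(amplification(Fraction(1, 10), 3))  # 30800
print(amplification(Fraction(1, 2), 2))   # 16
```

For \( \beta = 2 \), raising the budget from 1/10 to 1/2 shrinks the proxy from 720 to 16, which is the intuition behind running the rollout with a larger budget \( q \) on a smaller subset.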
#### Proposed Method

To reduce variance and improve estimation accuracy, the authors propose the following:

1. **Two-stage experimental design**:
   - First stage: select a subset \( U \) of the population containing a fraction \( \frac{p}{q} \) of the units.
   - Second stage: run a staggered rollout experiment on \( U \), with treatment probabilities increasing up to \( q \), while all individuals outside \( U \) remain untreated.
   - The overall expected fraction of treated units is still \( \frac{p}{q} \cdot q = p \), but within \( U \) the effective treatment budget rises from \( p \) to \( q \), so less extrapolation is needed and the variance of the polynomial interpolation estimator drops.
2. **Role of clustering**:
   - The authors analyze how clustering the interference network in the first stage affects the estimator's bias and variance. Bias increases with the number of edges cut by the clustering, whereas variance depends on properties of the clustering related to homophily and covariate balance across clusters. Because clusterings that minimize cut edges can conflict with those that maximize covariate balance, there is a bias-variance trade-off.

#### Main Contributions

1. **Lower variance for the polynomial interpolation estimator**: the two-stage design substantially reduces variance, especially for larger values of \( \beta \) (i.e., more complex models).
2. **No need for network information**: unlike most existing methods, the design does not require knowledge of the underlying interference network; when the graph is known, choosing a good clustering can further improve the estimator's performance.
3. **Theoretical analysis and simulations**: the bias and variance of the estimator are analyzed theoretically, and simulations demonstrate the effectiveness of the two-stage design and compare estimator performance under different clustering strategies.

In conclusion, the paper addresses the high-variance problem of causal inference under low-order interference by combining a two-stage experimental design with clustering, yielding lower-variance estimates of the Total Treatment Effect.
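The two-stage design can be sketched as a sampler that produces one treatment vector per rollout step. The first-stage scheme (keeping each cluster independently with probability \( p/q \)) and the nested uniform-threshold rollout in the second stage are assumptions chosen for illustration; `two_stage_rollout` is a hypothetical name, and the paper's exact sampling scheme may differ.

```python
import random


def two_stage_rollout(cluster_of, p, q, beta, rng):
    """Sample a two-stage staggered rollout (illustrative assumptions).

    Stage 1 (assumed scheme): keep each cluster independently with
    probability p / q; U is the union of kept clusters.
    Stage 2: each unit in U draws one uniform threshold and is treated at
    step t once the threshold falls below t * q / beta, so treatment is
    nested across steps and reaches probability q within U at t = beta.
    Units outside U are never treated.
    Returns (kept_clusters, [z^0, ..., z^beta]) with z^t: unit -> 0/1.
    """
    if not 0 < p <= q <= 1:
        raise ValueError("need 0 < p <= q <= 1")
    kept = {c for c in sorted(set(cluster_of.values())) if rng.random() < p / q}
    threshold = {
        unit: rng.random() for unit, c in cluster_of.items() if c in kept
    }
    rollout = []
    for t in range(beta + 1):
        level = t * q / beta
        # Units outside U get threshold 1.0, which never falls below level <= 1.
        rollout.append(
            {unit: int(threshold.get(unit, 1.0) < level) for unit in cluster_of}
        )
    return kept, rollout


rng = random.Random(0)
cluster_of = {unit: unit // 10 for unit in range(100)}  # 10 clusters of 10
kept, z = two_stage_rollout(cluster_of, p=0.1, q=0.5, beta=2, rng=rng)
print(len(kept), [sum(step.values()) for step in z])
```

Under these assumptions a unit's cluster is kept with probability \( p/q \) and a kept unit is treated at the last step with probability \( q \), so every unit's final treatment probability is \( p \), the same overall budget as a single-stage rollout. Giving each unit its own singleton cluster recovers unit-level sampling of \( U \).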

Combining Rollout Designs and Clustering for Causal Inference under Low-order Interference

Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference

Causal clustering: design of cluster experiments under network interference

Causal inference for interfering units with cluster and population level treatment allocation programs

Cascade-based Randomization for Inferring Causal Effects under Diffusion Interference

Causal Inference under Network Interference Using a Mixture of Randomized Experiments

A Graph-Theoretic Approach to Randomization Tests of Causal Effects Under General Interference

Nonparametric Causal Survival Analysis with Clustered Interference

Independent-Set Design of Experiments for Estimating Treatment and Spillover Effects under Network Interference

Cluster Randomized Designs for One-Sided Bipartite Experiments

Estimation of Causal Effects Under K-Nearest Neighbors Interference

A systematic investigation of classical causal inference strategies under mis-specification due to network interference

Inference for Two-stage Experiments under Covariate-Adaptive Randomization

Causal Inference under Interference and Model Uncertainty

Efficient Nonparametric Estimation of Stochastic Policy Effects with Clustered Interference

Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design

Model-Based Inference and Experimental Design for Interference Using Partial Network Data

Design-Based Inference for Spatial Experiments under Unknown Interference

Bipartite Causal Inference with Interference

Causal effect estimation under network interference with mean-field methods

Estimating Causal Effects Under Interference Using Bayesian Generalized Propensity Scores