Matrix Completion Methods for Causal Panel Data Models

Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, Khashayar Khosravi

DOI: https://doi.org/10.1080/01621459.2021.1891924

2022-04-22

Abstract: In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the "missing" elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.

Statistics Theory, Econometrics

What problem does this paper attempt to address?

The paper addresses the estimation of causal effects in panel data settings where some units are exposed to a treatment in some periods. The goal is to estimate the counterfactual (untreated) outcomes for the treated unit/period combinations. To do so, the paper proposes a class of matrix completion estimators that use the observed elements of the control outcome matrix, corresponding to untreated unit/periods, to impute the "missing" elements corresponding to treated unit/periods. The resulting matrix approximates the original (incomplete) matrix well while having lower complexity as measured by the nuclear norm.

The main contributions of the paper are:

1. Generalizing results from the matrix completion literature by allowing the pattern of missing data to have a time series dependency structure, which is common in social science applications.
2. Demonstrating the connections among the matrix completion literature, the interactive fixed effects literature, and the literatures on unconfoundedness and synthetic control methods. All of these estimators can be viewed as focusing on the same objective function; they differ only in how they handle identification.
3. Proposing a matrix completion estimator based on nuclear norm regularization, which outperforms unconfoundedness-based and synthetic control estimators in simulations based on real data.

With these methods, the paper aims to improve the accuracy and reliability of causal effect estimates in panel data, especially when the treatment pattern may change over time.
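To make the idea of nuclear norm regularized imputation concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a soft-impute-style iteration (repeated SVD with soft-thresholded singular values), and the function name `soft_impute`, the toy rank-1 panel, and the penalty value `lam=0.5` are all illustrative assumptions. It also leaves out pieces a real application would typically need, such as unit and time fixed effects, noise, and a data-driven choice of the penalty.

```python
import numpy as np


def soft_impute(Y, observed, lam, n_iter=1000, tol=1e-8):
    """Approximately minimize 0.5 * ||P_O(Y - L)||_F^2 + lam * ||L||_*.

    Y        : (N, T) array of outcomes; entries where observed is False are ignored
    observed : (N, T) boolean mask, True for untreated (observed) unit/periods
    lam      : nuclear norm penalty (hypothetical value chosen by the caller)
    """
    L = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        # Keep observed outcomes, fill treated cells with the current estimate.
        filled = np.where(observed, Y, L)
        # Soft-threshold the singular values: this is the nuclear norm prox step.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        # Stop once the estimate has essentially stopped changing.
        if np.linalg.norm(L_new - L) <= tol * max(1.0, np.linalg.norm(L)):
            L = L_new
            break
        L = L_new
    return L


# Toy panel (an assumption for illustration): 20 units, 10 periods,
# noiseless rank-1 untreated outcomes.
N, T = 20, 10
unit_factor = np.linspace(1.0, 2.0, N).reshape(-1, 1)
time_factor = np.linspace(1.0, 2.0, T).reshape(1, -1)
Y_full = unit_factor @ time_factor

# The last 5 units are treated in the last 3 periods, so their
# untreated outcomes are "missing" and must be imputed.
observed = np.ones((N, T), dtype=bool)
observed[15:, 7:] = False

Y_obs = np.where(observed, Y_full, 0.0)
L_hat = soft_impute(Y_obs, observed, lam=0.5)

# Compare imputation error on the treated block with a naive zero fill.
missing = ~observed
rmse_imputed = np.sqrt(np.mean((L_hat[missing] - Y_full[missing]) ** 2))
rmse_zero = np.sqrt(np.mean(Y_full[missing] ** 2))
print(f"RMSE on treated cells: imputed={rmse_imputed:.3f}, zero fill={rmse_zero:.3f}")
```

The treated block is never used in fitting: only the observed cells enter the squared error term, while the penalty pulls the completed matrix toward low rank. A larger `lam` shrinks the imputed values more strongly, so in practice the penalty would need to be tuned rather than fixed by hand as it is here.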

Doubly Robust Identification for Causal Panel Data Models

Causal Inference With Noisy And Missing Covariates Via Matrix Factorization

A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data

Matrix Completion with Covariate Information

Causal Inference for Comprehensive Cohort Studies

Causal models for longitudinal and panel data: a survey

Estimating Counterfactual Matrix Means with Short Panel Data

What can the millions of random treatments in nonexperimental data reveal about causes?

Retrospective causal inference via matrix completion, with an evaluation of the effect of European integration on cross-border employment

Program Evaluation and Causal Inference with High-Dimensional Data

Causal inference for community-based multi-layered intervention study

Causal Effects for Time-Varying Treatments and Outcomes 2 Composite Causal Effects for Time-Varying Treatments and Time-Varying Outcomes

A Cross-Moment Approach for Causal Effect Estimation

Statistical modeling of causal effects in continuous time

Marginal Structural Models and Causal Inference in Epidemiology

Compositional Models for Estimating Causal Effects

Proxy Controls and Panel Data

Causal Inference for a Hidden Treatment

Probably approximately correct high-dimensional causal effect estimation given a valid adjustment set