Matrix Completion Methods for Causal Panel Data Models

Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens, Khashayar Khosravi
DOI: https://doi.org/10.1080/01621459.2021.1891924
2022-04-22
Abstract: In this paper we study methods for estimating causal effects in settings with panel data, where some units are exposed to a treatment during some periods and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We propose a class of matrix completion estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to impute the "missing" elements of the control outcome matrix, corresponding to treated units/periods. This leads to a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to the nuclear norm for matrices. We generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure that is common in social science applications. We present novel insights concerning the connections between the matrix completion literature, the literature on interactive fixed effects models and the literatures on program evaluation under unconfoundedness and synthetic control methods. We show that all these estimators can be viewed as focusing on the same objective function. They differ solely in the way they deal with identification, in some cases solely through regularization (our proposed nuclear norm matrix completion estimator) and in other cases primarily through imposing hard restrictions (the unconfoundedness and synthetic control approaches). The proposed method outperforms unconfoundedness-based or synthetic control estimators in simulations based on real data.
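As a reading aid, the nuclear-norm-penalized completion problem described in the abstract can be written in the simplified form below. The notation (the set O of observed, untreated unit/period pairs, the completed matrix L, the penalty weight lambda) is assumed here rather than taken from the paper, and the paper's own estimator may include additional terms such as unit and time fixed effects.

```latex
\hat{L} = \arg\min_{L \in \mathbb{R}^{N \times T}} \;
  \frac{1}{|\mathcal{O}|} \sum_{(i,t) \in \mathcal{O}} \bigl(Y_{it} - L_{it}\bigr)^2
  + \lambda \, \lVert L \rVert_*,
\qquad
\lVert L \rVert_* = \sum_{k} \sigma_k(L),
\qquad
\hat{\tau}_{it} = Y_{it} - \hat{L}_{it} \quad \text{for treated } (i,t) \notin \mathcal{O}.
```

Here sigma_k(L) are the singular values of L, and lambda >= 0 trades off fit on the observed cells against complexity: a larger lambda pushes more singular values to zero and so favors lower-rank completions.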
Statistics Theory, Econometrics
What problem does this paper attempt to address?
This paper addresses the estimation of causal effects with panel data, where some units are exposed to a treatment during some periods and the goal is to estimate the counterfactual (untreated) outcomes for the treated unit/period combinations. Specifically, the paper proposes a class of matrix completion estimators that use the observed elements of the control outcome matrix, corresponding to untreated units/periods, to impute the "missing" elements, corresponding to treated units/periods. The completed matrix is chosen to approximate the observed entries well while having low complexity as measured by the nuclear norm.

The main contributions of the paper are:
1. Generalizing results from the matrix completion literature to allow the pattern of missing data to have a time-series dependency structure, which is common in social science applications.
2. Showing the connections among the matrix completion literature, the literature on interactive fixed-effects models, and the literatures on unconfoundedness and synthetic control methods: all of these estimators can be viewed as focusing on the same objective function and differ only in how they deal with identification, either solely through regularization (the proposed nuclear norm estimator) or primarily through hard restrictions (the unconfoundedness and synthetic control approaches).
3. Proposing a nuclear norm matrix completion estimator that outperforms unconfoundedness-based and synthetic control estimators in simulations based on real data.

Overall, the paper aims to improve the accuracy and reliability of causal effect estimation in panel data, particularly when the treatment pattern may change over time.
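The imputation idea can be illustrated with a minimal sketch, assuming a soft-impute style algorithm (repeated singular value thresholding), which at convergence minimizes one half the squared error on observed cells plus lam times the nuclear norm. This is not presented as the paper's exact estimator, which may add fixed effects and tune the penalty differently. The function names `svt` and `complete_matrix`, the penalty `lam`, the toy rank-2 data, the staggered adoption pattern, and the constant effect of 1.0 are all hypothetical.

```python
import numpy as np


def svt(matrix: np.ndarray, threshold: float) -> np.ndarray:
    """Singular value thresholding: shrink every singular value by `threshold`, flooring at zero."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - threshold, 0.0)) @ vt


def complete_matrix(y, observed, lam, n_iter=500, tol=1e-6):
    """Soft-impute style completion (assumed algorithm, not the paper's exact code).

    y:        N x T outcome matrix; entries where `observed` is False are ignored.
    observed: boolean N x T mask, True where the untreated (control) outcome is observed.
    lam:      nuclear norm penalty, i.e. the shrinkage applied to singular values per step.
    """
    low_rank = np.zeros(y.shape)
    for _ in range(n_iter):
        # Keep observed entries and fill the missing ones with the current estimate.
        filled = np.where(observed, y, low_rank)
        updated = svt(filled, lam)
        change = np.linalg.norm(updated - low_rank) / max(np.linalg.norm(low_rank), 1.0)
        low_rank = updated
        if change < tol:
            break
    return low_rank


# Hypothetical toy panel: rank-2 control outcomes for 40 units over 30 periods.
rng = np.random.default_rng(0)
n_units, n_periods, rank = 40, 30, 2
true_control = rng.normal(size=(n_units, rank)) @ rng.normal(size=(rank, n_periods))

# Staggered adoption (a time-series missing pattern): the first 10 units adopt
# in some period 15..29 and stay treated from then on.
treated = np.zeros((n_units, n_periods), dtype=bool)
for unit, start in enumerate(rng.integers(15, n_periods, size=10)):
    treated[unit, start:] = True

effect = 1.0  # hypothetical constant treatment effect
y = true_control + rng.normal(scale=0.1, size=true_control.shape) + effect * treated

# Treated cells are the "missing" control outcomes; impute them from the untreated cells.
l_hat = complete_matrix(y, ~treated, lam=1.0)
att = np.mean(y[treated] - l_hat[treated])
print(f"estimated average effect on the treated: {att:.2f}")
```

In this sketch the treated cells never enter the fit: they are refilled with the current low-rank estimate at each step, and the effect is read off as the gap between the observed treated outcomes and their imputed control values.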