Ecological Regression with Partial Identification

Wenxin Jiang, Gary King, Allen Schmaltz, Martin A. Tanner
DOI: https://doi.org/10.1017/pan.2019.19
2018-04-24
Abstract: Ecological inference (EI) is the process of learning about individual behavior from aggregate data. We study a partially identified linear contextual effects model for EI and describe how to estimate the district-level parameter, which averages over many precincts, despite the presence of the non-identified contextual-effect parameter. This may be regarded as a first attempt in this venerable literature to limit the scope of the key form of non-identifiability in EI. To study the operating characteristics of our model, we have amassed the largest collection of data with known ground truth ever applied to evaluate solutions to the EI problem. We collect and study 459 datasets from a variety of fields, including public health, political science, and sociology. The datasets contain a total of 2,370,854 geographic units (e.g., precincts), with an average of 5,165 geographic units per dataset. Our replication data are publicly available via the Harvard Dataverse (Jiang et al. 2018) and may serve as a useful resource for future researchers. For all real datasets in our collection that fit our proposed rules, our approach reduces the width of the Duncan and Davis (1953) deterministic bound by about 45% on average, while still capturing the true district-level parameter more than 97% of the time.
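The baseline the abstract measures against is the Duncan and Davis (1953) deterministic bound. The following is a minimal sketch of that classical method of bounds, not of the paper's contextual-effects estimator. It assumes a 2×2 table per precinct i with population n_i, group share x_i, and outcome share t_i, so that t_i = β_i^b x_i + β_i^w (1 − x_i) with both β's in [0, 1]. The function names and symbols are hypothetical and used for illustration only. The district-level bound is the n_i x_i-weighted average of the precinct bounds.

```python
import math
from typing import List, Sequence, Tuple


def duncan_davis_bounds(x: float, t: float) -> Tuple[float, float]:
    # Hypothetical helper: deterministic interval for beta_b, the share of
    # group-b members with the outcome in one precinct.
    # x = group-b share of the precinct, t = outcome share of the precinct.
    # Follows from t = beta_b * x + beta_w * (1 - x), beta_b, beta_w in [0, 1].
    if not (0.0 < x <= 1.0):
        raise ValueError("x must lie in (0, 1]")
    if not (0.0 <= t <= 1.0):
        raise ValueError("t must lie in [0, 1]")
    lower = max(0.0, (t - (1.0 - x)) / x)
    upper = min(1.0, t / x)
    return lower, upper


def district_bounds(
    n: Sequence[float], x: Sequence[float], t: Sequence[float]
) -> Tuple[float, float]:
    # Hypothetical helper: bounds on the district-level parameter
    # B_b = sum_i n_i x_i beta_b_i / sum_i n_i x_i, obtained by averaging the
    # precinct bounds with weights n_i * x_i (group-b population per precinct).
    if not (len(n) == len(x) == len(t)):
        raise ValueError("n, x, t must have equal length")
    weights: List[float] = []
    lows: List[float] = []
    highs: List[float] = []
    for ni, xi, ti in zip(n, x, t):
        w = ni * xi
        if w == 0:
            continue  # no group-b members here, so zero weight in B_b
        lo_i, hi_i = duncan_davis_bounds(xi, ti)
        weights.append(w)
        lows.append(w * lo_i)
        highs.append(w * hi_i)
    total = math.fsum(weights)
    if total <= 0:
        raise ValueError("district has no group-b population")
    return math.fsum(lows) / total, math.fsum(highs) / total


# Usage: two precincts of 100 and 200 people.
lo, hi = district_bounds(n=[100, 200], x=[0.5, 0.25], t=[0.8, 0.1])
print(f"B_b in [{lo:.2f}, {hi:.2f}], width {hi - lo:.2f}")
# prints: B_b in [0.30, 0.70], width 0.40
```

The width `hi - lo` is the quantity whose average reduction (about 45%) the abstract reports for the proposed approach.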