A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling

Eustache Diemert, Artem Betlei, Christophe Renaudin, Massih-Reza Amini, Théophane Gregoir, Thibaud Rahier
DOI: https://doi.org/10.48550/arXiv.2111.10106
2021-11-19
Abstract: Individual Treatment Effect (ITE) prediction is an important area of research in machine learning which aims at explaining and estimating the causal impact of an action at the granular level. It represents a problem of growing interest in multiple sectors of application such as healthcare, online advertising or socioeconomics. To foster research on this topic we release a publicly available collection of 13.9 million samples collected from several randomized control trials, scaling up previously available datasets by a healthy 210x factor. We provide details on the data collection and perform sanity checks to validate the use of this data for causal inference tasks. First, we formalize the task of uplift modeling (UM) that can be performed with this data, along with the relevant evaluation metrics. Then, we propose synthetic response surfaces and heterogeneous treatment assignment providing a general set-up for ITE prediction. Finally, we report experiments to validate key characteristics of the dataset leveraging its size to evaluate and compare - with high statistical significance - a selection of baseline UM and ITE prediction methods.
Machine Learning,Artificial Intelligence,Applications
What problem does this paper attempt to address?
The paper aims to provide a large-scale, high-quality dataset for Individual Treatment Effect (ITE) prediction and Uplift Modeling (UM). Specifically, it introduces a dataset of 13.9 million samples collected by the online advertising company Criteo. The data come from multiple randomized controlled trials and are 210 times larger than previously available datasets. The paper uses this dataset to promote research on ITE prediction and UM, describes how the data were collected, and runs sanity checks to confirm that the data are suitable for causal inference tasks.

### Main Contributions

1. **Large-scale Real-world Dataset**: For UM, the dataset is about two orders of magnitude larger than existing benchmark datasets. It also presents a more complex setting in terms of covariate dimensions: some features have thousands of possible values, which better represents the problems found in modern Web applications.
2. **Realistic ITE Prediction Benchmark**: For ITE prediction, the dataset is about four orders of magnitude larger than existing benchmarks, and the paper proposes additional response surfaces that conform to the actual data patterns, enriching the diversity of existing benchmarks.

### Dataset Characteristics

- **Sample Size**: It contains 13.9 million samples.
- **Feature Types**: It includes continuous features, binary features, and multi-modal features.
- **Treatment Imbalance**: Only a small portion of users are assigned to the control group, and the overall positive outcome rate is low.
- **Anonymization**: Feature values are hashed to protect user privacy and company assets.

### Experimental Verification

The paper also reports a series of experiments to verify key characteristics of the dataset, including:

- **Treatment Independence**: The independence between the treatment variable and the features is verified with the Classifier Two-Sample Test (C2ST).
- **Feature Informativeness**: The usefulness of the recorded features for predicting outcomes is verified by training classifiers to predict the visit and conversion outcomes.

### Conclusion

The release of this dataset aims to promote research in causal inference, especially ITE prediction and UM. With a large-scale, high-quality dataset, researchers can better evaluate and compare different methods, which should support further progress in the field.
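
The treatment-independence check mentioned under Experimental Verification can be illustrated with a small sketch. The idea behind a classifier two-sample test is that, if treatment was truly randomized, a classifier trained to predict the treatment flag from the features should do no better than chance on held-out data. The code below is a simplified illustration and not the paper's procedure: the `simulate` helper, the toy Gaussian features, the hand-written logistic regression, and the comparison against a majority-class baseline (rather than a formal null distribution) are all assumptions made for this example.

```python
import numpy as np


def simulate(n, confounded, seed=0):
    """Hypothetical toy data: 5 Gaussian features and a binary treatment flag."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    # Randomized: constant treatment probability (about 85% treated).
    # Confounded: treatment probability depends on feature 0.
    logits = 1.7 + (2.0 * X[:, 0] if confounded else 0.0)
    t = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    return X, t


def c2st_accuracy(X, t, n_iter=300, lr=1.0, seed=0):
    """Return (held-out accuracy, majority-class baseline) for predicting t from X."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(t))
    split = len(t) // 2
    train, test = idx[:split], idx[split:]

    # Standardize features with training-set statistics only.
    mu = X[train].mean(axis=0)
    sd = X[train].std(axis=0) + 1e-12
    X_tr = (X[train] - mu) / sd
    X_te = (X[test] - mu) / sd

    # Plain gradient descent on the logistic loss.
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X_tr @ w + b)))
        grad = p - t[train]
        w -= lr * (X_tr.T @ grad) / len(train)
        b -= lr * grad.mean()

    pred = ((X_te @ w + b) > 0).astype(float)
    accuracy = float((pred == t[test]).mean())
    # With imbalanced treatment, "chance" is the majority-class rate, not 0.5.
    treated_rate = float(t[test].mean())
    baseline = max(treated_rate, 1.0 - treated_rate)
    return accuracy, baseline


if __name__ == "__main__":
    for confounded in (False, True):
        X, t = simulate(20000, confounded=confounded)
        acc, base = c2st_accuracy(X, t)
        print(f"confounded={confounded}: accuracy={acc:.3f}, baseline={base:.3f}")
```

Because the treated and control groups are imbalanced, the sketch compares accuracy to the majority-class rate rather than to 0.5. Held-out accuracy close to that baseline is consistent with randomized assignment, while accuracy clearly above it suggests that treatment depends on the features.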