Abstract: Individual Treatment Effect (ITE) prediction is an important area of research in machine learning which aims at explaining and estimating the causal impact of an action at the granular level. It represents a problem of growing interest in multiple sectors of application such as healthcare, online advertising or socioeconomics. To foster research on this topic we release a publicly available collection of 13.9 million samples collected from several randomized control trials, scaling up previously available datasets by a healthy 210x factor. We provide details on the data collection and perform sanity checks to validate the use of this data for causal inference tasks. First, we formalize the task of uplift modeling (UM) that can be performed with this data, along with the relevant evaluation metrics. Then, we propose synthetic response surfaces and heterogeneous treatment assignment providing a general set-up for ITE prediction. Finally, we report experiments to validate key characteristics of the dataset leveraging its size to evaluate and compare, with high statistical significance, a selection of baseline UM and ITE prediction methods.
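
For orientation, the quantity that uplift modeling targets is commonly written as a conditional average treatment effect. The notation below is a standard convention and an assumption here, not taken from the abstract: $X$ denotes the features, $T \in \{0, 1\}$ the treatment indicator, and $Y$ the observed outcome.

```latex
\tau(x) = \mathbb{E}[Y \mid X = x,\, T = 1] - \mathbb{E}[Y \mid X = x,\, T = 0]
```

When treatment is assigned at random, as in a randomized controlled trial, this difference of conditional means can be read as the expected causal effect of the treatment for individuals with features $x$.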

What problem does this paper attempt to address?

The paper aims to provide a large-scale, high-quality dataset for Individual Treatment Effect (ITE) prediction and Uplift Modeling (UM). Specifically, it introduces a dataset of 13.9 million samples collected by the online advertising company Criteo. The data come from multiple randomized controlled trials and are 210 times larger than previously available datasets. The paper uses this dataset to promote research on ITE prediction and UM, describes how the data were collected, and runs sanity checks to confirm that the data are suitable for causal inference tasks.

### Main Contributions

1. **Large-scale Real-world Dataset**: The dataset is two orders of magnitude larger than existing UM datasets. Compared with existing benchmarks, it also presents a more complex setting in terms of covariate dimensions. Some features have thousands of possible values, which better represents the problems found in modern Web applications.
2. **Realistic ITE Prediction Benchmark**: For ITE prediction, the dataset is four orders of magnitude larger than existing benchmarks, and the paper proposes additional response surfaces that conform to patterns in the actual data, enriching the diversity of existing benchmarks.

### Dataset Characteristics

- **Sample Size**: It contains 13.9 million samples.
- **Feature Types**: It includes continuous features, binary features, and multi-modal features.
- **Treatment Imbalance**: Only a small portion of users are assigned to the control group, and the overall positive outcome rate is low.
- **Anonymization**: Feature values are hashed to protect user privacy and company assets.

### Experimental Verification

The paper also reports a series of experiments that verify key characteristics of the dataset, including:

- **Treatment Independence**: The independence between the treatment variable and the features is verified with the Classifier Two-Sample Test (C2ST).
- **Feature Informativeness**: The usefulness of the recorded features for predicting outcomes is verified by training classifiers to predict the visit and conversion outcomes.

### Conclusion

The release of this dataset aims to promote research in causal inference, especially ITE prediction and UM. With a large-scale, high-quality dataset, researchers can better evaluate and compare different methods, which supports further development of the field.
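
The treatment independence check mentioned in the summary can be illustrated with a minimal sketch of a classifier two-sample test. This is not necessarily the paper's exact procedure: the hand-written logistic regression, the log-loss statistic, the permutation null, and every function and parameter name below are assumptions made for illustration. The idea is that if treatment is independent of the features, a classifier trained to predict treatment from the features should do no better on held-out data than one trained on shuffled treatment labels.

```python
import numpy as np


def fit_logistic(X, t, lr=0.5, n_iter=200):
    """Fit logistic regression by plain gradient descent (stand-in classifier)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - t) / len(t)
    return w


def log_loss(X, t, w):
    """Mean negative log-likelihood of treatment labels t under weights w."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    p = np.clip(1.0 / (1.0 + np.exp(-Xb @ w)), 1e-12, 1 - 1e-12)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))


def c2st_pvalue(X, t, n_permutations=50, seed=0):
    """Permutation-based C2ST: can the features predict the treatment?"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(t))
    train, test = idx[: len(t) // 2], idx[len(t) // 2:]

    # Held-out loss of a classifier trained on the real treatment labels.
    observed = log_loss(X[test], t[test], fit_logistic(X[train], t[train]))

    # Null distribution: same pipeline with treatment labels shuffled,
    # which breaks any dependence between features and treatment.
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        t_perm = rng.permutation(t)
        w_perm = fit_logistic(X[train], t_perm[train])
        null[i] = log_loss(X[test], t_perm[test], w_perm)

    # A loss lower than the null losses is evidence against independence.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Usage on synthetic data (hypothetical, not the released dataset):
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(2000, 4))
t_random = (rng.random(2000) < 0.85).astype(float)  # randomized, imbalanced
print(c2st_pvalue(X_demo, t_random))
```

A small p-value would suggest that the features carry information about treatment assignment, which would cast doubt on the randomization. A p-value that is not small is consistent with independence but does not prove it.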

A Large Scale Benchmark for Individual Treatment Effect Prediction and Uplift Modeling
