Benchmarking for Deep Uplift Modeling in Online Marketing

Dugang Liu, Xing Tang, Yang Qiao, Miao Liu, Zexu Sun, Xiuqiang He, Zhong Ming
2024-06-01
Abstract: Online marketing is critical for many industrial platforms and business applications, aiming to increase user engagement and platform revenue by identifying the delivery-sensitive groups for specific incentives, such as coupons and bonuses. As the scale and complexity of features in industrial scenarios increase, deep uplift modeling (DUM) has emerged as a promising technique and attracted increasing research attention from academia and industry, resulting in various predictive models. However, current DUM still lacks standardized benchmarks and unified evaluation protocols, which limits the reproducibility of experimental results in existing studies as well as the practical value and potential impact of this direction. In this paper, we provide an open benchmark for DUM and present comparison results of existing models in a reproducible and uniform manner. To this end, we conduct extensive experiments on two representative industrial datasets with different preprocessing settings to re-evaluate 13 existing models. Surprisingly, our experimental results show that the most recent work differs less than expected from traditional work in many cases. In addition, our experiments reveal the limitations of DUM in generalization, especially across different preprocessing settings and test distributions. Our benchmarking work allows researchers to evaluate the performance of new models quickly and to compare them fairly with existing models. It also gives practitioners valuable insights into often overlooked considerations when deploying DUM. We will make this benchmarking library, evaluation protocol, and experimental setup available on GitHub.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the lack of benchmarking for Deep Uplift Modeling (DUM) in online marketing. Current research has no standardized benchmarks or unified evaluation protocols, which limits the reproducibility and practical value of experimental results. The paper provides an open DUM benchmark and compares existing models in a reproducible and uniform manner. The authors conducted extensive experiments on two representative industrial datasets with different preprocessing settings, re-evaluating 13 existing models, and found that in many cases the most recent work differs less than expected from traditional work. The experiments also revealed limitations of DUM in generalization, especially across different preprocessing settings and test distributions. The benchmark allows researchers to evaluate new models quickly and compare them fairly with existing models, and it gives practitioners insights into considerations that are often overlooked when deploying DUM. The authors state that they will make the benchmarking library, evaluation protocol, and experimental setup publicly available on GitHub.