Model-based Constrained MDP for Budget Allocation in Sequential Incentive Marketing

Shuai Xiao, Le Guo, Zaifan Jiang, Lei Lv, Yuanbo Chen, Jun Zhu, Shuang Yang
2023-03-02
Abstract: Sequential incentive marketing is an important approach for online businesses to acquire customers, increase loyalty and boost sales. How to effectively allocate the incentives so as to maximize the return (e.g., business objectives) under the budget constraint, however, is less studied in the literature. This problem is technically challenging because 1) the allocation strategy has to be learned using historically logged data, which is counterfactual in nature, and 2) both the optimality and feasibility (i.e., that cost cannot exceed budget) need to be assessed before being deployed to online systems. In this paper, we formulate the problem as a constrained Markov decision process (CMDP). To solve the CMDP problem with logged counterfactual data, we propose an efficient learning algorithm which combines bisection search and model-based planning. First, the CMDP is converted into its dual using Lagrangian relaxation, which is proved to be monotonic with respect to the dual variable. Furthermore, we show that the dual problem can be solved by policy learning, with the optimal dual variable being found efficiently via bisection search (i.e., by taking advantage of the monotonicity). Lastly, we show that model-based planning can be used to effectively accelerate the joint optimization process without retraining the policy for every dual variable. Empirical results on synthetic and real marketing datasets confirm the effectiveness of our methods.
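The Lagrangian relaxation and the monotonicity property mentioned in the abstract can be illustrated with a standard argument. The notation here is introduced only for illustration and is not taken from the paper: $R(\pi)$ and $C(\pi)$ are a policy's expected cumulative return and cost, $B$ is the budget, and $\pi_\lambda$ is a policy maximizing the Lagrangian at dual variable $\lambda$. The sketch shows one common form of the monotonicity, stated for the cost of the Lagrangian-optimal policy, which is what makes a bisection search over $\lambda$ valid. The paper may state and prove the property differently.

$$
\max_{\pi}\; R(\pi) \quad \text{s.t.} \quad C(\pi) \le B,
\qquad
\mathcal{L}(\pi,\lambda) = R(\pi) - \lambda\,\bigl(C(\pi) - B\bigr),
\qquad
\min_{\lambda \ge 0}\; \max_{\pi}\; \mathcal{L}(\pi,\lambda).
$$

Take $0 \le \lambda_1 < \lambda_2$ with maximizers $\pi_1 = \pi_{\lambda_1}$ and $\pi_2 = \pi_{\lambda_2}$. The term $\lambda B$ is constant in $\pi$, so each $\pi_i$ also maximizes $R(\pi) - \lambda_i C(\pi)$:

$$
\begin{aligned}
R(\pi_1) - \lambda_1 C(\pi_1) &\ge R(\pi_2) - \lambda_1 C(\pi_2),\\
R(\pi_2) - \lambda_2 C(\pi_2) &\ge R(\pi_1) - \lambda_2 C(\pi_1).
\end{aligned}
$$

Adding the two inequalities cancels the $R$ terms:

$$
(\lambda_2 - \lambda_1)\bigl(C(\pi_1) - C(\pi_2)\bigr) \ge 0
\;\Longrightarrow\;
C(\pi_{\lambda_1}) \ge C(\pi_{\lambda_2}).
$$

Because the cost of $\pi_\lambda$ never increases as $\lambda$ grows, bisection can search for the smallest $\lambda$ with $C(\pi_\lambda) \le B$.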
Artificial Intelligence
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper addresses how to allocate incentives effectively under a budget constraint so as to maximize returns (e.g., business objectives). Specifically, it focuses on sequential incentive marketing, a method commonly used by online businesses to attract customers, increase loyalty, and boost sales by offering incentives such as cash rewards and coupons.

### Technical Challenges of the Problem

1. **Learning from Counterfactual Data**: The allocation policy must be learned from historically logged data, which is inherently counterfactual. Only the feedback for the incentives that were actually given is observed. The effects of the alternatives that were not chosen cannot be observed directly.
2. **Optimality and Feasibility Assessment**: Before the policy is deployed to an online system, both its optimality and its feasibility (i.e., that the cost does not exceed the budget) must be evaluated.

### Solution

To address these challenges, the authors model the problem as a constrained Markov decision process (CMDP). They propose an efficient learning algorithm that combines bisection search with model-based planning. The steps are as follows:

1. **Lagrangian Dual Problem**: Transform the original CMDP into its Lagrangian dual by introducing a Lagrange multiplier (the dual variable) for the budget constraint.
2. **Monotonicity Proof**: Prove that the cost of the incentive allocation is monotonically non-increasing in the dual variable. This allows the optimal dual variable to be found efficiently by bisection search.
3. **Model-Based Planning**: Use model-based planning to accelerate the joint optimization. This avoids retraining the policy for every candidate value of the dual variable.

### Experimental Results

The authors conducted experiments on synthetic data and real marketing datasets to validate the effectiveness of the proposed method.
The experimental results show that, under the budget constraint, the method effectively maximizes user engagement (e.g., the total number of coupon redemptions).

### Main Contributions

1. **Novel Formulation**: A new formulation of the sequential incentive allocation problem. It allows the policy to be learned and validated from offline logged data and supports batch training, which matters for deep neural networks and large-scale datasets.
2. **Efficient Algorithm**: An efficient algorithm, derived from the theoretical findings, that accelerates learning through bisection search.
3. **Model-Based Planning**: The policy is updated by model-based planning, so it needs to be trained only once during the entire search over the dual variable.

### Related Work

- **Batch Learning from Logged Feedback**: Most methods use importance sampling estimators to estimate the counterfactual risk of a new policy.
- **Counterfactual Policy Evaluation**: Industrial applications must evaluate the safety of a new policy before deployment, typically through importance-sampling-based counterfactual policy evaluation.
- **Constrained Policy Optimization**: Sequential allocation problems are often modeled as CMDPs. However, most existing methods rely on online data collection, which is not suitable for industrial settings.

### Conclusion

This paper addresses sequential incentive allocation under budget constraints. It proposes an efficient learning algorithm that combines bisection search and model-based planning, offering a new solution for online marketing.
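The bisection-plus-planning loop behind the contributions above can be illustrated with a small tabular example. This is a minimal sketch built on assumptions made here for illustration. It is not the authors' implementation, which targets large-scale data and neural models.

The sketch makes the following assumptions:

- A toy CMDP with 3 engagement states and 3 incentive levels, using simulated logs from a uniform logging policy.
- A count-based transition, reward and cost model, fitted once.
- Finite-horizon value iteration on the Lagrangian reward $r - \lambda c$, which serves as the "planner".
- Bisection to find the smallest $\lambda$ whose planned policy stays within budget under the fitted model.

The function names (`fit_tabular_model`, `plan`, `evaluate`, `bisect_lambda`) and the simulated dynamics are all hypothetical.

```python
import numpy as np

def fit_tabular_model(logs, n_states, n_actions):
    """Count-based (maximum-likelihood) estimates of P(s'|s,a), E[r|s,a], E[c|s,a]."""
    counts = np.zeros((n_states, n_actions, n_states))
    r_sum = np.zeros((n_states, n_actions))
    c_sum = np.zeros((n_states, n_actions))
    for s, a, r, c, s_next in logs:
        counts[s, a, s_next] += 1
        r_sum[s, a] += r
        c_sum[s, a] += c
    n_sa = counts.sum(axis=2)
    if (n_sa == 0).any():
        raise ValueError("logged data must cover every (state, action) pair")
    return counts / n_sa[:, :, None], r_sum / n_sa, c_sum / n_sa

def plan(P, R, C, lam, horizon):
    """Finite-horizon value iteration on the Lagrangian reward R - lam * C."""
    n_states = R.shape[0]
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = R - lam * C + P @ V          # (S, A): immediate Lagrangian reward + future value
        policy[t] = Q.argmax(axis=1)     # ties go to the lowest action index (cheapest in this toy)
        V = Q.max(axis=1)
    return policy

def evaluate(P, R, C, policy, init_dist):
    """Expected total reward and cost of a time-varying deterministic policy under the model."""
    states = np.arange(len(init_dist))
    d = np.asarray(init_dist, dtype=float)
    total_r, total_c = 0.0, 0.0
    for a_t in policy:                   # a_t[s] = action taken in state s at this step
        total_r += d @ R[states, a_t]
        total_c += d @ C[states, a_t]
        d = d @ P[states, a_t]           # propagate the state distribution one step
    return total_r, total_c

def bisect_lambda(P, R, C, init_dist, budget, horizon, lam_hi=100.0, tol=1e-4):
    """Smallest lambda (up to tol) whose planned policy meets the budget under the model.
    Relies on the planned policy's cost being non-increasing in lambda."""
    pol = plan(P, R, C, 0.0, horizon)
    if evaluate(P, R, C, pol, init_dist)[1] <= budget:
        return 0.0, pol                  # budget not binding: unconstrained optimum is feasible
    best = plan(P, R, C, lam_hi, horizon)
    if evaluate(P, R, C, best, init_dist)[1] > budget:
        raise ValueError("lam_hi is too small to reach the budget")
    lo, hi = 0.0, lam_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        pol = plan(P, R, C, mid, horizon)   # re-plan only; the model is not refitted
        if evaluate(P, R, C, pol, init_dist)[1] <= budget:
            hi, best = mid, pol
        else:
            lo = mid
    return hi, best

# Toy logged data (hypothetical dynamics): states 0..2 = engagement level,
# actions 0..2 = incentive size; cost equals the incentive, reward = landing in the top state.
rng = np.random.default_rng(0)
S, A, H = 3, 3, 5

def simulate_step(s, a):
    p_up = 0.2 + 0.25 * a                # larger incentives raise the chance of moving up
    s_next = min(s + 1, S - 1) if rng.random() < p_up else max(s - 1, 0)
    return float(s_next == S - 1), float(a), s_next

logs = []
for _ in range(20_000):
    s, a = int(rng.integers(S)), int(rng.integers(A))   # uniform logging policy
    r, c, s_next = simulate_step(s, a)
    logs.append((s, a, r, c, s_next))

P, R, C = fit_tabular_model(logs, S, A)    # fitted once, reused for every lambda
init = np.full(S, 1.0 / S)
lam, pol = bisect_lambda(P, R, C, init, budget=3.0, horizon=H)
print(f"lambda={lam:.4f}", "reward/cost:", evaluate(P, R, C, pol, init))
```

The model is fitted once, and each candidate $\lambda$ triggers only a cheap re-planning step, so nothing is retrained from data inside the search loop. Bisection over deterministic policies generally returns a policy whose cost is at or below the budget rather than exactly equal to it. A budget-tight CMDP solution may require randomizing between the policies at the two ends of the final interval.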