DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation

Jason Shuo Zhang, Benjamin Howson, Panayiota Savva, Eleanor Loh
2024-06-13
Abstract: Personalised discount codes provide a powerful mechanism for managing customer relationships and operational spend in e-commerce. Bandits are well suited for this product area, given the partial information nature of the problem, as well as the need for adaptation to the changing business environment. Here, we introduce DISCO, an end-to-end contextual bandit framework for personalised discount code allocation at ASOS. DISCO adapts the traditional Thompson Sampling algorithm by integrating it within an integer program, thereby allowing for operational cost control. Because bandit learning is often worse with high dimensional actions, we focused on building low dimensional action and context representations that were nonetheless capable of good accuracy. Additionally, we sought to build a model that preserved the relationship between price and sales, in which customers increase their purchasing in response to lower prices ("negative price elasticity"). These aims were achieved by using radial basis functions to represent the continuous (i.e. infinite armed) action space, in combination with context embeddings extracted from a neural network. These feature representations were used within a Thompson Sampling framework to facilitate exploration, and further integrated with an integer program to allocate discount codes across ASOS's customer base. These modelling decisions result in a reward model that (a) enables pooled learning across similar actions, (b) is highly accurate, including in extrapolation, and (c) preserves the expected negative price elasticity. Through offline analysis, we show that DISCO is able to effectively enact exploration and improves its performance over time, despite the global constraint. Finally, we subjected DISCO to a rigorous online A/B test, and found that it achieves a significant improvement of >1% in average basket value, relative to the legacy systems.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
This paper introduces DISCO, an end-to-end contextual bandit framework for personalized discount code allocation at ASOS, targeting customer relationship management and operational cost control in e-commerce. Bandit algorithms suit this problem because only partial information is observed and because the system must adapt to a constantly changing business environment. DISCO embeds the traditional Thompson sampling algorithm within an integer program, which allows operational costs to be controlled through a global constraint. Separately, because bandit learning tends to degrade with high-dimensional actions, DISCO represents the continuous action space with radial basis functions and combines these with context embeddings extracted from a neural network, giving a low-dimensional but expressive feature representation. This representation pools learning across similar actions while maintaining prediction accuracy, including in extrapolation. The reward model also preserves negative price elasticity, the expected relationship in which customers purchase more as prices fall, which is a key property for evaluating pricing models. Offline analysis shows that DISCO explores effectively and improves its performance over time despite the global constraint. In an online A/B test, DISCO achieved a significant improvement of more than 1% in average basket value relative to the legacy systems. Keywords include personalized discount codes, bandit algorithms, reinforcement learning, retail science, pricing, and Thompson sampling.
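The three ingredients named in the abstract (radial basis features over a continuous discount, Thompson sampling over a reward model, and a budget-constrained allocation) can be illustrated with a toy sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the Bayesian linear reward model, the Kronecker-product interaction between context and action features, the random stand-in "embeddings", the synthetic rewards, the cost figures, and the exhaustive search that replaces a real integer-program solver on a four-customer instance. It also does not enforce negative price elasticity.

```python
import itertools
import numpy as np

def rbf_features(discount, centres, width):
    """Gaussian radial basis activations for a scalar discount level."""
    d = np.asarray(discount, dtype=float)[..., None]
    return np.exp(-0.5 * ((d - centres) / width) ** 2)

def posterior(X, y, prior_var=1.0, noise_var=1.0):
    """Gaussian posterior over weights of a Bayesian linear regression."""
    precision = np.eye(X.shape[1]) / prior_var + X.T @ X / noise_var
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # remove tiny numerical asymmetry
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov

def allocate(sampled_reward, cost, budget):
    """Choose one discount level per customer, maximising the sampled
    reward subject to total cost <= budget. Exhaustive search stands in
    for an integer-program solver and only scales to toy instances."""
    n_customers, n_levels = sampled_reward.shape
    rows = np.arange(n_customers)
    best_value, best_choice = -np.inf, None
    for choice in itertools.product(range(n_levels), repeat=n_customers):
        idx = np.array(choice)
        if cost[rows, idx].sum() <= budget:
            value = sampled_reward[rows, idx].sum()
            if value > best_value:
                best_value, best_choice = value, choice
    return best_choice, best_value

rng = np.random.default_rng(0)
levels = np.array([0.0, 0.1, 0.2, 0.3])   # hypothetical discount levels
centres = np.linspace(0.0, 0.3, 4)        # hypothetical RBF centres
width = 0.1                               # hypothetical RBF width

def features(context, discount):
    # Assumed interaction: Kronecker product of context and action features.
    return np.kron(context, rbf_features(discount, centres, width))

# Synthetic logged data: random vectors stand in for neural-network embeddings.
contexts_hist = rng.normal(size=(200, 2))
discounts_hist = rng.choice(levels, size=200)
X = np.stack([features(c, d) for c, d in zip(contexts_hist, discounts_hist)])
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + rng.normal(scale=0.1, size=200)

# Thompson sampling: draw one plausible weight vector from the posterior.
mean, cov = posterior(X, y, noise_var=0.01)
w = rng.multivariate_normal(mean, cov)

# Score every (customer, discount level) pair under the sampled weights.
new_contexts = rng.normal(size=(4, 2))
sampled = np.array([[features(c, d) @ w for d in levels] for c in new_contexts])

# Hypothetical cost of each level, identical for all customers, and a budget.
cost = np.tile(np.array([0.0, 5.0, 10.0, 15.0]), (4, 1))
budget = 20.0
choice, value = allocate(sampled, cost, budget)
print("discount per customer:", [float(levels[i]) for i in choice])
```

Because the weights are resampled on every allocation round, customers whose reward estimates are uncertain will sometimes receive discounts that the posterior mean would not select, which is how exploration arises in this kind of scheme. At realistic scale the enumeration in `allocate` would be replaced by a proper integer-programming solver.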