Abstract: Significant research effort has been devoted in recent years to developing personalized pricing, promotions, and product recommendation algorithms that can leverage rich customer data to learn and earn. Systematic benchmarking and evaluation of these causal learning systems remains a critical challenge, due to the lack of suitable datasets and simulation environments. In this work, we propose a multi-stage model for simulating customer shopping behavior that captures important sources of heterogeneity, including price sensitivity and past experiences. We embedded this model into a working simulation environment -- RetailSynth. RetailSynth was carefully calibrated on publicly available grocery data to create realistic synthetic shopping transactions. Multiple pricing policies were implemented within the simulator and analyzed for impact on revenue, category penetration, and customer retention. Applied researchers can use RetailSynth to validate causal demand models for multi-category retail and to incorporate realistic price sensitivity into emerging benchmarking suites for personalized pricing, promotions, and product recommendations.

What problem does this paper attempt to address?

The paper attempts to address the issue of the lack of suitable benchmark datasets and simulation environments in the evaluation of retail AI systems. Specifically, existing public datasets are small in scale, biased, and lack key fields, making it difficult to reliably evaluate complex systems such as personalized pricing, promotions, and product recommendation algorithms. To solve this problem, the authors propose a multi-stage model to simulate customer shopping behavior and embed it into a working simulation environment called RetailSynth. The synthetic shopping transaction data generated by RetailSynth can be used to validate causal demand models, evaluate the impact of different pricing strategies on revenue, category penetration, and customer retention, and provide researchers with a tool to test the robustness of AI systems.

### Main contributions of the paper include:

1. **Multi-stage model**: A multi-stage decision framework covering various stages of the customer lifecycle, including whether to visit the store, selecting the category to purchase, choosing the product to buy, and the quantity to purchase.
2. **Synthetic data generation**: Development of an interpretable multi-stage decision model capable of generating synthetic customer trajectories for a large number of products while maintaining efficient computational performance.
3. **Price sensitivity modeling**: Introduction of heterogeneous price sensitivity for customers and products in the model, making the generated data more consistent with real-world shopping behavior.
4. **Calibration and validation**: Detailed description of how to calibrate the model to public grocery data and comparison of the choice distribution and overall purchasing behavior of synthetic data with real data.
5. **Scenario analysis**: Demonstration of changes in customer demand through the simulation of different pricing strategies and validation of the model's heterogeneous response in different customer segments.

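The four decision stages described above (store visit, category choice, product choice, quantity) can be illustrated with a small sketch. This is not the RetailSynth implementation: the function names `simulate_trip` and `mean_revenue`, the logistic visit and category stages, the multinomial-logit product stage, the capped geometric quantity stage, and all parameter values are assumptions chosen only to show how heterogeneous price sensitivity can flow through a staged model.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def simulate_trip(rng, price_sensitivity, prices,
                  visit_bias=0.5, category_bias=1.0, max_qty=10):
    """Simulate one customer trip for a single category.

    Returns (product_index, quantity), or None if nothing is bought.
    Every functional form here is an illustrative assumption.
    """
    # Product utilities fall as price rises, scaled by the customer's sensitivity.
    utilities = [-price_sensitivity * p for p in prices]
    max_u = max(utilities)
    # Log-sum-exp of utilities summarizes how attractive the category is.
    inclusive = max_u + math.log(sum(math.exp(u - max_u) for u in utilities))

    # Stage 1: does the customer visit the store?
    if rng.random() >= sigmoid(visit_bias + inclusive):
        return None
    # Stage 2: does the customer buy from this category?
    if rng.random() >= sigmoid(category_bias + inclusive):
        return None
    # Stage 3: which product? Multinomial logit over product utilities.
    weights = [math.exp(u - max_u) for u in utilities]
    product = rng.choices(range(len(prices)), weights=weights, k=1)[0]
    # Stage 4: how many units? Keep adding units with a price-dependent probability.
    p_more = sigmoid(-price_sensitivity * prices[product])
    quantity = 1
    while quantity < max_qty and rng.random() < p_more:
        quantity += 1
    return product, quantity


def mean_revenue(prices, n_customers=5000, seed=0):
    """Average revenue per customer under a fixed price vector."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_customers):
        # Heterogeneous price sensitivity: one lognormal draw per customer (assumed).
        sensitivity = rng.lognormvariate(0.0, 0.5)
        outcome = simulate_trip(rng, sensitivity, prices)
        if outcome is not None:
            product, quantity = outcome
            total += prices[product] * quantity
    return total / n_customers


if __name__ == "__main__":
    # Compare two hypothetical pricing policies for a three-product category.
    for policy in ([1.0, 1.5, 2.0], [1.5, 2.0, 2.5]):
        print(policy, round(mean_revenue(policy), 3))
```

Because each stage conditions on the previous one, a price change affects visit rates, category penetration, product mix, and basket size at once, which is the kind of heterogeneous response the scenario analysis examines.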
### Background and motivation of the paper:

With the development of digital marketing and e-commerce, retailers invest significant resources in developing AI systems for sales promotions, dynamic pricing, product search and recommendation services, and online advertising. Reliable evaluation and benchmarking of these systems remains difficult, however, mainly because suitable benchmark datasets and simulation environments are lacking. Existing public datasets are usually small in scale, biased, and missing key fields, so they cannot be used to simulate customer behavior comprehensively and accurately. Privacy and competition concerns further limit the sharing of high-quality data. A simulation environment capable of generating synthetic data is therefore valuable for accelerating the evaluation and optimization of retail AI systems.

RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation

Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail Promotions

Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data

Consumer Transactions Simulation through Generative Adversarial Networks

Synthesizing Credit Card Transactions

Simulating Customer Experience and Word Of Mouth in Retail - A Case Study

Towards the Development of a Simulator for Investigating the Impact of People Management Practices on Retail Performance

Data-Driven Analytics for Benchmarking and Optimizing Retail Store Performance

Analytics for an Online Retailer: Demand Forecasting and Price Optimization

A Probabilistic Simulator of Spatial Demand for Product Allocation

A two-sided sales promotions modeling based on agent-based simulation

Sales prediction hybrid models for retails using promotional pricing strategy as a key demand driver

Artificial intelligence-based inventory management for retail supply chain optimization: a case study of customer retention and revenue growth

Generating Diverse Synthetic Datasets for Evaluation of Real-life Recommender Systems

Personalized Product Assortment with Real-time 3D Perception and Bayesian Payoff Estimation

Revolutionizing Retail Analytics: Advancing Inventory and Customer Insight with AI

A Hybrid Statistical-Machine Learning Approach for Analysing Online Customer Behavior: An Empirical Study

Simulation Based Sales Forecasting on Retail Small Stores

Agent-based simulation of pricing strategy for agri-products considering customer preference

Predicting Consumer In-Store Purchase Using Real-Time Retail Video Analytics

Simulating human interactions in supermarkets to measure the risk of COVID-19 contagion at scale