Abstract: We consider a seller who repeatedly sells a nondurable product to a single customer whose valuations of the product are drawn from a certain distribution. The seller, who initially does not know the valuation distribution, may use the customer's purchase history to learn, and wishes to choose a pricing policy that maximizes her long‐run revenue. Such a problem is at the core of personalized revenue management, where the seller can access each customer's individual purchase history and offer personalized prices. In this paper, we study such a learning problem when the customer is aware of the seller's policy, and thus may behave strategically when making a purchase decision. Using a Bayesian setting with a binary prior, we first show that a popular policy in this setting, the myopic Bayesian policy (MBP) proposed by Harrison et al. (2012), may lead to incomplete learning by the seller: the seller may never be able to ascertain the true type of the customer, and the regret may grow linearly in time. The failure of the MBP is due to the strategic action taken by the customer. To address the customer's strategic behavior, we analyze a Stackelberg game under a two‐period model. We derive the optimal policy of the seller in the two‐period model and show that the regret can be significantly reduced by using the optimal policy rather than the myopic policy. However, such a game is hard to analyze in general. Nevertheless, based on the idea used in the two‐period model, we propose a randomized Bayesian policy (RBP), which updates the posterior belief about the customer's type in each period with a certain probability, as well as a deterministic Bayesian policy (DBP), in which the seller updates the posterior belief periodically and always defers her update to the next cycle. For both the RBP and the DBP, we show that the seller can learn the customer type exponentially fast even if the customer is strategic, and the regret is bounded by a constant.
We also propose policies that achieve asymptotically optimal regrets when only a finite number of price changes are allowed.
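To make the binary-prior belief update concrete, the following is a minimal sketch and not the paper's actual RBP. It assumes two hypothetical customer types, A and B, whose purchase probabilities at the posted price (`buy_prob_a`, `buy_prob_b`) are known to the seller, and a hypothetical parameter `update_prob` for the chance that the seller updates her belief in a given period. All function and parameter names are illustrative, and the customer's strategic response is not modeled.

```python
import random


def bayes_update(belief, bought, buy_prob_a, buy_prob_b):
    """Return the posterior probability of type A after one observation.

    belief      -- prior probability that the customer is type A
    bought      -- True if the customer purchased at the posted price
    buy_prob_a  -- assumed purchase probability of type A at that price
    buy_prob_b  -- assumed purchase probability of type B at that price
    """
    like_a = buy_prob_a if bought else 1.0 - buy_prob_a
    like_b = buy_prob_b if bought else 1.0 - buy_prob_b
    denom = belief * like_a + (1.0 - belief) * like_b
    if denom == 0.0:
        # The observation is impossible under both types; keep the prior.
        return belief
    return belief * like_a / denom


def randomized_update(belief, bought, buy_prob_a, buy_prob_b, update_prob, rng):
    """Apply the Bayes update only with probability update_prob (assumed)."""
    if rng.random() < update_prob:
        return bayes_update(belief, bought, buy_prob_a, buy_prob_b)
    return belief


if __name__ == "__main__":
    rng = random.Random(0)
    belief = 0.5
    # Illustrative purchase history for a single customer.
    for bought in [True, True, False, True]:
        belief = randomized_update(belief, bought, 0.8, 0.4, 0.5, rng)
    print(belief)
```

The sketch only illustrates the mechanics of updating with a certain probability. Why skipping updates blunts the customer's incentive to manipulate the seller's belief is the subject of the paper's analysis and is not captured here.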

Airline dynamic pricing with patient customers using deep exploration-based reinforcement learning

Spatial-temporal Pricing for Ride-Sourcing Platform with Reinforcement Learning

Dynamic Pricing for Airline Ancillaries with Customer Context

Autonomous Airline Revenue Management: A Deep Reinforcement Learning Approach to Seat Inventory Control and Overbooking

Online Learning and Pricing for Multiple Products with Reference Price Effects

Deep Reinforcement Learning for Strategic Bidding in Electricity Markets

Optimizing Revenue Maximization and Demand Learning in Airline Revenue Management

Dynamic offer creation for airline ancillaries using a Markov chain choice model

Behaviour-driven Dynamic Pricing Modelling Via Hidden Markov Model

Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning: A Field Experiment

A deep Q-learning approach to optimize ordering and dynamic pricing decisions in the presence of strategic customers

Dynamic Retail Pricing via Q-Learning -- A Reinforcement Learning Framework for Enhanced Revenue Management

Machine Learning based Framework for Robust Price-Sensitivity Estimation with Application to Airline Pricing

Joint Dynamic Pricing for Two Parallel Flights Based on Passenger Choice Behavior

Dynamic Airline Scheduling

Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment

Modeling Joint Choice of Airline Itinerary and Fare Product

Airline Seat Inventory Control Based on Passenger Choice Behavior

Bayesian dynamic learning and pricing with strategic customers

Dynamic Pricing for Smart Mobile Edge Computing: A Reinforcement Learning Approach

Dynamic Pricing and Learning with Long-term Reference Effects