A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing

Zeyu Bian,Zhengling Qi,Cong Shi,Lan Wang
2024-11-13
Abstract: This paper studies offline dynamic pricing without data coverage assumption, thereby allowing for any price including the optimal one not being observed in the offline data. Previous approaches that rely on the various coverage assumptions such as that the optimal prices are observable, would lead to suboptimal decisions and consequently, reduced profits. We address this challenge by framing the problem to a partial identification framework. Specifically, we establish a partial identification bound for the demand parameter whose associated price is unobserved by leveraging the inherent monotonicity property in the pricing problem. We further incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy. Theoretically, we establish rate-optimal finite-sample regret guarantees for both strategies. Empirically, we demonstrate the superior performance of the newly proposed methods via a synthetic environment. This research provides practitioners with valuable insights into offline pricing strategies in the challenging no-coverage setting, ultimately fostering sustainable growth and profitability of the company.
Machine Learning
What problem does this paper attempt to address?
The paper addresses offline dynamic pricing without a data coverage assumption. Existing methods usually rely on some form of coverage, for example that the optimal price is observed in the offline data. In practice this assumption often fails, because the optimal price may never appear in historical data, and applying such methods directly can lead to suboptimal decisions and reduced profits. To address this, the paper casts the problem in a partial identification framework: it uses the inherent monotonicity property of the pricing problem to establish partial identification bounds on the demand parameters of unobserved prices, so that useful information is available even for prices that never appear in the data. Within this framework, the paper incorporates pessimistic and opportunistic strategies to derive the estimated policy. Theoretically, it establishes rate-optimal finite-sample regret guarantees for both strategies. Empirically, it demonstrates the superior performance of the new methods in a synthetic environment.

### Main Contributions

1. **Offline Dynamic Pricing Technique without Data Coverage Assumption**: The work is presented as the first statistically sound offline dynamic pricing technique that does not assume data coverage.
2. **Partial Identification Framework**: A new partial identification framework is introduced for learning pessimistic and opportunistic strategies when some prices are unobserved.
3. **Theoretical Guarantees**: Regret bounds are established for both strategies. Each bound has two parts: one from the estimation error of the demand parameters at the observed prices, and one from the error due to a potentially unobserved optimal price.
4. **Algorithm Implementation**: Two efficient algorithms are proposed, and their superior performance is demonstrated through simulation studies.

### Pessimistic Strategy and Opportunistic Strategy

- **Pessimistic Strategy**: Select the action that maximizes the return in the worst-case scenario.
- **Opportunistic Strategy**: Select the action that minimizes the maximum loss or regret.

### Application Background

Dynamic pricing plays a central role in modern revenue management, where it is used to optimize profits, improve operational efficiency, and maintain market competitiveness. With well-chosen dynamic pricing strategies, decision-makers can better balance supply and demand, use inventory effectively, and respond to market dynamics, supporting sustainable growth and profitability.

### Related Work

- **Offline Reinforcement Learning**: The main challenge in offline RL is the lack of further interaction with the environment, so the dataset may not cover all relevant states and actions. Existing methods usually rely on data coverage assumptions.
- **Dynamic Pricing**: The existing literature focuses mainly on online settings, with fewer studies of offline settings. This paper addresses that gap by introducing the partial identification framework.

### Conclusion

By introducing the partial identification framework and combining it with pessimistic and opportunistic strategies, the paper offers a new solution to the offline dynamic pricing problem that is relevant both theoretically and in practice.
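
To make the pessimistic and opportunistic strategies described above more concrete, the following is a minimal Python sketch. It is not the paper's algorithm. It makes several assumptions: a small discrete price grid, demand that is non-increasing in price (one reading of the monotonicity property), observed demands treated as exact with no estimation error, and a known demand cap `d_max`. All function and variable names are hypothetical. The regret calculation also bounds each price independently, ignoring the joint monotonicity constraint across unobserved prices, so it is a conservative simplification.

```python
def demand_bounds(prices, observed, d_max):
    """Bound demand at each price, assuming demand is non-increasing in price.

    observed maps a price to its (assumed exact) demand; d_max is an assumed cap.
    """
    bounds = {}
    for p in prices:
        if p in observed:
            bounds[p] = (observed[p], observed[p])
            continue
        # Demand at cheaper observed prices caps demand at p from above;
        # demand at more expensive observed prices bounds it from below.
        cheaper = [d for q, d in observed.items() if q < p]
        dearer = [d for q, d in observed.items() if q > p]
        upper = min(cheaper) if cheaper else d_max
        lower = max(dearer) if dearer else 0.0
        bounds[p] = (lower, upper)
    return bounds


def pessimistic_price(prices, bounds):
    """Maximize the worst-case revenue p * lower_demand(p)."""
    return max(prices, key=lambda p: p * bounds[p][0])


def opportunistic_price(prices, bounds):
    """Minimize the worst-case regret against the best rival price.

    Simplification: each price's interval is treated independently.
    """
    def worst_regret(p):
        best_rival = max((q * bounds[q][1] for q in prices if q != p), default=0.0)
        return max(0.0, best_rival - p * bounds[p][0])

    return min(prices, key=worst_regret)


# Hypothetical data: prices 3 and 4 never appear in the offline data.
prices = [1, 2, 3, 4, 5]
observed = {1: 10.0, 2: 8.0, 5: 3.0}
bounds = demand_bounds(prices, observed, d_max=10.0)

print(bounds[3])                            # (3.0, 8.0)
print(pessimistic_price(prices, bounds))    # 2
print(opportunistic_price(prices, bounds))  # 4
```

In this toy example the two strategies disagree. The pessimistic rule stays at the observed price 2, whose revenue is known to be 16. The opportunistic rule moves to the unobserved price 4, whose revenue could be as high as 32 and whose worst-case regret (12) is the smallest on the grid.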