Proximal policy optimization algorithm for dynamic pricing with online reviews

Chao Wu, Wenjie Bi, Haiying Liu
DOI: https://doi.org/10.1016/j.eswa.2022.119191
IF: 8.5
2023-03-01
Expert Systems with Applications
Abstract: This study investigates whether the presence of both quality- and value-based online reviews helps firms make decisions. To adapt to a complex real-world environment, we construct two simulated environments with high and low initial consumer-perceived quality and employ a Proximal Policy Optimization (PPO) algorithm to derive optimal pricing strategies. The simulation results show that retailers can gain higher revenue by considering quality-based reviews only when consumers' initial perceived quality is low. In addition, retailers must choose an appropriate promotion method based on the social learning speed of the consumer group. When the social learning speed is slow, retailers should invest more in promotion to improve consumers' initial perceived quality and thus increase revenue. The PPO algorithm outperforms the Advantage Actor-Critic algorithm, provides a new approach to complex and continuous revenue management problems, and can be applied to a wider range of areas.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
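
The abstract names PPO but does not state its objective. As a rough illustration only, the sketch below computes the standard clipped surrogate objective from the general PPO formulation. It is not the authors' implementation: the function name, the argument names, and the default clip range of 0.2 are all assumptions made for this example, and nothing here reflects the paper's pricing environment or its reward definition.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of sampled actions.

    new_logp / old_logp: log-probabilities of the sampled actions (for
    example, candidate prices) under the updated and the data-collecting
    policy. advantages: advantage estimates for those actions.
    All names and the clip_eps default are hypothetical choices.
    """
    terms = []
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

Taking the minimum of the two terms removes the incentive to move the policy far from the one that collected the data in a single update. Advantage Actor-Critic, the baseline named in the abstract, has no such clipping in its policy-gradient objective.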