Online Resource Allocation: Bandits feedback and Advice on Time-varying Demands

Lixing Lyu, Wang Chi Cheung
2023-06-12
Abstract: We consider a general online resource allocation model with bandit feedback and time-varying demands. While online resource allocation has been well studied in the literature, most existing works make the strong assumption that the demand arrival process is stationary. In practical applications, such as online advertisement and revenue management, however, this process may be exogenous and non-stationary, like the constantly changing internet traffic. Motivated by the recent Online Algorithms with Advice framework [Mitzenmacher and Vassilvitskii, Commun. ACM 2022], we explore how online advice can inform policy design. We establish an impossibility result that any algorithm performs poorly in terms of regret without any advice in our setting. In contrast, we design a robust online algorithm that leverages online predictions of the total demand volumes. Empowered with online advice, our proposed algorithm is shown to have both theoretical performance guarantees and promising numerical results compared with other algorithms in the literature. We also provide two explicit examples of time-varying demand scenarios and derive the corresponding theoretical performance guarantees. Finally, we adapt our model to a network revenue management problem, and numerically demonstrate that our algorithm can still perform competitively compared to existing baselines.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
The paper addresses the challenge of non-stationary demand in online resource allocation. Specifically, it studies how bandit feedback and online advice can be used to make allocation decisions when the demand volume changes over time.

### Main Problems

1. **Non-stationary Demand**: Most existing studies assume that the demand arrival process is stationary. However, in practical applications such as online advertising and revenue management, the demand volume is usually exogenous and non-stationary. For example, Internet traffic is constantly changing.
2. **Exploration and Exploitation under Bandit Feedback**: Since bandit feedback reveals only partial information, the decision-maker must balance exploration and exploitation while also respecting limited resource constraints.
3. **Performance Limitations without Advice**: The paper proves that, in the absence of any advice, any algorithm suffers linear regret, which indicates the difficulty of designing effective algorithms in this setting.

### Solutions

To address these problems, the paper proposes the following:

1. **Introducing Predictive Advice**: With a predictive advice mechanism, the decision-maker can better estimate the total future demand. This advice can come from time-series prediction or other machine-learning methods.
2. **Designing a Robust Algorithm**: The paper designs an algorithm named OA-UCB (Online-Advice Upper Confidence Bound), which uses predictive advice and comes with theoretical performance guarantees.
3. **Theoretical Analysis**: The paper provides two impossibility results, proving that any algorithm performs poorly without advice. It also derives a regret upper bound in the presence of advice and shows that the algorithm is approximately optimal in specific non-stationary demand scenarios.
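
The interplay between predictive advice and upper confidence bounds described above can be illustrated with a small simulation. The sketch below is not the paper's OA-UCB; it is a simplified stand-in under stated assumptions: a single resource, Bernoulli per-unit rewards and costs, and a plain dual-price update that paces spending toward `budget / advice`. The function name `run_advice_paced_ucb`, its parameters, and the example instance are all hypothetical.

```python
import math
import random


def run_advice_paced_ucb(demands, advice, mean_reward, mean_cost, budget,
                         step=0.05, seed=0):
    """Illustrative advice-paced UCB loop; NOT the paper's OA-UCB.

    demands[t] : demand volume arriving in round t (exogenous)
    advice[t]  : prediction of the total demand, available in round t
    mean_reward, mean_cost : hypothetical per-unit Bernoulli means per arm
    """
    rng = random.Random(seed)
    n_arms = len(mean_reward)
    pulls = [0] * n_arms
    reward_sum = [0.0] * n_arms
    cost_sum = [0.0] * n_arms
    dual = 0.0            # price placed on the resource
    remaining = budget
    total_reward = 0.0

    for t, q in enumerate(demands, start=1):
        # Per-unit-of-demand spending target implied by the advice.
        target = budget / max(advice[t - 1], 1e-9)

        scores = []
        for a in range(n_arms):
            if pulls[a] == 0:
                scores.append(float("inf"))  # try every arm once
                continue
            radius = math.sqrt(2.0 * math.log(t + 1) / pulls[a])
            ucb_reward = min(1.0, reward_sum[a] / pulls[a] + radius)
            lcb_cost = max(0.0, cost_sum[a] / pulls[a] - radius)
            scores.append(ucb_reward - dual * lcb_cost)
        arm = max(range(n_arms), key=scores.__getitem__)

        # Bandit feedback: only the chosen arm's outcome is observed.
        r = 1.0 if rng.random() < mean_reward[arm] else 0.0
        c = 1.0 if rng.random() < mean_cost[arm] else 0.0
        if q * c > remaining:
            break  # stop once the budget cannot cover the consumption
        remaining -= q * c
        total_reward += q * r
        pulls[arm] += 1
        reward_sum[arm] += r
        cost_sum[arm] += c

        # Raise the price when spending exceeds the target, else lower it.
        dual = max(0.0, dual + step * (c - target))

    return total_reward, budget - remaining


# Hypothetical instance: linearly growing demand and perfect advice.
demands = [1.0 + 0.01 * t for t in range(200)]
advice = [sum(demands)] * len(demands)
reward, spent = run_advice_paced_ucb(
    demands, advice, mean_reward=[0.3, 0.6, 0.8], mean_cost=[0.1, 0.4, 0.9],
    budget=100.0)
print(f"reward={reward:.1f}, spent={spent:.1f}")
```

Replacing `advice` with a poor estimate of the total demand changes the spending target, which is one way to see informally why the quality of the advice matters in this setting.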
### Application Examples

The paper also provides two specific examples of non-stationary demand scenarios: the linear growth model and the AR(1) model. For each example, it designs a corresponding predictive advice mechanism and derives explicit regret upper bounds.

### Summary

In general, the paper aims to solve the online resource allocation problem with non-stationary demand. By introducing a predictive advice mechanism, it designs a robust online algorithm and theoretically proves its effectiveness.
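
As a rough illustration of how advice on the total demand might be produced in the AR(1) scenario mentioned above, the sketch below simulates an AR(1) demand sequence and forecasts the total demand from the observations so far. It assumes the AR(1) parameters are known and that |beta| < 1; the paper's own estimator may differ. The names `simulate_ar1` and `forecast_total`, and all parameter values, are hypothetical.

```python
import random


def simulate_ar1(alpha, beta, q0, horizon, noise_std=0.0, seed=0):
    """Generate q_t = alpha + beta * q_{t-1} + noise for t = 1..horizon.

    Values are not clamped at zero, so keep the noise small relative
    to the stationary mean alpha / (1 - beta).
    """
    rng = random.Random(seed)
    demands, prev = [], q0
    for _ in range(horizon):
        prev = alpha + beta * prev + rng.gauss(0.0, noise_std)
        demands.append(prev)
    return demands


def forecast_total(observed, alpha, beta, horizon):
    """Predict the total demand over the horizon (assumes |beta| < 1).

    Uses E[q_{t+k} | q_t] = mu + beta**k * (q_t - mu), with
    mu = alpha / (1 - beta), summed over the remaining rounds.
    """
    mu = alpha / (1.0 - beta)
    remaining = horizon - len(observed)
    last = observed[-1]
    geometric = beta * (1.0 - beta ** remaining) / (1.0 - beta)
    return sum(observed) + remaining * mu + (last - mu) * geometric


# Hypothetical instance: forecast the total after seeing 10 of 100 rounds.
demands = simulate_ar1(alpha=1.0, beta=0.5, q0=4.0, horizon=100, noise_std=0.1)
advice = forecast_total(demands[:10], alpha=1.0, beta=0.5, horizon=100)
print(f"advice={advice:.2f}, realized total={sum(demands):.2f}")
```

With zero noise the forecast coincides with the realized total, so any gap between the two printed numbers comes from the noise in the rounds not yet observed.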