Abstract: We consider a general online resource allocation model with bandit feedback and time-varying demands. While online resource allocation has been well studied in the literature, most existing works make the strong assumption that the demand arrival process is stationary. In practical applications such as online advertising and revenue management, however, this process may be exogenous and non-stationary, like constantly changing internet traffic. Motivated by the recent Online Algorithms with Advice framework [Mitzenmacher and Vassilvitskii, Commun. ACM 2022], we explore how online advice can inform policy design. We establish an impossibility result: in our setting, any algorithm performs poorly in terms of regret without any advice. In contrast, we design a robust online algorithm that leverages online predictions of the total demand volume. Empowered with online advice, our proposed algorithm is shown to have both theoretical performance guarantees and promising numerical results compared with other algorithms in the literature. We also provide two explicit examples of time-varying demand scenarios and derive the corresponding theoretical performance guarantees. Finally, we adapt our model to a network revenue management problem and numerically demonstrate that our algorithm still performs competitively compared to existing baselines.

What problem does this paper attempt to address?

The paper attempts to address the challenges of non-stationary demand in online resource allocation problems. Specifically, it focuses on how to use bandit feedback and online advice to optimize decision-making in an online resource allocation model with non-stationary demand.

### Main Problems

1. **Non-stationary Demand**: Most existing studies assume that the demand arrival process is stationary. However, in practical applications such as online advertising and revenue management, the quantity of demand is usually exogenous and non-stationary. For example, Internet traffic is constantly changing.
2. **Exploration and Exploitation in the Bandit-feedback Environment**: Since only partial information can be obtained through bandit feedback, decision-makers need to balance exploration and exploitation while also dealing with limited resource constraints.
3. **Performance Limitations without Advice**: The paper proves that in the absence of any advice, any algorithm will suffer linear regret, which indicates the difficulty of designing effective algorithms in this setting.

### Solutions

To solve the above problems, the paper proposes the following methods:

1. **Introducing Predictive Advice**: By introducing a predictive advice mechanism, decision-makers can better estimate the total future demand. This advice can come from time-series prediction or other machine-learning methods.
2. **Designing a Robust Algorithm**: The paper designs an algorithm named OA-UCB (Online-Advice Upper Confidence Bound), which can use predictive advice and theoretically guarantee good performance.
3. **Theoretical Analysis**: The paper provides two impossibility results, proving that any algorithm will perform poorly without advice. It also derives a regret upper bound in the presence of advice and shows the approximately optimal performance of the algorithm in specific non-stationary demand scenarios.

### Application Examples

The paper also provides two specific examples of non-stationary demand scenarios: the linear growth model and the AR(1) model. For each example, it designs a corresponding predictive advice mechanism and derives an explicit regret upper bound.

### Summary

In general, the paper aims to solve the online resource allocation problem with non-stationary demand. By introducing a predictive advice mechanism, it designs a robust online algorithm and theoretically proves its effectiveness.
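To make the two demand scenarios named under Application Examples more concrete, the following is a minimal sketch of what linear-growth and AR(1) demand sequences could look like, together with one possible form of advice on the total demand volume. It is not the paper's OA-UCB algorithm or its advice mechanism, neither of which is specified in this summary. The function names, the parameterization (`base`, `slope`, `mean`, `phi`, `noise_sd`), and the assumption that the AR(1) parameters are known to the forecaster are all hypothetical choices made for illustration.

```python
import random


def linear_growth_demand(horizon, base=10.0, slope=0.5, noise_sd=1.0, seed=0):
    """Hypothetical linear-growth demand: q_t = base + slope * t + noise, floored at 0."""
    rng = random.Random(seed)
    return [
        max(0.0, base + slope * t + rng.gauss(0.0, noise_sd))
        for t in range(1, horizon + 1)
    ]


def ar1_demand(horizon, mean=20.0, phi=0.7, noise_sd=1.0, seed=0):
    """Hypothetical AR(1) demand: q_t = mean + phi * (q_{t-1} - mean) + noise."""
    rng = random.Random(seed)
    demand, prev = [], mean  # assumption: the process starts at its mean
    for _ in range(horizon):
        prev = mean + phi * (prev - mean) + rng.gauss(0.0, noise_sd)
        demand.append(max(0.0, prev))  # reported demand is floored at 0
    return demand


def ar1_total_demand_advice(observed, horizon, mean, phi):
    """Illustrative advice: predicted total demand over the whole horizon.

    Adds the demand observed so far to the k-step-ahead AR(1) forecasts
    mean + phi**k * (last - mean) for the remaining rounds.
    Assumes `mean` and `phi` are known, which is a simplification.
    """
    t = len(observed)
    last = observed[-1] if observed else mean
    forecast = sum(
        mean + phi ** k * (last - mean) for k in range(1, horizon - t + 1)
    )
    return sum(observed) + forecast


if __name__ == "__main__":
    demand = ar1_demand(horizon=50)
    advice = ar1_total_demand_advice(demand[:10], horizon=50, mean=20.0, phi=0.7)
    print(f"advice after 10 rounds: {advice:.1f}, realized total: {sum(demand):.1f}")
```

In this sketch the advice is refreshed each round from the observed prefix, so its error shrinks as more of the horizon is revealed. How such predictions enter the allocation decisions is the subject of the paper itself and is not reproduced here.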

Online Resource Allocation: Bandits feedback and Advice on Time-varying Demands

Online Resource Allocation with Non-Stationary Customers

Online Resource Allocation with Convex-set Machine-Learned Advice

Online Stochastic Allocation of Reusable Resources

Single-Leg Revenue Management with Advice

Inventory Balancing with Online Learning

A Unified Model for the Two-stage Offline-then-Online Resource Allocation

Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

Online Resource Allocation with Customer Choice

Offline Planning and Online Learning Under Recovering Rewards

Best of Many in Both Worlds: Online Resource Allocation with Predictions under Unknown Arrival Model

Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints

Online Bayesian Recommendation with No Regret

Stochastic Bandits with Graph Feedback in Non-Stationary Environments

Cascading Bandits: Optimizing Recommendation Frequency in Delayed Feedback Environments

Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization

Stochastic Averaging for Constrained Optimization With Application to Online Resource Allocation

The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems

Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints

Online Ad Procurement in Non-stationary Autobidding Worlds

Dynamic Resource Allocation: Algorithmic Design Principles and Spectrum of Achievable Performances