Abstract: We study a target-following variation of online resource allocation. As in classical resource allocation, the decision-maker must assign sequentially arriving jobs to one of multiple available resources. However, in addition to the assignment costs incurred from these decisions, the decision-maker is also penalized for deviating from exogenously given, nonstationary target allocations throughout the horizon. The goal is to minimize the total expected assignment and deviation penalty costs incurred throughout the horizon when the distribution of assignment costs is unknown. In contrast to traditional online resource allocation, in our setting the timing of allocation decisions is critical due to the nonstationarity of allocation targets. Examples of practical problems that fit this framework include many physical resource settings where capacity is time-varying, such as manual warehouse processes where staffing levels change over time, and assignment of packages to outbound trucks whose departure times are scheduled throughout the day. We first show that naive extensions of state-of-the-art algorithms for classical resource allocation problems can fail dramatically when applied to target-following resource allocation. We then propose a novel "proxy assignment" primal-dual algorithm for the target-following online resource allocation problem that uses current arrivals to simulate the effect of future arrivals. We prove that our algorithm incurs the optimal $O(\sqrt{T})$ regret bound when the assignment costs of the arriving jobs are drawn i.i.d. from a fixed distribution. We demonstrate the practical performance of our approach by conducting numerical experiments on synthetic datasets, as well as real-world datasets from retail fulfillment operations.

### What problem does this paper attempt to solve?

This paper addresses the **target-following online resource allocation problem**. The question is how to assign a sequence of dynamically arriving tasks (jobs) to multiple available resources over a finite horizon while keeping two types of cost as low as possible:

1. **Allocation cost**: the direct cost incurred each time a task is assigned to a particular resource.
2. **Deviation cost**: the penalty incurred when the actual allocation departs from the preset, time-varying target allocation.

#### Background and challenges

In traditional online resource allocation, the timing of individual allocation decisions is far less critical. In the problem studied here, the allocation targets change over time (they are nonstationary), so the decision-maker must track the targets throughout the horizon rather than only at its end. This makes the problem harder: each decision must balance its immediate assignment cost against the deviation from both current and future targets.

#### Practical application scenarios

The framework fits many practical settings in which capacity varies over time, such as:

- **Warehouse management**: in a retail warehouse, staffing levels change over time, and tasks must be allocated according to the current workload and future expectations to avoid excessive idleness or overstocking.
- **Logistics transportation**: packages are assigned to outbound trucks with different scheduled departure times. The number of packages assigned to each truck in each time period should stay close to the preset target, so that transport capacity is not wasted.
#### Main contributions of the paper

To address these challenges, the authors propose a new primal-dual algorithm based on **proxy assignments**. The algorithm uses current arrivals to simulate the effect of future arrivals, which helps the decision-maker balance current and future allocation decisions and thereby reduce the total expected allocation and deviation cost. The authors prove that the algorithm attains the optimal $O(\sqrt{T})$ regret bound when the assignment costs of arriving jobs are drawn i.i.d. from a fixed distribution, and they evaluate it in numerical experiments on synthetic datasets and on real-world datasets from retail fulfillment operations.

### Summary of mathematical formulas

- **Total cost formula**:

  \[ V^\pi[\omega] := \sum_{j \in [n]} \sum_{i \in [m]} c_{ji} Z^\pi_{ji}(T) + \frac{T}{K} \sum_{k \in [K]} \sum_{i \in [m]} k \, g_{ki}\left(\frac{Z^\pi_i(kT/K)}{kT/K}\right) \]

  where $c_{ji}$ is the cost of allocating a task of type $j$ to resource $i$, $Z^\pi_{ji}(T)$ is the number of type-$j$ tasks allocated to resource $i$ by the end of time $T$, $Z^\pi_i(kT/K)$ is the number of tasks allocated to resource $i$ by time $kT/K$, and $g_{ki}$ is the deviation cost function of resource $i$ in the $k$-th of $K$ time periods.

- **Regret definition**:

  \[ \text{Reg}[\omega] := V^\pi[\omega] - V^{\text{off}}[\omega] \]

  where $V^{\text{off}}[\omega]$ is the cost of the offline optimal solution.

Together, these formulas quantify the total cost and the regret that an online allocation policy aims to minimize.
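To make the total cost and regret definitions above concrete, here is a minimal Python sketch that evaluates them on a toy instance. It is an illustration under stated assumptions, not the authors' implementation: it assumes exactly one job arrives per time step, that $T$ is divisible by $K$, and that the deviation function is an absolute distance from a 50/50 target split. The names `total_cost`, `offline_optimum`, and `abs_deviation` are hypothetical, and the brute-force offline optimum is only feasible for tiny instances.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical signature: deviation(k, i, x) plays the role of g_{ki}(x),
# with period index k starting at 1 and resource index i starting at 0.
DeviationFn = Callable[[int, int, float], float]


def total_cost(
    job_types: Sequence[int],
    resources: Sequence[int],
    cost: Sequence[Sequence[float]],
    deviation: DeviationFn,
    K: int,
) -> float:
    """Evaluate V^pi for one sample path (assumes one arrival per time step)."""
    T = len(job_types)
    if T % K != 0:
        raise ValueError("this sketch assumes T is divisible by K")
    m = len(cost[0])
    period = T // K

    # First term: summing c_{ji} over arrivals equals sum_j sum_i c_{ji} Z_{ji}(T).
    assignment_cost = sum(cost[j][i] for j, i in zip(job_types, resources))

    # Second term: at each checkpoint t = kT/K, penalize the running
    # allocation rate Z_i(t) / t of every resource, weighted by k.
    counts = [0] * m
    weighted_deviation = 0.0
    for t, i in enumerate(resources, start=1):
        counts[i] += 1
        if t % period == 0:
            k = t // period
            weighted_deviation += sum(
                k * deviation(k, r, counts[r] / t) for r in range(m)
            )
    return assignment_cost + (T / K) * weighted_deviation


def offline_optimum(
    job_types: Sequence[int],
    cost: Sequence[Sequence[float]],
    deviation: DeviationFn,
    K: int,
) -> float:
    """Brute-force V^off by enumerating all m^T assignments (toy sizes only)."""
    m = len(cost[0])
    return min(
        total_cost(job_types, choice, cost, deviation, K)
        for choice in product(range(m), repeat=len(job_types))
    )


# Toy instance (all numbers are illustrative assumptions):
# one job type, two resources, T = 4 arrivals, K = 2 periods.
cost = [[1.0, 2.0]]


def abs_deviation(k: int, i: int, x: float) -> float:
    # Assumed target: each resource should receive half of the jobs so far.
    return abs(x - 0.5)


job_types = [0, 0, 0, 0]
late_balancing = [0, 0, 1, 1]  # reaches the 50/50 split only at the end

v_policy = total_cost(job_types, late_balancing, cost, abs_deviation, K=2)
v_off = offline_optimum(job_types, cost, abs_deviation, K=2)
regret = v_policy - v_off
```

On this instance the late-balancing assignment costs 8.0, while an alternating assignment such as `[0, 1, 0, 1]` achieves the offline optimum of 6.0, giving a regret of 2.0. Both assignments end with the same final allocation, so the gap comes entirely from the deviation penalty at the intermediate checkpoint, which is the sense in which timing matters in this setting.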

Target-Following Online Resource Allocation Using Proxy Assignments

A Target Allocation Method Inspired by Hungarian Algorithm

Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques

The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems

Online Resource Allocation with Non-Stationary Customers

Dynamic Resource Allocation: Algorithmic Design Principles and Spectrum of Achievable Performances

Online Proactive Multi-Task Assignment with Resource Availability Anticipation

Online Resource Allocation in Episodic Markov Decision Processes

Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints

Online Optimization for Randomized Network Resource Allocation with Long-Term Constraints

Online Resource Allocation: Bandits feedback and Advice on Time-varying Demands

A Unified Model for the Two-stage Offline-then-Online Resource Allocation

Stochastic Averaging for Constrained Optimization With Application to Online Resource Allocation

Online Stochastic Allocation of Reusable Resources

An Online Convex Optimization Approach to Proactive Network Resource Allocation

Online Resource Allocation with Customer Choice

Multi-resource allocation and care sequence assignment in patient management: a stochastic programming approach

Assignment Algorithms for Multi-Robot Multi-Target Tracking with Sufficient and Limited Sensing Capability

Gradient and Projection Free Distributed Online Min-Max Resource Optimization

Inverse Risk-sensitive Multi-Robot Task Allocation