Target-Following Online Resource Allocation Using Proxy Assignments

Chamsi Hssaine, Huseyin Topaloglu, Garrett van Ryzin
2024-12-17
Abstract: We study a target-following variation of online resource allocation. As in classical resource allocation, the decision-maker must assign sequentially arriving jobs to one of multiple available resources. However, in addition to the assignment costs incurred from these decisions, the decision-maker is also penalized for deviating from exogenously given, nonstationary target allocations throughout the horizon. The goal is to minimize the total expected assignment and deviation penalty costs incurred throughout the horizon when the distribution of assignment costs is unknown. In contrast to traditional online resource allocation, in our setting the timing of allocation decisions is critical due to the nonstationarity of allocation targets. Examples of practical problems that fit this framework include many physical resource settings where capacity is time-varying, such as manual warehouse processes where staffing levels change over time, and assignment of packages to outbound trucks whose departure times are scheduled throughout the day. We first show that naive extensions of state-of-the-art algorithms for classical resource allocation problems can fail dramatically when applied to target-following resource allocation. We then propose a novel "proxy assignment" primal-dual algorithm for the target-following online resource allocation problem that uses current arrivals to simulate the effect of future arrivals. We prove that our algorithm incurs the optimal $O(\sqrt{T})$ regret bound when the assignment costs of the arriving jobs are drawn i.i.d. from a fixed distribution. We demonstrate the practical performance of our approach by conducting numerical experiments on synthetic datasets, as well as real-world datasets from retail fulfillment operations.
Optimization and Control
### What problem does this paper attempt to address?

This paper studies the **target-following online resource allocation problem**. The question is how to assign a sequence of jobs, arriving one at a time over a finite horizon, to one of several available resources so that the total expected cost is minimized. That total combines two kinds of cost:

1. **Assignment cost**: the direct cost incurred each time a job is assigned to a given resource.
2. **Deviation cost**: the penalty incurred when the actual allocation deviates from exogenously given, time-varying target allocations.

The distribution of assignment costs is unknown to the decision-maker.

#### Background and challenges

In classical online resource allocation, the timing of individual allocation decisions is not critical. In the problem studied here, the allocation targets change over time (they are non-stationary), so the decision-maker is penalized for deviating from them throughout the horizon, not only at its end. Each decision must therefore account for both current and future targets: it should be a good assignment for the current job and should not push the allocation too far from the targets that apply later.

#### Practical application scenarios

The framework fits many physical resource settings where capacity varies over time, such as:

- **Warehouse management**: in manual warehouse processes, staffing levels change over time, so work must be assigned in line with the capacity available in each period to avoid leaving staff idle or overloading them.
- **Logistics and transportation**: packages are assigned to outbound trucks whose departure times are scheduled throughout the day, so the number of packages assigned to each truck should stay close to its target as the day progresses to avoid wasting transport capacity.
#### Main contributions of the paper

The authors first show that naive extensions of state-of-the-art algorithms for classical resource allocation can fail dramatically in the target-following setting. They then propose a new primal-dual algorithm based on **proxy assignments**, which uses current arrivals to simulate the effect of future arrivals and thereby balances current decisions against future targets. They prove that the algorithm achieves the optimal \(O(\sqrt{T})\) regret bound when the assignment costs of arriving jobs are drawn i.i.d. from a fixed distribution, and they evaluate it numerically on synthetic datasets and on real-world datasets from retail fulfillment operations.

### Summary of mathematical formulas

- **Total cost** of a policy \(\pi\):

\[ V^\pi[\omega] := \sum_{j \in [n]} \sum_{i \in [m]} c_{ji} Z^\pi_{ji}(T) + \frac{T}{K} \sum_{k \in [K]} \sum_{i \in [m]} k \, g_{ki}\left(\frac{Z^\pi_i(kT/K)}{kT/K}\right) \]

  where \(c_{ji}\) is the cost of assigning a job of type \(j\) to resource \(i\), \(Z^\pi_{ji}(T)\) is the number of type-\(j\) jobs assigned to resource \(i\) by the end of the horizon \(T\), \(Z^\pi_i(kT/K)\) is the total number of jobs assigned to resource \(i\) by time \(kT/K\), and \(g_{ki}\) is the deviation cost function of resource \(i\) in the \(k\)-th time period.

- **Regret**:

\[ \text{Reg}[\omega] := V^\pi[\omega] - V^{\text{off}}[\omega] \]

  where \(V^{\text{off}}[\omega]\) is the cost of the offline optimal solution.

Together, these formulas quantify the total cost an online policy incurs and how far it falls short of the offline optimum.
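The total-cost and regret formulas above can be evaluated directly for a given allocation. The following is a minimal sketch, not the authors' code. It assumes an absolute-deviation penalty \(g_{ki}(x) = \lambda\,|x - \tau_{ki}|\), where the target rates \(\tau_{ki}\) and the weight \(\lambda\) are hypothetical, since the summary does not specify the form of \(g_{ki}\). The function names and the toy data are also illustrative only.

```python
def total_cost(c, z_final, z_checkpoints, targets, T, penalty=1.0):
    """Evaluate the total-cost formula for one realized allocation.

    c[j][i]             : cost of assigning a type-j job to resource i
    z_final[j][i]       : number of type-j jobs on resource i at time T
    z_checkpoints[k][i] : cumulative jobs on resource i at time (k+1)*T/K
    targets[k][i]       : hypothetical target rate for resource i, period k+1
    penalty             : weight of the ASSUMED absolute-deviation penalty
    """
    K = len(z_checkpoints)
    # First term: sum over job types j and resources i of c_ji * Z_ji(T).
    assignment = sum(
        c[j][i] * z_final[j][i]
        for j in range(len(c))
        for i in range(len(c[j]))
    )
    # Second term: (T/K) * sum over k and i of k * g_ki(Z_i(kT/K) / (kT/K)).
    deviation = 0.0
    for k in range(1, K + 1):
        elapsed = k * T / K
        for i, z in enumerate(z_checkpoints[k - 1]):
            rate = z / elapsed
            g = penalty * abs(rate - targets[k - 1][i])  # assumed form of g_ki
            deviation += k * g
    return assignment + (T / K) * deviation


def regret(v_online, v_offline):
    """Reg = V^pi - V^off for one realization."""
    return v_online - v_offline


# Toy instance: one job type, two resources, T = 4 time steps, K = 2 periods.
c = [[1.0, 1.0]]
targets = [[1.0, 0.0], [0.5, 0.5]]   # hypothetical target rates per period
z_final = [[2, 2]]                   # both policies end with the same totals

on_time = [[2, 0], [2, 2]]           # follows the targets in every period
mistimed = [[1, 1], [2, 2]]          # same final totals, wrong timing

v_on_time = total_cost(c, z_final, on_time, targets, T=4)
v_mistimed = total_cost(c, z_final, mistimed, targets, T=4)
print(v_on_time, v_mistimed, regret(v_mistimed, v_on_time))
```

In this toy instance both allocations finish with identical totals, yet the mistimed one costs 6.0 against 4.0 for the on-time one, which illustrates why the timing of decisions matters in the target-following setting. The on-time allocation is used here only as a stand-in for the offline benchmark; it is optimal in this toy case because assignment costs are equal across resources and its deviation penalty is zero.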