The Price of Clustering in Bin-Packing with Applications to Bin-Packing with Delays

Yossi Azar, Yuval Emek, Rob van Stee, Danny Vainstein
DOI: https://doi.org/10.1145/3323165.3323180
2019-06-17
Abstract: One of the most significant algorithmic challenges in the "big data era" is handling instances that are too large to be processed by a single machine. The common practice in this regard is to partition the massive problem instance into smaller ones and process each one of them separately. In some cases, the solutions for the smaller instances are later on assembled into a solution for the whole instance, but in many cases this last stage cannot be pursued (e.g., because it is too costly, because of locality issues, or due to privacy considerations). Motivated by this phenomenon, we consider the following natural combinatorial question: Given a bin-packing instance I (namely, a set of items with sizes in (0, 1] that should be packed into unit capacity bins) and a partition {I_i}_i of I into clusters, how large is the ratio ∑_i OPT(I_i) / OPT(I), where OPT(J) denotes the optimal number of bins into which the items in J can be packed? In this paper, we investigate the supremum of this ratio over all instances I and partitions {I_i}_i, referred to as the bin-packing price of clustering (PoC). It is trivial to observe that if each cluster contains only one tiny item (and hence, OPT(I_i) = 1), then the PoC is unbounded. On the other hand, a relatively straightforward argument shows that under the constraint that OPT(I_i) ≥ 2, the PoC is 2. Our main challenge was to determine whether the PoC drops below 2 when OPT(I_i) > 2. In addition, one may hope that lim_{k → ∞} PoC(k) = 1, where PoC(k) denotes the PoC under the restriction to clusters I_i with OPT(I_i) ≥ k. We resolve the former question affirmatively and the latter one negatively: Our main results are that PoC(k) ≤ 1.951 for any k ≥ 3 and lim_{k → ∞} PoC(k) = 1.691... Moreover, the former bound cannot be significantly improved as PoC(3) > 1.933.
In addition to its immediate contribution to "big data" applications, this combinatorial result turns out to be useful for an interesting online problem called bin-packing with delays.
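
The ratio ∑_i OPT(I_i) / OPT(I) defined in the abstract can be checked by hand on very small instances. The following is a minimal illustrative sketch, not the paper's method: it computes OPT by brute force over subsets, so it is only usable for roughly a dozen items. As an assumption made for exact arithmetic, item sizes are integers and bins have an integer `capacity` (instead of sizes in (0, 1] and unit bins). The names `opt_bins` and `clustered_vs_global` are hypothetical.

```python
from functools import lru_cache


def opt_bins(sizes, capacity):
    """Exact minimum number of bins for a tiny instance (brute force).

    Assumes every size is an integer in 1..capacity.
    """
    items = tuple(sizes)
    n = len(items)
    if n == 0:
        return 0
    full = (1 << n) - 1
    # fits[mask] is True when the items selected by mask fit in one bin.
    fits = [
        sum(items[i] for i in range(n) if (mask >> i) & 1) <= capacity
        for mask in range(full + 1)
    ]

    @lru_cache(maxsize=None)
    def solve(remaining):
        if remaining == 0:
            return 0
        # The lowest-indexed remaining item must share a bin with some
        # subset of the other remaining items; try every such subset.
        low = remaining & -remaining
        rest = remaining ^ low
        best = n  # one bin per item is always feasible
        sub = rest
        while True:
            bin_mask = sub | low
            if fits[bin_mask]:
                best = min(best, 1 + solve(remaining ^ bin_mask))
            if sub == 0:
                break
            sub = (sub - 1) & rest
        return best

    return solve(full)


def clustered_vs_global(clusters, capacity):
    """Return (sum of OPT over the clusters, OPT of the whole instance)."""
    whole = [s for cluster in clusters for s in cluster]
    clustered = sum(opt_bins(cluster, capacity) for cluster in clusters)
    return clustered, opt_bins(whole, capacity)


# Ten singleton clusters of tiny items: each cluster has OPT = 1.
tiny = clustered_vs_global([[1]] * 10, capacity=10)

# Five clusters, each holding one full-size item and one tiny item,
# so every cluster has OPT = 2.
mixed = clustered_vs_global([[10, 1]] * 5, capacity=10)
```

In the first instance the clustered cost grows with the number of clusters while the global optimum stays at one bin, which matches the abstract's remark that the PoC is unbounded when OPT(I_i) = 1. In the second instance every cluster needs two bins, and the global packing saves bins only by pooling the tiny items.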