Abstract: We consider the first-come-first-serve (FCFS) $\mathrm{GI}/\mathrm{GI}/n$ queue and prove the first simple and explicit bounds that scale as $1/(1-ρ)$ under only the assumption that interarrival times have finite second moment, and service times have finite $2+ε$ moment for some $ε > 0$. Here, ρ denotes the corresponding traffic intensity. Conceptually, our results can be viewed as a multiserver analogue of Kingman’s bound. Our main results are bounds for the tail of the steady-state queue length and the steady-state probability of delay. The strength of our bounds (e.g., in the form of tail decay rate) is a function of how many moments of the service distribution are assumed finite. Our bounds scale gracefully, even when the number of servers grows large and the traffic intensity converges to unity simultaneously, as in the Halfin-Whitt scaling regime. Some of our bounds scale better than $1/(1-ρ)$ in certain asymptotic regimes. In these same asymptotic regimes, we also prove bounds for the tail of the steady-state number in service. Our main proofs proceed by explicitly analyzing the bounding process that arises in the stochastic comparison bounds of Gamarnik and Goldberg for multiserver queues. Along the way, we derive several novel results for suprema of random walks and pooled renewal processes, which may be of independent interest. We also prove several additional bounds using drift arguments (which have much smaller prefactors) and point out a conjecture that would imply further related bounds and generalizations. We also show that when all moments of the service distribution are finite and satisfy a mild growth rate assumption, our bounds can be strengthened to yield explicit tail estimates decaying as $O(\exp(-x^{α}))$, with $α \in (0,1)$, depending on the growth rate of these moments.
Funding: Financial support from the National Science Foundation [Grant 1333457] is gratefully acknowledged.
Supplemental Material: The supplemental appendix is available at https://doi.org/10.1287/moor.2022.0131 .

Flexible Queueing Architectures

Managing flexibility: optimal sizing and scheduling of flexible servers

Transportation Polytope and its Applications in Parallel Server Systems

Queueing system with batch arrival of heterogeneous orders, flexible limited processor sharing and dynamical change of priorities

Diffusion approximation for efficiency-driven queues: A space-time scaling approach

Scheduling in Parallel Queues with Randomly Varying Connectivity and Switchover Delay

Stability of Decentralized Queueing Networks Beyond Complete Bipartite Cases

Zero Queueing for Multi-Server Jobs

Discrete-Time Informal Queue with an Infinite Number of Groups for Resource Management of a Data Center

Fluid limits for interacting queues in sparse dynamic graphs

Simple and explicit bounds for multi-server queues with $1/(1-ρ)$ scaling

Group-Server Queues

Spatial Queues with Nearest Neighbour Shifts

On the Benefit of Virtualization: Strategies for Flexible Server Allocation

A finite‐capacity queue with exhaustive vacation/close‐down/setup times and Markovian arrival processes

On Universal Scaling of Distributed Queues under Load Balancing

Balanced Routing with Partial Information in a Distributed Parallel Many-Server Queueing System

Simple and Explicit Bounds for Multiserver Queues with $1/(1-ρ)$ Scaling

Server Routing-Scheduling Problem in Distributed Queueing System with Time-Varying Demand and Queue Length Control

Generalized Parallel-Server Fork-Join Queues with Dynamic Task Scheduling

Asymptotic Optimality of Balanced Routing