Robin Hood Hashing really has constant average search cost and variance in full tables

Patricio V. Poblete,Alfredo Viola
DOI: https://doi.org/10.48550/arXiv.1605.04031
2016-05-13
Abstract:Thirty years ago, the Robin Hood collision resolution strategy was introduced for open addressing hash tables, and a recurrence equation was found for the distribution of its search cost. Although this recurrence could not be solved analytically, it allowed for numerical computations that, remarkably, suggested that the variance of the search cost approached a value of $1.883$ when the table was full. Furthermore, by using a non-standard mean-centered search algorithm, this would imply that searches could be performed in expected constant time even in a full table. In spite of the time elapsed since these observations were made, no progress has been made in proving them. In this paper we introduce a technique to work around the intractability of the recurrence equation by solving instead an associated differential equation. While this does not provide an exact solution, it is sufficiently powerful to prove a bound for the variance, and thus obtain a proof that the variance of Robin Hood is bounded by a small constant for load factors arbitrarily close to 1. As a corollary, this proves that the mean-centered search algorithm runs in expected constant time. We also use this technique to study the performance of Robin Hood hash tables under a long sequence of insertions and deletions, where deletions are implemented by marking elements as deleted. We prove that, in this case, the variance is bounded by $1/(1-\alpha)+O(1)$, where $\alpha$ is the load factor. To model the behavior of these hash tables, we use a unified approach that can be applied also to study the First-Come-First-Served and Last-Come-First-Served collision resolution disciplines, both with and without deletions.
Data Structures and Algorithms
What problem does this paper attempt to address?