Partial-Match Queries with Random Wildcards: In Tries and Distributed Hash Tables

Junichiro Fukuyama
DOI: https://doi.org/10.48550/arXiv.1601.04213
2016-01-26
Abstract: Consider an $m$-bit query $q$ to a bitwise trie $T$. A wildcard $*$ is an unspecified bit in $q$ for which the query asks about membership in both cases $*=0$ and $*=1$. Such partial-match queries with wildcards are commonly issued to tries. Assuming the $w$ wildcards occur at uniformly random positions in $q$, the obvious upper bound on the average number of traversal steps in $T$ is $2^w m$. We show that the average does not exceed \[ \frac{m+1}{w+1} \left( 2^{w+2} - 2 w - 4 \right) + m = O \left( \frac{2^w m}{w} \right), \] and equals this value exactly in the worst case, when $T$ contains all $m$-bit keys. Here $q$ is processed by the naive backtracking algorithm in $T$. It is similarly shown that the average is $O \left( \frac{k^w m}{w} \right)$ in a general trie of maximum out-degree $k$. Our analysis for tries is extended to a distributed hash table (DHT), which is among the most frequently used decentralized data structures in networking. We show, under a natural probabilistic assumption for the largest class of DHTs, that the average number of hops required by an $m$-bit query $q$ with $w$ random wildcards to a DHT $D$ meets the same asymptotic bound. As a result, $q$ is answered with an average of $O \left( \frac{2^w m}{w} \right)$ hops rather than $\Theta \left( 2^w m \right)$ in the four major DHTs Chord, Pastry, Tapestry and Kademlia. In addition, with a uniform key distribution for sufficiently many entries, we prove that a lookup request to the DHT Chord is answered correctly with $O(m)$ hops and probability $1 - 2^{-\Omega (m)}$. To the author's knowledge, the probability $1 - 2^{-\Omega (m)}$ of correct lookup in Chord has not been identified so far.
Data Structures and Algorithms