The Size of a $t$-Digest

Ted Dunning
DOI: https://doi.org/10.48550/arXiv.1903.09921
2019-03-24
Abstract: A $t$-digest is a compact data structure that allows estimates of quantiles with increased accuracy near $q = 0$ or $q = 1$. This is done by clustering samples from $\mathbb R$ subject to the constraint that the number of points associated with any particular centroid is limited so that the so-called $k$-size of the centroid is always $\le 1$. The $k$-size is defined using a scale function that maps quantile $q$ to index $k$. This paper provides bounds on the sizes of $t$-digests created using any of four known scale functions.
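The $k$-size constraint can be illustrated with a minimal sketch. The abstract does not state any scale function explicitly, so the code below assumes the arcsine scale function $k_1(q) = \frac{\delta}{2\pi} \sin^{-1}(2q - 1)$ from the $t$-digest literature, with $\delta$ as the compression parameter. The helper names `k1`, `k_size`, and `max_cluster_fraction` are hypothetical and chosen only for illustration.

```python
import math


def k1(q, delta):
    """Assumed scale function: k_1(q) = delta / (2*pi) * asin(2q - 1)."""
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


def k_size(q_left, q_right, delta):
    """k-size of a centroid covering the quantile range [q_left, q_right]."""
    return k1(q_right, delta) - k1(q_left, delta)


def max_cluster_fraction(q_left, delta):
    """Largest fraction of the data a centroid starting at q_left may hold
    while keeping its k-size <= 1 (under the assumed k_1)."""
    k_target = k1(q_left, delta) + 1.0
    if k_target >= k1(1.0, delta):
        # The constraint is not binding before q = 1.
        return 1.0 - q_left
    # Invert k_1 to find the quantile where the k-size reaches exactly 1.
    q_right = (math.sin(2 * math.pi * k_target / delta) + 1.0) / 2.0
    return q_right - q_left


delta = 100
mid = max_cluster_fraction(0.5, delta)   # centroid starting at the median
tail = max_cluster_fraction(0.0, delta)  # centroid starting at the extreme tail
print(f"max fraction at q=0.5: {mid:.5f}")
print(f"max fraction at q=0.0: {tail:.5f}")
```

Under this assumed scale function, a centroid near $q = 0$ may hold a far smaller fraction of the samples than one near the median, which is how the constraint yields the increased accuracy at the extremes described in the abstract.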
Computation