On the I/O Complexity of Dynamic Distinct Counting

Xiaocheng Hu, Yufei Tao, Yi Yang, Shengyu Zhang, Shuigeng Zhou
DOI: https://doi.org/10.4230/LIPIcs.ICDT.2015.265
2015-01-01
Abstract: In dynamic distinct counting, we want to maintain a multi-set $S$ of integers under insertions so as to answer efficiently the query: how many distinct elements are there in $S$? In external memory, the problem admits two standard solutions. The first one maintains $S$ in a hash structure, so that the distinct count can be incrementally updated after each insertion using $O(1)$ expected I/Os; a query is then answered for free. The second one stores $S$ in a linked list, and thus supports an insertion in $O(1/B)$ amortized I/Os. A query can be answered in $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os by sorting, where $N = |S|$, $B$ is the block size, and $M$ is the memory size. In this paper, we show that the above two naive solutions are already optimal within a polylog factor. Specifically, for any Las Vegas structure using $N^{O(1)}$ blocks, if its expected amortized insertion cost is $o\left(\frac{1}{\log B}\right)$, then it must incur $\Omega\left(\frac{N}{B \log B}\right)$ expected I/Os answering a query in the worst case, under the (realistic) condition that $N$ is a polynomial in $B$. This means that the problem is repugnant to update buffering: the query cost jumps from 0 dramatically to almost
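The two standard solutions described in the abstract can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions, not the authors' code. The names `HashDistinctCounter`, `LogDistinctCounter`, `block_writes`, and `demo` are hypothetical. A Python `set` stands in for the external hash structure. The "disk" is a list of blocks of size $B$, and only block writes are counted. The query of the second solution sorts in memory, whereas a real external-memory version would use external merge sort at $O\left(\frac{N}{B}\log_{M/B}\frac{N}{B}\right)$ I/Os.

```python
from typing import Iterable, List, Set, Tuple


class HashDistinctCounter:
    """Solution 1 (sketch): keep S in a hash structure, update the count on insert.

    In external memory each membership probe costs O(1) expected I/Os;
    here a Python set is an assumed in-memory stand-in for that structure.
    """

    def __init__(self) -> None:
        self._seen: Set[int] = set()
        self._distinct = 0

    def insert(self, x: int) -> None:
        if x not in self._seen:      # probe the hash structure
            self._seen.add(x)
            self._distinct += 1      # incremental update of the distinct count

    def query(self) -> int:
        return self._distinct        # answered "for free"


class LogDistinctCounter:
    """Solution 2 (sketch): append S to a list of blocks, sort at query time.

    Insertions fill an in-memory buffer of B elements and write one block
    when it is full, i.e. O(1/B) amortized block writes per insertion.
    """

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self._buffer: List[int] = []
        self._blocks: List[List[int]] = []   # simulated disk blocks (assumption)
        self.block_writes = 0                # simulated I/O counter (hypothetical)

    def insert(self, x: int) -> None:
        self._buffer.append(x)
        if len(self._buffer) == self.block_size:
            self._blocks.append(self._buffer)  # one block write per B insertions
            self._buffer = []
            self.block_writes += 1

    def query(self) -> int:
        # Simplification: sort in memory. An external-memory structure would
        # instead run external merge sort over the N/B blocks.
        items = sorted([x for blk in self._blocks for x in blk] + self._buffer)
        # After sorting, equal elements are adjacent, so one scan counts the runs.
        return sum(1 for i, x in enumerate(items) if i == 0 or x != items[i - 1])


def demo(stream: Iterable[int], block_size: int = 4) -> Tuple[int, int, int]:
    """Feed the same stream to both sketches; return both counts and block writes."""
    h, log = HashDistinctCounter(), LogDistinctCounter(block_size)
    for x in stream:
        h.insert(x)
        log.insert(x)
    return h.query(), log.query(), log.block_writes
```

For example, `demo([3, 1, 3, 7, 1, 9, 3, 2, 7, 5])` returns `(6, 6, 2)`. Both sketches report 6 distinct elements, and the second one performed only 2 block writes for 10 insertions with $B = 4$. The sketch makes the trade-off in the abstract concrete: the first design pays on every insertion and nothing at query time, while the second batches insertions into blocks and defers all the work to a sort at query time.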