Approximate Algorithm for Mining Frequent Items in Data Streams

王述云,张成洪,范颖捷,徐和祥,胡运发
2009-01-01
Abstract: A new data structure, ESBF (extensible and scalable Bloom filter), is introduced, and an ESBF-based algorithm is proposed for approximately estimating the frequent items in data streams. The proposed algorithm achieves high precision and, in most cases, is more efficient in time and memory consumption than other algorithms for mining frequent items in data streams. It is also proved that the number of counters needed for a required precision and probability is only ln((-M)/(lnρ))·e/e·1/(e·M).