Approximate NN Queries on Streams with Guaranteed Error/Performance Bounds

Nick Koudas, Beng Chin Ooi, Kian-Lee Tan, Rui Zhang
DOI: https://doi.org/10.1016/B978-012088469-8.50071-1
2004-01-01
Abstract: In data stream applications, data arrive continuously and can only be scanned once, as the query processor has very limited memory (relative to the size of the stream) to work with. Hence, queries on data streams do not have access to the entire data set, and query answers are typically approximate. While there have been many studies on the k Nearest Neighbors (kNN) problem in conventional multi-dimensional databases, the solutions cannot be directly applied to data streams for the above reasons. In this paper, we investigate the kNN problem over data streams. We first introduce the e-approximate kNN (ekNN) problem that finds the approximate kNN answers of a query point Q such that the absolute error of the k-th nearest neighbor distance is bounded by e. To support ekNN queries over streams, we propose a technique called DISC (aDaptive Indexing on Streams by space-filling Curves). DISC can adapt to different data distributions to either (a) optimize memory utilization to answer ekNN queries under certain accuracy requirements or (b) achieve the best accuracy under a given memory constraint. At the same time, DISC provides efficient updates and query processing, which are important requirements in data stream applications. Extensive experiments were conducted using both synthetic and real data sets, and the results confirm the effectiveness and efficiency of DISC.
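To make the ekNN error bound concrete, the following minimal sketch checks whether a candidate answer satisfies the definition stated in the abstract: the absolute error of the k-th nearest neighbor distance is at most e. It is not the DISC technique and is not taken from the paper. It assumes Euclidean distance and a brute-force scan over a fully stored point set, which a real stream processor would not have. The function names `kth_nn_distance` and `satisfies_eknn` are hypothetical.

```python
import heapq
import math


def kth_nn_distance(points, q, k):
    """Exact distance from q to its k-th nearest neighbour (brute force).

    Assumption: Euclidean distance; the abstract does not fix the metric.
    """
    smallest = heapq.nsmallest(k, (math.dist(p, q) for p in points))
    return smallest[-1]


def satisfies_eknn(points, q, candidate, e):
    """Return True if `candidate` (a list of k points) is an e-approximate
    kNN answer for query q: the k-th NN distance within the candidate differs
    from the exact k-th NN distance by at most e in absolute terms.
    """
    k = len(candidate)
    exact = kth_nn_distance(points, q, k)
    approx = max(math.dist(p, q) for p in candidate)
    return abs(approx - exact) <= e


# Illustrative data only, not from the paper.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
query = (0.0, 0.0)
candidate = [(0.0, 0.0), (2.0, 0.0)]  # k = 2; the exact 2nd NN is (1.0, 0.0)

within_loose_bound = satisfies_eknn(points, query, candidate, e=1.0)
within_tight_bound = satisfies_eknn(points, query, candidate, e=0.5)
```

Here the candidate's k-th distance is 2.0 while the exact one is 1.0, so the answer is acceptable for e = 1.0 but not for e = 0.5.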