A Learning-Based Approach to Estimate Statistics of Operators in Continuous Queries: a Case Study.

Like Gao, Min Wang, Xiaoyang Sean Wang, Sriram Padmanabhan
DOI: https://doi.org/10.1145/882082.882097
2003-01-01
Abstract: Statistic estimation, such as output size estimation of operators, is a well-studied subject in the database research community, mainly for the purpose of query optimization. The usual assumption, however, is that queries are ad hoc, and therefore the emphasis has been on capturing the data distribution. When long-standing continuous queries on a changing database are concerned, a more direct approach is possible: building an estimation model for each operator. In this paper, we propose a novel learning-based method. Our method consists of two steps. The first step is to design a dedicated feature extraction algorithm that can be used incrementally to obtain feature values from the underlying data. The second step is to use a data mining algorithm to generate an estimation model based on the feature values extracted from the historical data. To illustrate the approach, this paper studies the case of similarity-based searches over streaming time series. Experimental results show that this approach provides accurate statistic estimates with a low overhead.
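The two steps described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the features or the data mining algorithm, so the sketch assumes sliding-window mean and variance as features and swaps in a one-feature least-squares fit as the estimation model. All names (`SlidingWindowFeatures`, `fit_linear`, `estimate`) are hypothetical.

```python
from collections import deque


class SlidingWindowFeatures:
    """Step 1 (assumed features): mean and variance of the last `size` values,
    maintained incrementally as new stream values arrive."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, value):
        # Add the new value and evict the oldest one in O(1).
        self.window.append(value)
        self.total += value
        self.total_sq += value * value
        if len(self.window) > self.size:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old
        return self.features()

    def features(self):
        n = len(self.window)
        mean = self.total / n
        # Clamp at zero to guard against small negative rounding errors.
        variance = max(self.total_sq / n - mean * mean, 0.0)
        return mean, variance


def fit_linear(xs, ys):
    """Step 2 (assumed model): least-squares fit of y = a * x + b, where xs are
    historical feature values and ys the observed operator statistics."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return a, b


def estimate(model, x):
    """Apply the fitted model to a freshly extracted feature value."""
    a, b = model
    return a * x + b
```

In use, the extractor would be updated on every arriving stream value, the model would be fitted offline on historical pairs of feature values and observed output sizes, and `estimate` would then be called with the current feature value whenever the continuous query needs a statistic.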