Quantitative Model and Fast Estimation Algorithm of Sampling Error for Association Rule Mining

JIA Cai-Yan, LU Ru-Qian
DOI: https://doi.org/10.3321/j.issn:0254-4164.2006.04.015
2006-01-01
Chinese Journal of Computers
Abstract: Sampling is a simple and effective technique for improving the efficiency and scalability of association rule mining algorithms. However, little research has addressed how to quantify the resulting error in an algorithm's output, i.e., how to build a quantitative model that measures the sampling error, or how to estimate that error efficiently. In this paper, based on a systematic analysis, the authors point out the deficiencies of current results in this field, give a novel, flexible quantitative model for measuring the sampling error, and propose an efficient computational method, the interval estimation algorithm of cardinal error, for estimating the sampling error based on empirical observation and theoretical analysis. Both theoretical analysis and experiments show that the model captures the error between the sample set and the original dataset effectively and accurately, and that the interval estimation algorithm of cardinal error estimates how well the sample set represents the original dataset efficiently and accurately. Moreover, the interval estimation algorithm of cardinal error can be conveniently nested into sampling algorithms to speed them up, and is useful for distributed and parallel association rule mining algorithms.
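To make the notion of sampling error concrete, the following minimal Python sketch compares the frequent itemsets mined from a full transaction set with those mined from a random sample. It is an illustration only and is not the authors' quantitative model or their interval estimation algorithm, whose details the abstract does not give. The function names (`support`, `frequent_itemsets`, `sampling_error`), the brute-force enumeration, and the error measure (size of the symmetric difference of the two frequent-itemset collections divided by the size of their union) are all assumptions chosen for simplicity.

```python
import random
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force search for itemsets of up to `max_size` items whose
    support is at least `min_support`. Suitable for toy data only."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(transactions, candidate)
            if s >= min_support:
                result[candidate] = s
    return result


def sampling_error(full, sample, min_support, max_size=2):
    """Hypothetical error measure (an assumption, not the paper's model):
    |F_full symmetric-difference F_sample| / |F_full union F_sample|,
    where F_x is the collection of frequent itemsets found in x."""
    f_full = set(frequent_itemsets(full, min_support, max_size))
    f_sample = set(frequent_itemsets(sample, min_support, max_size))
    union = f_full | f_sample
    if not union:
        return 0.0
    return len(f_full ^ f_sample) / len(union)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic transactions: each is a random subset of five items.
    universe = ["a", "b", "c", "d", "e"]
    data = [
        frozenset(random.sample(universe, random.randint(1, 4)))
        for _ in range(2000)
    ]
    for size in (50, 200, 1000):
        sample = random.sample(data, size)
        err = sampling_error(data, sample, min_support=0.3)
        print(f"sample size {size}: error {err:.3f}")
```

Computing this measure exactly requires mining the full dataset, which is the cost that sampling is meant to avoid; that is why an efficient estimate of the error, as the abstract proposes, is valuable.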