Exploration of Hybrid Multi-Dimensional Histograms for Hybrid Multi-Dimensional Data Distribution

CAO Wei, WANG Shan
DOI: https://doi.org/10.3724/sp.j.1087.2009.02487
2009-01-01
Journal of Computer Applications
Abstract: In practice, a multi-dimensional data distribution often does not follow one single type of distribution as a whole; instead, different regions of the data space clearly exhibit different types of distribution. To tackle this problem, the authors proposed a new kind of hybrid multi-dimensional histogram, COCA-Hist, based on hybrid data distributions. Under a given space budget, the method built COCA-Hist from different kinds of buckets, chosen according to the distribution characteristics of the different regions of the data space. The aim was to improve the estimation accuracy of multi-dimensional histograms in general. Because COCA-Hist had to scan the tree structure of the histogram under construction twice, once to discern the different data regions and once to allocate the space budget among them, it was slightly less efficient to build. However, the improvement in both generality and estimation accuracy made the extra time cost worthwhile.
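The idea of spending a fixed bucket budget unevenly across regions can be illustrated with a toy sketch. This is not COCA-Hist itself: the abstract does not specify its bucket types, its tree structure, or its allocation rule. The sketch assumes the regions are given up front rather than discovered, uses a hypothetical non-uniformity score (`skew`), and offers only two bucket layouts per region (one coarse bucket or a fixed `FINE_K` x `FINE_K` grid). All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), half-open

FINE_K = 2  # assumed grid resolution for regions judged non-uniform


@dataclass
class Bucket:
    box: Box
    count: int


def grid_buckets(points: List[Point], box: Box, k: int) -> List[Bucket]:
    """Split `box` into a k x k grid of equal cells and count points per cell."""
    x0, y0, x1, y1 = box
    dx, dy = (x1 - x0) / k, (y1 - y0) / k
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        if x0 <= x < x1 and y0 <= y < y1:
            i = min(int((x - x0) / dx), k - 1)
            j = min(int((y - y0) / dy), k - 1)
            counts[i][j] += 1
    return [
        Bucket((x0 + i * dx, y0 + j * dy, x0 + (i + 1) * dx, y0 + (j + 1) * dy),
               counts[i][j])
        for i in range(k) for j in range(k)
    ]


def skew(points: List[Point], box: Box) -> int:
    """Hypothetical non-uniformity score: fullest minus emptiest sub-cell."""
    counts = [b.count for b in grid_buckets(points, box, FINE_K)]
    return max(counts) - min(counts)


def build_hybrid(points: List[Point], regions: List[Box], budget: int) -> List[Bucket]:
    """Pass 1 scores each region; pass 2 spends the bucket budget on skewed ones."""
    if budget < len(regions):
        raise ValueError("budget must allow at least one bucket per region")
    scores = [skew(points, r) for r in regions]            # first pass
    remaining = budget - len(regions)                      # one coarse bucket each
    extra = FINE_K * FINE_K - 1                            # cost of refining a region
    refined = set()
    for i in sorted(range(len(regions)), key=lambda i: scores[i], reverse=True):
        if scores[i] > 0 and remaining >= extra:           # second pass
            refined.add(i)
            remaining -= extra
    buckets: List[Bucket] = []
    for i, r in enumerate(regions):
        buckets.extend(grid_buckets(points, r, FINE_K if i in refined else 1))
    return buckets


def estimate(buckets: List[Bucket], query: Box) -> float:
    """Estimate the point count in `query`, assuming uniformity inside each bucket."""
    qx0, qy0, qx1, qy1 = query
    total = 0.0
    for b in buckets:
        bx0, by0, bx1, by1 = b.box
        w = max(0.0, min(bx1, qx1) - max(bx0, qx0))
        h = max(0.0, min(by1, qy1) - max(by0, qy0))
        total += b.count * (w * h) / ((bx1 - bx0) * (by1 - by0))
    return total


# Region A is evenly spread; region B is clustered in one corner.
points = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75),
          (1.1, 0.1), (1.2, 0.2), (1.3, 0.3), (1.4, 0.4)]
regions = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0)]
hybrid = build_hybrid(points, regions, budget=5)
```

With a budget of five buckets, the evenly spread region keeps a single coarse bucket and the clustered region receives the finer grid, so a query over the cluster is answered from a bucket that actually covers it. The two calls to `grid_buckets` per region loosely mirror the two scans mentioned in the abstract, which is also where the extra construction time would come from.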