Equi-depth Histogram Construction for Big Data with Quality Guarantees

Burak Yıldız, Tolga Büyüktanır, Fatih Emekci
DOI: https://doi.org/10.48550/arXiv.1606.05633
2016-06-18
Abstract: The amount of data generated and stored in cloud systems has been increasing exponentially. Examples include user-generated data, machine-generated data, and data crawled from the Internet. Several frameworks, such as Apache Hadoop, HDFS, and a number of NoSQL systems, have proven efficient at storing and processing petabyte-scale data. These systems are widely used in industry and are therefore the subject of considerable research. To be practical, any proposed data processing technique should be compatible with these frameworks. One key data operation is deriving equi-depth histograms, which are crucial for understanding the statistical properties of the underlying data and have many applications, including query optimization. In this paper, we focus on approximate equi-depth histogram construction for big data and propose a novel merge-based histogram construction method, together with a histogram processing framework that constructs an equi-depth histogram for a given time interval. The proposed method constructs approximate equi-depth histograms by merging exact equi-depth histograms of partitioned data, while guaranteeing a maximum error bound on the number of items in a bucket (the bucket size) as well as on any range of the histogram.
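To make the idea of merging per-partition histograms concrete, the following is a minimal illustrative sketch in Python. It is not the authors' algorithm and carries none of the error guarantees claimed in the abstract. It assumes values are spread uniformly inside each bucket, that each partition holds distinct values, and that each partition has at least as many values as buckets. The function names `exact_equi_depth`, `estimated_rank`, and `merge_equi_depth` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def exact_equi_depth(values, num_buckets):
    """Exact equi-depth histogram of one partition: (boundaries, counts)."""
    data = sorted(values)
    n = len(data)
    # Cut positions that split the sorted data into near-equal slices.
    cuts = [i * n // num_buckets for i in range(num_buckets + 1)]
    # Lowest value, then the largest value of every bucket.
    boundaries = [data[0]] + [data[c - 1] for c in cuts[1:]]
    counts = [cuts[i + 1] - cuts[i] for i in range(num_buckets)]
    return boundaries, counts


def estimated_rank(hist, x):
    """Estimated number of items <= x (assumption: uniform within a bucket)."""
    boundaries, counts = hist
    if x < boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return float(sum(counts))
    i = bisect_right(boundaries, x) - 1  # index of the bucket containing x
    lo, hi = boundaries[i], boundaries[i + 1]
    return sum(counts[:i]) + counts[i] * (x - lo) / (hi - lo)


def merge_equi_depth(hists, num_buckets):
    """Approximate equi-depth histogram built only from partition histograms."""
    # The combined rank estimate is piecewise linear between these points.
    points = sorted({b for boundaries, _ in hists for b in boundaries})
    ranks = [sum(estimated_rank(h, p) for h in hists) for p in points]
    total = ranks[-1]
    merged = [points[0]]
    for k in range(1, num_buckets):
        target = k * total / num_buckets
        j = bisect_left(ranks, target)  # first point whose rank reaches target
        r0, r1 = ranks[j - 1], ranks[j]
        p0, p1 = points[j - 1], points[j]
        # Invert the linear segment to find where the rank equals the target.
        merged.append(p0 + (p1 - p0) * (target - r0) / (r1 - r0))
    merged.append(points[-1])
    # Estimated bucket sizes are equal by construction.
    return merged, [total / num_buckets] * num_buckets


# Two partitions summarized independently, then merged without the raw data.
hist_a = exact_equi_depth(range(1, 9), 4)
hist_b = exact_equi_depth(range(9, 17), 4)
merged_bounds, merged_counts = merge_equi_depth([hist_a, hist_b], 4)
# merged_bounds -> [1, 4.0, 8.0, 12.0, 16]
```

Because the merge step sees only bucket boundaries and counts, the true number of items in a merged bucket can differ from the estimate whenever the data inside a bucket is skewed. Bounding that deviation is the problem the paper's method addresses.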
Databases; Distributed, Parallel, and Cluster Computing