Supporting Real-Time Analytic Queries In Big And Fast Data Environments

Guangjun Wu, Xiao-chun Yun, Chao Li, Shupeng Wang, Yipeng Wang, Xiaoyu Zhang, Siyu Jia, Guangyan Zhang
DOI: https://doi.org/10.1007/978-3-319-55699-4_29
2017-01-01
Abstract: Recently there has been significant interest in performing real-time analytical queries in systems that can handle both "big data" and "fast data". In this paper, we propose an approximate answering approach, called ROSE, that can manage big and fast data streams and support complex analytical queries against them. To achieve this goal, we start with an analysis of existing query processing techniques in big data systems to understand the requirements of building a distributed analytic sketch. We then propose a sampling-based sketch that can extract multi-faceted samples from asynchronous data streams, and augment its usability with accuracy-lossless distributed sketch construction operations, such as splitting, merging, and union. Experimental results with real-world data sets indicate that, compared with the state-of-the-art approximate answering engine BlinkDB, our techniques obtain more accurate estimates and improve system throughput by a factor of 2. Compared with the distributed in-memory computing system Spark, our system achieves an improvement of 2 orders of magnitude in query response time.