Abstract: A novel distributed weighted random sampling algorithm for distributed stream systems is proposed. Multi-level query processing is presented to handle complex tasks and queries in a distributed system. A synthetic processing topology strategy is provided to merge streams and avoid partially repeated computation over overlapping data. Early results with statistical estimates are continuously sent to users.

Interactive query processing aims at generating approximate results with minimum response time. However, it is difficult for a batch-oriented processing system to provide progressively more accurate results in a distributed environment. MapReduce Online extends the MapReduce framework to support online aggregation, but its processing speed prevents it from keeping up with ongoing real-time data events. We deploy the online aggregation algorithm over S4, a scalable stream processing system inspired by a combination of MapReduce and the Actor model. Our system applies the asynchronous message communication mechanism of the Actor model to support online aggregation, and it can process large-scale data streams with high concurrency and short response times. In this system, we adopt a distributed weighted random sampling algorithm to correct for biased data distribution across streams. Furthermore, a multi-level query processing topology is developed to reduce overlapping processing across multiple queries. Our system provides continuous window aggregation with a confidence interval and an error bound. We have implemented our system and conducted extensive experiments over the TPC-H benchmark. The results demonstrate that our system generates high-quality query results within a short response time and that the approach outperforms MapReduce Online on data streams.
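
To make the idea of distributed weighted random sampling concrete, the following is a minimal sketch, not the algorithm used in the system described above. It assumes the well-known Efraimidis-Spirakis key scheme, in which each item with weight w draws a uniform random u and receives the key u^(1/w); each stream node keeps its k largest keys, and the global sample is the k largest keys across all nodes. The function names `local_sample` and `merge_samples` and the node layout are hypothetical, chosen only for illustration.

```python
import heapq
import random


def local_sample(items, k, rng):
    """Keep the k items with the largest keys u ** (1 / w) seen on one node.

    `items` is an iterable of (value, weight) pairs. This is an assumed
    sampling scheme (Efraimidis-Spirakis), not the paper's own algorithm.
    """
    heap = []  # min-heap of (key, seq, value, weight); seq breaks key ties
    for seq, (value, weight) in enumerate(items):
        if weight <= 0:
            continue  # items with non-positive weight are never sampled
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, seq, value, weight))
        elif key > heap[0][0]:
            # New key beats the smallest retained key: swap it in.
            heapq.heapreplace(heap, (key, seq, value, weight))
    return heap


def merge_samples(partials, k):
    """Merge per-node samples by keeping the k largest keys overall."""
    entries = (entry for part in partials for entry in part)
    return heapq.nlargest(k, entries, key=lambda entry: entry[0])


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical stream nodes with differently weighted items.
    node_a = local_sample([("a%d" % i, 1.0) for i in range(100)], 5, rng)
    node_b = local_sample([("b%d" % i, 3.0) for i in range(100)], 5, rng)
    for key, _, value, weight in merge_samples([node_a, node_b], 5):
        print(value, weight, round(key, 4))
```

Because each key depends only on the item's own weight and an independent random draw, the per-node samples can be merged without any coordination between nodes, which is what makes this scheme a natural fit for asynchronous message passing.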

Approximate Aggregations in Structured P2P Networks

Modeling and Performance Analysis of Unstructured P2P Network

Just-in-time Query Retrieval over Partially Indexed Data on Structured P2P Overlays

P2P Service Performance Analysis of Unstructured P2P Network

AbIx: an Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems

AQP++: Connecting Approximate Query Processing with Aggregate Precomputation for Interactive Analytics

Online Aggregation based Approximate Query Processing: A Literature Survey

Cell Abstract Indices for Content-Based Approximate Query Processing in Structured Peer-to-Peer Data Systems

Combining Aggregation and Sampling (Nearly) Optimally for Approximate Query Processing

Dynamic Clustering-Based Query Answering in Peer-to-Peer Systems

Progressive online aggregation in a distributed stream system

Adaptive Multi-join Query Processing in PDBMS

SAQP++: Bridging the Gap Between Sampling-Based Approximate Query Processing and Aggregate Precomputation

A Loosely Synchronized Gossip-Based Algorithm For Aggregate Information Computation

Building a scalable bipartite P2P overlay network

Improving Query Response Delivery Quality in Peer-to-Peer Systems

An Efficient Window-Based Range Query Algorithm in P2P

An Efficient Architecture for Information Retrieval in P2P Context Using Hypergraph

Efficient Skyline Computation in Structured Peer-to-Peer Systems

Towards Efficient Ranked Query Processing in Peer-to-Peer Networks

Opportunistic Data Aggregation Algorithm Using Any-cast