Abstract: A novel distributed weighted random sampling algorithm for distributed stream systems is introduced. Multi-level query processing is presented to handle complex tasks and queries in a distributed system. A synthetic processing topology strategy is provided to merge streams whose computation over overlapping data is partially repeated. Early results with statistical estimates are continuously sent to users.

Interactive query processing aims to generate approximate results with minimal response time. However, it is difficult for a batch-oriented processing system to provide progressively more accurate results in a distributed environment. MapReduce Online extends the MapReduce framework to support online aggregation, but its processing speed prevents it from keeping up with ongoing real-time data events. We deploy the online aggregation algorithm over S4, a scalable stream processing system inspired by the combined functionality of MapReduce and the Actor model. Our system uses the asynchronous message communication mechanism of the Actor model to support online aggregation, so it can process large-scale data streams with high concurrency and a short response time. We adopt a distributed weighted random sampling algorithm to handle biased distribution between different streams. Furthermore, a multi-level query processing topology is developed to reduce overlapping processing across multiple queries. Our system provides continuous window aggregation with a confidence interval and an error bound. We have implemented the system and conducted extensive experiments on the TPC-H benchmark. The experiments demonstrate that our system generates high-quality query results within a short response time and that the approach outperforms MapReduce Online on data streams.
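
To illustrate what "early results with a confidence interval and error bound" can look like, here is a minimal Python sketch of a running average with an approximate 95% interval. It is not the estimator used in the system described above: it assumes the tuples seen so far form a simple random sample, uses Welford's incremental variance update with a normal approximation, and the names `RunningEstimate`, `update`, and `interval` are hypothetical.

```python
import math
import random


class RunningEstimate:
    """Tracks a running mean and variance (Welford's method) over streamed values.

    Hypothetical helper for illustration only; assumes the values seen so far
    are a simple random sample of the data being aggregated.
    """

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the running mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def interval(self, z=1.96):
        """Return (estimate, half_width) using a normal approximation.

        z=1.96 corresponds to an approximate 95% confidence level.
        With fewer than two values the half-width is reported as infinite.
        """
        if self.n < 2:
            return self.mean, math.inf
        variance = self.m2 / (self.n - 1)          # unbiased sample variance
        return self.mean, z * math.sqrt(variance / self.n)


if __name__ == "__main__":
    rng = random.Random(0)                         # fixed seed for repeatability
    est = RunningEstimate()
    for i in range(1, 10001):
        est.update(rng.gauss(100.0, 15.0))         # synthetic stream values
        if i in (100, 1000, 10000):                # report progressively
            mean, half = est.interval()
            print(f"after {i:>5} tuples: {mean:.2f} +/- {half:.2f}")
```

The half-width shrinks roughly in proportion to one over the square root of the number of tuples processed, which is why an estimate reported early can be refined as more of the stream arrives.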

SWEclat: a frequent itemset mining algorithm over streaming data using Spark Streaming

Approximate mining of global closed frequent itemsets over data streams

A New Algorithm for Mining Global Frequent Itemsets in a Stream

Gc-Tree: A Fast Online Algorithm For Mining Frequent Closed Itemsets

Finding Frequent Closed Itemsets in Sliding Window in Linear Time

Efficient Discovery of Emerging Frequent Patterns in Arbitrary Windows on Data Streams

Efficient Algorithm for Mining of Frequent Itemsets over Uncertain Data Streams

Progressive online aggregation in a distributed stream system

Frequent Items Mining Based on Weight in Data Stream

Mining Frequent Itemsets Over Arbitrary Time Intervals in Data Streams

Mining Robust Frequent Items in Data Streams

Distributed Streaming Analytics on Large-scale Oceanographic Data using Apache Spark

Claim: An Efficient Method For Relaxed Frequent Closed Itemsets Mining Over Stream Data

Methods for Mining Frequent Items in Data Streams: an Overview

State-of-the-art on Frequent Pattern Mining in Data Streams

Web Technologies and Applications

Finding Frequent Items in Time Decayed Data Streams

YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark

Mining Algorithm for Frequent Pattern in Data Stream Based on Stream-cube

Novel structures for counting frequent items in time decayed streams

Survey of the Study on Frequent Pattern Mining in Data Streams