Abstract: A novel distributed weighted random sampling algorithm for distributed stream systems is introduced. Multi-level query processing is presented to handle complex tasks and queries in a distributed system. A synthetic processing topology strategy is provided to merge streams and reduce the partially repeated computation on overlapping data. Early results with statistical estimates are continuously sent to users.

Interactive query processing aims at generating approximate results with minimum response time. However, it is quite difficult for a batch-oriented processing system to progressively provide increasingly accurate results in a distributed environment. MapReduce Online extends the MapReduce framework to support online aggregation, but its processing speed prevents it from keeping up with ongoing real-time data events. We deploy the online aggregation algorithm over S4, a scalable stream processing system inspired by the combination of MapReduce and the Actor model. Our system applies the asynchronous message communication mechanism of the Actor model to support online aggregation. It can process large-scale data streams with high concurrency and a short response time. In this system, we adopt a distributed weighted random sampling algorithm to correct for biased distribution across different streams. Furthermore, a multi-level query processing topology is developed to reduce overlapping processing across multiple queries. Our system can provide continuous window aggregation with a confidence interval and an error bound. We have implemented our system and conducted extensive experiments over the TPC-H benchmark. The results demonstrate that our system generates high-quality query results within a short response time and that the approach outperforms MapReduce Online on data streams.
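
The idea of continuously reporting an aggregate together with a confidence interval can be illustrated with a minimal single-node sketch. This is not the estimator used in the paper, which is not given in the abstract; it is a generic running AVG with a normal-approximation interval, assuming tuples arrive in random order. The names `RunningAverage`, `progressive_avg`, and `report_every` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist


class RunningAverage:
    """Running AVG with a CLT-based confidence interval (assumed estimator)."""

    def __init__(self, confidence=0.95):
        self.n = 0        # number of tuples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the running mean
        # Two-sided z-score for the requested confidence level.
        self.z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    def update(self, x):
        # Welford's one-pass update of the mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def estimate(self):
        """Return (mean, half_width); half_width is inf before two tuples."""
        if self.n < 2:
            return self.mean, math.inf
        variance = self.m2 / (self.n - 1)  # sample variance
        half_width = self.z * math.sqrt(variance / self.n)
        return self.mean, half_width


def progressive_avg(stream, report_every=1000, confidence=0.95):
    """Yield (tuples_seen, mean, half_width) every `report_every` tuples."""
    agg = RunningAverage(confidence)
    for i, x in enumerate(stream, start=1):
        agg.update(x)
        if i % report_every == 0:
            yield (i,) + agg.estimate()
```

Each yielded triple is an early result: the user sees the current estimate and an interval of the form mean ± half_width, and can stop the query once the interval is tight enough. The interval is only meaningful if the arrival order is effectively random, which is the role sampling plays in an online aggregation system.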

Supporting Self-Adaptation in Streaming Data Mining Applications

Adaptive Scheduling for Efficient Execution of Dynamic Stream Workflows

Deep-Reinforcement-Learning-based User-Preference-Aware Rate Adaptation for Video Streaming

Progressive online aggregation in a distributed stream system

Concurrent and Storage-Aware Data Streaming for Data Processing Workflows in Grid Environments

Adaptive Data Analysis for Growing Data

An adaptive algorithm for dealing with data stream evolution and singularity

UGG-DA: Uncertainty-Guided Gradual Distribution Adaptation and Dynamic Prediction with Streaming Data

Dancing with Shackles, Meet the Challenge of Industrial Adaptive Streaming Via Offline Reinforcement Learning

Adaptive Scheduling Framework of Streaming Applications based on Resource Demand Prediction with Hybrid Algorithms

Low-Delay Adaptive Video Streaming Based on Short-Term TCP Throughput Prediction

Optimized Adaptive Streaming Representations based on System Dynamics

Storage Aware Resource Allocation for Grid Data Streaming Pipelines

Batch Adaptative Streaming for Video Analytics

Agile Data Streaming for Grid Applications

A reliable adaptive prototype-based learning for evolving data streams with limited labels

Human-machine interactive streaming anomaly detection by online self-adaptive forest

AWStream

ESA-Stream: Efficient Self-Adaptive Online Data Stream Clustering

An On-the-Fly Scheduling Strategy for Distributed Stream Processing Platform

A Hybrid Control Scheme for Adaptive Live Streaming