Resource Estimation in Distributed Data Stream Processing Systems

Minglu Fan, Yi Liang, Fei Liu, Mangmang Yang, Haihua Wang
DOI: https://doi.org/10.2991/wartia-16.2016.361
2016-01-01
Abstract: Distributed data stream processing systems (DSPS) are widely used in real-time massive data processing scenarios because of their real-time response and high throughput. In a real-world DSPS, the fluctuating arrival rate of the input data causes the computing resources consumed by the DSPS to vary over time. To guarantee the performance of a DSPS, accurate prediction of the resources it consumes is necessary. In this paper, we propose approaches for the online prediction of the computing resources that a DSPS consumes. We monitor the usage of computing resources such as CPU and memory in a DSPS, and use a temporal data stream clustering algorithm and a linear regression method to make online predictions of CPU resources and memory resources, respectively. Our prediction approaches prove to be efficient and sufficiently fast.
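As an illustration of the linear-regression side of the approach (used for memory resources), the following is a minimal sketch that assumes memory usage is sampled at regular intervals and fitted with a least-squares line over a sliding window. The class name `SlidingLinearPredictor`, the window size, and the use of time as the only regressor are assumptions made for this example; the abstract does not specify the paper's actual model or features.

```python
from collections import deque


class SlidingLinearPredictor:
    """Hypothetical helper: least-squares line over the most recent samples."""

    def __init__(self, window=30):
        # Keep only the last `window` (time, usage) pairs; older ones are evicted.
        self.samples = deque(maxlen=window)

    def observe(self, t, usage):
        """Record one monitored sample, e.g. memory in MB at time t."""
        self.samples.append((float(t), float(usage)))

    def predict(self, t):
        """Extrapolate the fitted line to time t."""
        n = len(self.samples)
        if n == 0:
            raise ValueError("no samples observed yet")
        mean_t = sum(s[0] for s in self.samples) / n
        mean_y = sum(s[1] for s in self.samples) / n
        var_t = sum((s[0] - mean_t) ** 2 for s in self.samples)
        if var_t == 0.0:
            # A single distinct timestamp gives no slope; fall back to the mean.
            return mean_y
        cov = sum((s[0] - mean_t) * (s[1] - mean_y) for s in self.samples)
        slope = cov / var_t
        return mean_y + slope * (t - mean_t)


if __name__ == "__main__":
    predictor = SlidingLinearPredictor(window=30)
    for t in range(10):
        predictor.observe(t, 100 + 5 * t)  # synthetic, perfectly linear usage
    print(predictor.predict(10))
```

The sliding window keeps the fit responsive to a fluctuating input rate, at the cost of recomputing the sums on each prediction; an incremental update of the sums would be a natural refinement for a high-frequency monitor.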