A review on big data real-time stream processing and its scheduling techniques

Nicoleta Tantalaki, Stavros Souravlas, Manos Roumeliotis
DOI: https://doi.org/10.1080/17445760.2019.1585848
2019-03-01
International Journal of Parallel, Emergent and Distributed Systems
Abstract: Over the last decade, several interconnected disruptions have occurred in the large-scale distributed and parallel computing landscape. The volume of data produced by the various activities of society has never been so large, and it is generated at an increasing speed. Data received in real time is often most valuable at the moment it arrives, when it can support valuable decision making. Systems for managing data streams are not a recent concept, but they are becoming more important due to the multiplication of data stream sources in the context of IoT. This paper discusses the unique processing challenges posed by the nature of streams and the mechanisms used to address them in the big data era. Several cloud systems have emerged to enable distributed processing of big data streams. Distributed stream management systems (DSMS) are presented and compared, along with their strengths and limitations. Computations in these systems demand elaborate orchestration over a collection of machines. Consequently, a classification and literature review of these systems' scheduling techniques and their enhancements is also provided.