A Control-based Approach Towards Adaptive Stream Processing
Luo Mai, Kai Zeng, Rahul Potharaju, Paolo Costa, Sriram Rao
2017-01-01
Abstract: Large-scale Internet-service providers such as Amazon, Google, Facebook, and Microsoft generate tens of millions of data events per second (Bailis et al. 2017). To handle such high throughput, they have traditionally resorted to offline batch systems, e.g., Spark SQL (Armbrust et al. 2015) and Hadoop MapReduce (Dean and Ghemawat 2008). More recently, however, there has been a growing trend towards online streaming systems, which ensure timely processing and avoid the delays incurred by batching (Jindal et al. 2017; Meehan et al. 2017; Abraham et al. 2013). Fully achieving the benefits promised by these online systems, however, has proved particularly challenging. To start with, event-based workloads exhibit high temporal and spatial variability, with peaks up to an order of magnitude above the average load (Kulkarni et al. 2015; NetFlix 2016). Further, given the large number of servers involved, failures and hardware heterogeneity make it hard to ensure stable and predictable performance. Together, these issues significantly complicate resource provisioning, forcing system administrators to over-provision resources, with obvious negative consequences for cost and complexity. An alternative and more efficient approach would be to dynamically modify the system configuration (e.g., by adding/removing resources or by redistributing operators) whenever the workload or the environment changes. This would entail extending the streaming platform with a set of control policies and mechanisms that can reconfigure the