Floe: A Continuous Dataflow Framework for Dynamic Cloud Applications

Yogesh Simmhan, Alok Kumbhare
DOI: https://doi.org/10.48550/arXiv.1406.5977
2014-06-24
Abstract: Applications in cyber-physical systems are increasingly coupled with online instruments to perform long-running, continuous data processing. Such "always on" dataflow applications are dynamic: they need to change the application's logic and performance at runtime in response to external operational needs. Floe is a continuous dataflow framework that is designed to be adaptive for dynamic applications on Cloud infrastructure. It offers advanced dataflow patterns like BSP and MapReduce for flexible and holistic composition of streams and files, and supports dynamic recomposition at runtime with minimal impact on the execution. Adaptive resource allocation strategies allow our framework to effectively use elastic Cloud resources to meet varying data rates. We illustrate the design patterns of Floe by running an integration pipeline and a tweet clustering application from the Smart Power Grids domain on a private Eucalyptus Cloud. The responsiveness of our resource adaptation is validated through simulations for periodic, bursty and random workloads.
Distributed, Parallel, and Cluster Computing