Programming Visual and Script-based Big Data Analytics Workflows on Clouds

Loris Belcastro, Fabrizio Marozzo, D. Talia, Paolo Trunfio
DOI: https://doi.org/10.3233/978-1-61499-583-8-18
Abstract: Data analysis applications often involve large datasets and complex software systems in which multiple data processing tools are executed in a coordinated way. Data analysis workflows are effective in expressing task coordination, and they can be designed through visual and script-based programming paradigms. The Data Mining Cloud Framework (DMCF) supports the design and scalable execution of data analysis applications on Cloud platforms. A workflow in DMCF can be developed using a visual or a script-based language. The visual language, called VL4Cloud, is based on a design approach for high-level users, e.g., domain expert analysts with limited knowledge of programming paradigms. The script-based language, JS4Cloud, is provided as a flexible programming paradigm for skilled users who prefer to code their workflows through scripts. Both languages implement data-driven task parallelism, which dispatches ready-to-run tasks to Cloud resources. In addition, they exploit implicit parallelism, which frees users from duties such as workload partitioning, synchronization, and communication. In this chapter, we present the DMCF framework and discuss how its workflow paradigm has been integrated with the MapReduce model. In particular, we describe how VL4Cloud/JS4Cloud workflows can include MapReduce tools, and how these workflows are executed in parallel on DMCF, enabling scalable data processing on Clouds.
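The data-driven task parallelism mentioned in the abstract can be illustrated with a minimal sketch. This is not DMCF, VL4Cloud, or JS4Cloud code: the function `run_workflow`, the task table layout, and the toy tasks are all hypothetical names introduced here, and a local thread pool stands in for Cloud resources. The sketch only shows the general idea that a task becomes ready to run as soon as all of its input data exist, so independent tasks run in parallel without the user writing any synchronization.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def run_workflow(tasks, initial_data, max_workers=4):
    """Run tasks in data-driven order (hypothetical helper, not a DMCF API).

    tasks: dict mapping task name -> (function, input names, output name).
    initial_data: dict mapping data name -> value available at start.
    """
    data = dict(initial_data)   # data items produced so far
    pending = dict(tasks)       # tasks not yet submitted
    running = {}                # future -> name of the data item it produces
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or running:
            # A task is ready once every one of its inputs is available.
            ready = [name for name, (_, inputs, _) in pending.items()
                     if all(i in data for i in inputs)]
            for name in ready:
                func, inputs, output = pending.pop(name)
                future = pool.submit(func, *[data[i] for i in inputs])
                running[future] = output
            if not running:
                # Nothing is running and nothing is ready: inputs can never appear.
                raise ValueError("unsatisfiable inputs: " + ", ".join(sorted(pending)))
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                data[running.pop(future)] = future.result()
    return data


# Toy tasks standing in for data analysis tools (assumptions for illustration).
def keep_even(xs):
    return [x for x in xs if x % 2 == 0]


def keep_odd(xs):
    return [x for x in xs if x % 2 == 1]


def total(xs):
    return sum(xs)


def combine(a, b):
    return a + b


workflow = {
    "even": (keep_even, ["dataset"], "even_part"),
    "odd": (keep_odd, ["dataset"], "odd_part"),
    "sum_even": (total, ["even_part"], "even_sum"),
    "sum_odd": (total, ["odd_part"], "odd_sum"),
    "merge": (combine, ["even_sum", "odd_sum"], "result"),
}

result = run_workflow(workflow, {"dataset": list(range(10))})
print(result["result"])
```

The two partition tasks and the two summation tasks have no data dependency on each other, so the scheduler is free to run them concurrently; only the final `merge` task waits for both partial sums. The execution order is derived entirely from the declared inputs and outputs, which is the sense in which the parallelism is implicit.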