Characterizing Scientific Applications on Virtualized Cloud Platforms

Fan Zhang, Majd Sakr
DOI: https://doi.org/10.5339/qfarf.2011.csp26
2011-01-01
Qatar Foundation Annual Research Forum Proceedings
Abstract: In general, scientific applications require different types of computing resources depending on their behavior and needs. For example, page indexing in an Arabic search engine requires sufficient network bandwidth to process millions of web pages, while seismic modeling is CPU and graphics intensive for real-time fluid analysis and 3D visualization. As a potential solution, cloud computing, with its elastic, on-demand, pay-as-you-go model, can offer a variety of virtualized compute resources to satisfy the demands of various scientific applications. Currently, deploying scientific applications onto large-scale virtualized cloud computing platforms is based on random mapping or on rules of thumb developed through past experience. Such provisioning and scheduling techniques cause overload or inefficient use of the shared underlying computing resources, while delivering few, if any, satisfactory performance guarantees. Virtualization, a core enabling technology in cloud computing, provides the coveted flexibility and elasticity, yet it introduces several difficulties with resource mapping for scientific applications. To enable informed provisioning, scheduling, and optimization on cloud infrastructures running scientific workloads, we propose a profiling technique to characterize the resource needs and behavior of such applications. Our approach provides a framework to characterize scientific applications based on their resource capacity needs, communication patterns, bandwidth needs, sensitivity to latency, and degree of parallelism. Although the programming model could significantly affect these parameters, we focus this initial work on characterizing applications developed using the MapReduce and Dryad programming models.
We profile several applications while varying the cloud configuration and the scale of resources in order to study their particular resource needs and behavior and to identify the resources that potentially limit performance. A manual and iterative process using a variety of representative input data sets is necessary to reach informative conclusions about the major characteristics of an application's resource needs and behavior. Using this information, we provision and configure a cloud infrastructure, given the available resources, to best suit the given application. In this preliminary work, we show experimental results across a variety of applications and highlight the merit of precise application characterization for efficiently utilizing the resources available across different applications.
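The profiling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the resource names, the `ProfileRun` record, the `limiting_resource` and `characterize` helpers, and all utilization values are hypothetical placeholders chosen for illustration, not measurements from the paper. The sketch assumes each run records a mean utilization per resource and treats the most utilized resource as the likely bottleneck for that run.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical resource names; the abstract does not list the metrics collected.
RESOURCES = ("cpu", "memory", "disk_io", "network")


@dataclass
class ProfileRun:
    """One profiling run of an application under one cloud configuration."""
    config: str        # illustrative label for the cloud configuration
    input_set: str     # name of the representative input data set
    utilization: dict  # resource name -> mean utilization in [0, 1]


def limiting_resource(run):
    """Return the most utilized resource in a single run (ties go to the first listed)."""
    return max(RESOURCES, key=lambda r: run.utilization.get(r, 0.0))


def characterize(runs):
    """Count how often each resource is the most utilized one across all runs."""
    return Counter(limiting_resource(run) for run in runs)


# Placeholder values only; these are NOT results from the paper.
runs = [
    ProfileRun("4 VMs", "small", {"cpu": 0.35, "memory": 0.40, "disk_io": 0.20, "network": 0.90}),
    ProfileRun("8 VMs", "small", {"cpu": 0.55, "memory": 0.45, "disk_io": 0.25, "network": 0.85}),
    ProfileRun("8 VMs", "large", {"cpu": 0.80, "memory": 0.50, "disk_io": 0.30, "network": 0.60}),
]

summary = characterize(runs)
print(summary.most_common())
```

Repeating this over more configurations and input sets mirrors the manual, iterative process the abstract describes. A resource that dominates across most runs is a candidate for extra provisioning when the infrastructure is configured for that application.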