A Genetic Algorithm Based Data Replica Placement Strategy for Scientific Applications in Clouds.

Lizhen Cui, Junhua Zhang, Lingxi Yue, Yuliang Shi, Hui Li, Dong Yuan
DOI: https://doi.org/10.1109/tsc.2015.2481421
IF: 11.019
2015-01-01
IEEE Transactions on Services Computing
Abstract: Cloud computing is a promising distributed computing platform for big data applications, e.g., scientific applications, since abundant resources can be obtained from cloud services for processing and storing both existing and generated application datasets. However, when tasks process big data stored in distributed data centers, the inevitable data movements cause high bandwidth costs and execution delays. In this paper, we construct a tripartite-graph-based model to formulate the data replica placement problem and propose a genetic-algorithm-based data replica placement strategy for scientific applications to reduce data transmissions in the cloud. Our approach can reduce 1) the size of moved data, 2) the time of data movement, and 3) the number of movements. We conduct experiments to compare the proposed strategy with the random placement strategy used in the Hadoop Distributed File System (HDFS); the results show that our strategy performs better for scientific applications in clouds.
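To make the idea in the abstract concrete, the following is a minimal sketch of a genetic algorithm for replica placement. It is not the authors' algorithm: the abstract does not give the encoding, fitness function, or operators, so everything here is an assumption made for illustration. In particular, the names `transfer_cost`, `random_gene`, and `evolve` are hypothetical, the fitness counts only the size of moved data (the first of the three objectives), and tournament selection, uniform crossover, and elitism are generic choices rather than details from the paper.

```python
import random

def transfer_cost(placement, tasks, sizes):
    """Total size of data that must be moved (assumed toy cost model).

    placement[d] is the tuple of data centers holding a replica of dataset d.
    tasks is a list of (data_center, [dataset indices the task reads]).
    A dataset is moved whenever the task's data center holds no replica of it.
    """
    return sum(sizes[d]
               for dc, needed in tasks
               for d in needed
               if dc not in placement[d])

def random_gene(rng, n_dcs, replicas):
    """Pick `replicas` distinct data centers for one dataset."""
    return tuple(sorted(rng.sample(range(n_dcs), replicas)))

def evolve(tasks, sizes, n_dcs, replicas=1, pop_size=30,
           generations=60, mutation_rate=0.1, seed=0):
    """Search for a low-cost placement; returns one gene per dataset."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_datasets = len(sizes)

    def cost(p):
        return transfer_cost(p, tasks, sizes)

    # Initial population: random placements, comparable to a random baseline.
    pop = [[random_gene(rng, n_dcs, replicas) for _ in range(n_datasets)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        next_pop = pop[:2]  # elitism: the two best survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            a = min(rng.sample(pop, 3), key=cost)
            b = min(rng.sample(pop, 3), key=cost)
            # Uniform crossover: each gene comes from either parent.
            child = [ga if rng.random() < 0.5 else gb
                     for ga, gb in zip(a, b)]
            # Mutation: occasionally re-place a dataset at random.
            child = [random_gene(rng, n_dcs, replicas)
                     if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=cost)

# Toy instance (made-up numbers): 3 data centers, 4 datasets, 3 tasks.
tasks = [(0, [0, 1]), (1, [2]), (2, [3])]
sizes = [10, 20, 5, 7]
best = evolve(tasks, sizes, n_dcs=3)
print(best, transfer_cost(best, tasks, sizes))
```

Because the two best individuals are carried over in every generation, the best cost in the population never increases from one generation to the next. A fuller model would also need to account for transfer time and the number of movements, which this sketch leaves out.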