Dynamic Replicas Strategy Based on Predicted Popularity

Zhang Son
Abstract: To improve data availability and cluster performance, HDFS currently adopts uniform data replication. However, files differ in popularity, sometimes enormously, and concentrated access to highly popular data can hurt job performance. To address this problem, a dynamic replica strategy based on predicted popularity is proposed. Starting from recent data popularity, a grey prediction model produces an initial forecast, and a Markov prediction model corrects the deviation caused by burst access and shifting access, yielding a more accurate predicted popularity for each file. A finite channel service model based on the predicted popularity is then established to calculate the minimum number of replicas that meets user demand. Experimental results show that, compared with default data replication, the strategy more effectively avoids contention, reduces job completion time, and alleviates network traffic.
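The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption: the grey model is taken to be the standard GM(1,1), the finite channel service model is approximated by an Erlang-B loss system in which each replica acts as one channel, and the Markov correction step is omitted. The function names and parameters (`gm11_forecast`, `erlang_b`, `min_replicas`, `max_blocking`, `cap`) are hypothetical and do not come from the paper.

```python
import math


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values with a standard GM(1,1) grey model."""
    n = len(series)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 observations")
    # Accumulated series x1 and background values z (means of neighbours).
    x1, total = [], 0.0
    for value in series:
        total += value
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = list(series[1:])
    # Least-squares fit of y = -a * z + b (ordinary linear regression).
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    if denom == 0:
        return [sy / m] * steps  # degenerate input: fall back to the mean
    slope = (m * szy - sz * sy) / denom
    b = (sy - slope * sz) / m
    a = -slope
    if abs(a) < 1e-12:
        return [b] * steps  # no growth or decay: constant forecast

    def x1_hat(k):
        # Time response of the accumulated series at index k (0-based).
        return (series[0] - b / a) * math.exp(-a * k) + b / a

    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


def erlang_b(channels, offered_load):
    """Blocking probability of an M/M/c/c loss system (Erlang-B recursion)."""
    blocking = 1.0
    for c in range(1, channels + 1):
        blocking = offered_load * blocking / (c + offered_load * blocking)
    return blocking


def min_replicas(offered_load, max_blocking, cap=64):
    """Smallest replica count whose blocking probability meets the target."""
    for c in range(1, cap + 1):
        if erlang_b(c, offered_load) <= max_blocking:
            return c
    return cap  # assumed upper bound on replicas per file


# Hypothetical usage: access counts per interval for one file.
history = [100, 110, 121, 133.1, 146.41]
predicted = gm11_forecast(history)[0]
# Assumed conversion: each request occupies a replica for 0.0125 intervals.
replicas = min_replicas(predicted * 0.0125, max_blocking=0.05)
```

The access history and the service-time factor in the usage lines are invented for illustration. In the strategy the abstract describes, the grey forecast would first be adjusted by the Markov correction before being fed to the replica calculation.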
Computer Science