Optimization of RDD Cache Replacement Strategy in Spark Framework

Tian-yu CHEN, Long-xin ZHANG, Ken-li LI, Li-qian ZHOU
DOI: https://doi.org/10.3969/j.issn.1000-1220.2019.06.020
2019-01-01
Abstract: Spark is a distributed computing engine whose in-memory abstraction, the resilient distributed dataset (RDD), enables efficient data processing. In practice, the memory available for computation is limited, so cached RDDs must be replaced when memory runs short. Spark's default Least Recently Used (LRU) algorithm considers only whether RDD partitions have been used recently and ignores other factors. The existing WR cache eviction strategy emphasizes replacement based on weighted RDD values. Building on these studies, this paper presents a cache weight replacement strategy (CWS) that optimizes the selection strategy and takes both historical access time and computation cost into account in the replacement phase. The experiments are carried out using the open network analysis project provided by Stanford University. The results show that, when processing small data under sufficient memory, the CWS algorithm's average execution-time performance is 2.4% better than that of the WR algorithm, and memory usage is reduced by 36%.
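The contrast the abstract draws between LRU and a weight-based policy can be illustrated with a minimal Python sketch. It is not the paper's CWS algorithm: the abstract does not give the weight formula, so the `weight` function, the `Partition` fields, and the helper names below are all hypothetical, chosen only to show how combining access history with computation cost can change which partition is evicted.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    size: int            # memory footprint, arbitrary units
    compute_cost: float  # assumed cost to recompute the partition if evicted
    last_access: int     # logical timestamp of the most recent access


def weight(p: Partition, now: int) -> float:
    # Hypothetical weight (NOT the paper's formula): partitions that are
    # expensive to recompute and were accessed recently score higher.
    age = now - p.last_access + 1
    return p.compute_cost / age


def _evict(ordered, needed):
    # Take partitions in the given order until `needed` units are freed.
    victims, freed = [], 0
    for p in ordered:
        if freed >= needed:
            break
        victims.append(p.name)
        freed += p.size
    return victims


def evict_lru(cache, needed):
    # Baseline: the least recently used partitions go first.
    return _evict(sorted(cache, key=lambda p: p.last_access), needed)


def evict_weighted(cache, needed, now):
    # Weight-based: the lowest-weight partitions go first.
    return _evict(sorted(cache, key=lambda p: weight(p, now)), needed)


cache = [
    Partition("a", size=10, compute_cost=100.0, last_access=1),
    Partition("b", size=10, compute_cost=1.0, last_access=5),
    Partition("c", size=10, compute_cost=10.0, last_access=9),
]
print(evict_lru(cache, needed=10))               # LRU picks the oldest access
print(evict_weighted(cache, needed=10, now=10))  # weighted picks the cheapest to recompute
```

In this toy cache, LRU evicts partition `a` because it was accessed longest ago, even though it is the most expensive to recompute, while the weighted policy evicts the cheap partition `b` instead.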