LCRC: A Dependency-Aware Cache Management Policy for Spark

Bo Wang, Jie Tang, Rui Zhang, Wei Ding, Deyu Qi
DOI: https://doi.org/10.1109/bdcloud.2018.00140
2018-01-01
Abstract: Memory is a constrained resource for in-memory big data computing systems, and efficient memory management plays a pivotal role in their performance. However, simple history-based cache replacement strategies, such as Least Recently Used (LRU), usually perform poorly in cluster applications because they lack knowledge of data dependencies. Least Reference Count (LRC) is dependency-aware: it gives blocks with larger reference counts higher priority to stay in memory. However, these blocks are not accessed in some stages of their life cycle, so they occupy memory that other blocks could use and reduce the memory effectively available. To eliminate this shortcoming, we propose LCRC, a dependency-aware cache management policy that considers both intra-stage and inter-stage dependencies. With a prefetching mechanism, blocks that are accessed across stages can be written back into memory before their next use. Experiments show that, compared with previous methods, the proposed mechanism improves computing performance by over 65%.
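The abstract contrasts LRC, which ranks blocks by their total reference count, with a policy that also looks at which stage will read a block next. The toy simulator below illustrates that contrast; it is a sketch under stated assumptions, not the paper's LCRC algorithm. The job model (a list of stages, each a list of block IDs in access order), the function name `simulate`, and the stage-aware rule (rank by references left in the current stage, prefetch a stage's blocks when it starts, evicting only blocks that stage never reads) are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple

Stage = List[str]  # hypothetical model: block IDs read by one stage, in access order


def simulate(stages: List[Stage], capacity: int, stage_aware: bool) -> Tuple[int, int]:
    """Replay `stages` against a cache holding `capacity` blocks.

    Returns (demand_misses, prefetches).
    stage_aware=False: LRC-style, evict the cached block with the fewest
        references left in the whole job.
    stage_aware=True: hypothetical stage-aware variant (not the paper's
        exact method), rank blocks by the references left in the current
        stage and prefetch a stage's blocks when the stage starts.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    job_left = Counter(b for stage in stages for b in stage)
    cache: Set[str] = set()
    misses = prefetches = 0

    for stage in stages:
        stage_left = Counter(stage)
        rank = stage_left if stage_aware else job_left

        if stage_aware:
            # Prefetch: replace blocks this stage never reads with blocks it will read.
            for b in sorted(set(stage) - cache):
                if len(cache) >= capacity:
                    idle = sorted(c for c in cache if stage_left[c] == 0)
                    if not idle:
                        break  # every cached block is still needed in this stage
                    cache.discard(idle[0])
                cache.add(b)
                prefetches += 1

        for b in stage:
            if b not in cache:
                misses += 1  # demand miss: the task waits for the block
                if len(cache) >= capacity:
                    # Evict the block with the fewest remaining references
                    # (job-wide for LRC, stage-local for the variant).
                    victim = min(cache, key=lambda c: (rank[c], c))
                    cache.discard(victim)
                cache.add(b)
            job_left[b] -= 1
            stage_left[b] -= 1
    return misses, prefetches


# A hot block H is read in stages 0 and 2 but not in stage 1.
stages = [["H", "H", "H"], ["X", "Y", "X", "Y"], ["H", "H", "H"]]
print(simulate(stages, capacity=2, stage_aware=False))  # (5, 0)
print(simulate(stages, capacity=2, stage_aware=True))   # (0, 4)
```

In the LRC run, H keeps the highest job-wide count, so it stays pinned through stage 1 while X and Y evict each other, which is the memory-occupancy problem the abstract describes. The stage-aware run swaps H out during stage 1 and prefetches it back before stage 2, so its loads happen at stage boundaries instead of stalling tasks; whether that overlap is achievable in practice depends on the real system, which this model does not capture.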