Resource-Aware Cache Management for In-Memory Data Analytics Frameworks.

Zhengyang Zhao, Haitao Zhang, Xin Geng, Huadong Ma
DOI: https://doi.org/10.1109/ispa-bdcloud-sustaincom-socialcom48970.2019.00060
2019-01-01
Abstract: Continuously increasing amounts of data have led to the emergence of distributed in-memory computing systems that offer higher data processing speeds. Because memory is a limited resource, efficient memory management has become key to improving the performance of these systems. Recent research in dependency-aware caching, e.g., Least Reference Count (LRC) and Least Composition Reference Count (LCRC), has enabled significant improvements in efficiency by profiling the application's Directed Acyclic Graph (DAG). However, these methods do not consider the dynamic occupancy mechanism of the Unified Memory Manager (UMM) in Spark, which can cause heavily referenced and costly data blocks to be evicted and incur high recomputation overhead. To address this shortcoming, we propose a resource-aware cache management approach that uses both runtime resource metrics and dependency information. By applying an adaptive approach, we can retain the data blocks that contribute more to producing the final results. We demonstrate the effectiveness of our cache management approach on a series of widely used benchmarks. The experimental results show that, compared with current DAG-aware implementations, our approach improves performance by 15% on average and by up to 24%. Compared with the default Least Recently Used (LRU) strategy in Spark, our implementation reduces job runtime by 28% on average and by up to 61%.
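To make the dependency-aware idea mentioned in the abstract concrete, the following is a minimal sketch of reference-count-based eviction in the spirit of LRC. It is not the authors' implementation and does not model the resource-aware part of their approach, since the abstract does not specify which runtime metrics are used. The names `reference_counts`, `LRCCache`, and the toy DAG are hypothetical, and the sketch assumes equally sized blocks and a cache capacity counted in blocks.

```python
def reference_counts(dag):
    """Count how many stages read each block.

    `dag` maps a stage name to the list of blocks it reads
    (a hypothetical, simplified stand-in for a profiled application DAG).
    """
    counts = {}
    for reads in dag.values():
        for block in reads:
            counts[block] = counts.get(block, 0) + 1
    return counts


class LRCCache:
    """Toy cache that evicts the block with the fewest remaining references."""

    def __init__(self, capacity, ref_counts):
        self.capacity = capacity            # capacity in blocks (assumption)
        self.ref_counts = dict(ref_counts)  # remaining references per block
        self.blocks = set()

    def put(self, block):
        """Insert a block; return the evicted block, or None if nothing was evicted."""
        if block in self.blocks:
            return None
        evicted = None
        if len(self.blocks) >= self.capacity:
            # Sorting first makes tie-breaking deterministic (by block name).
            evicted = min(sorted(self.blocks),
                          key=lambda b: self.ref_counts.get(b, 0))
            self.blocks.remove(evicted)
        self.blocks.add(block)
        return evicted

    def consume(self, block):
        """Record that one downstream stage has finished reading `block`."""
        if self.ref_counts.get(block, 0) > 0:
            self.ref_counts[block] -= 1


# Block A is read by three stages; B and C by one stage each.
dag = {"stage1": ["A"], "stage2": ["A", "B"], "stage3": ["A", "C"]}
cache = LRCCache(capacity=2, ref_counts=reference_counts(dag))
cache.put("A")
cache.put("B")
evicted = cache.put("C")
print(evicted)  # B
```

In this toy run an LRU policy would evict A, the least recently inserted block, even though three stages still depend on it. Reference counting keeps A and evicts B instead, which is the kind of recomputation the dependency-aware policies aim to avoid.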