Towards Dependency-Aware Cache Management for Data Analytics Applications

Yinghao Yu, Chengliang Zhang, Wei Wang, Jun Zhang, K. Letaief
DOI: https://doi.org/10.1109/tcc.2019.2945015
IF: 5.697
2019-10-01
IEEE Transactions on Cloud Computing
Abstract: Memory caches are being used aggressively in today's data analytics systems such as Spark, Tez, and Piccolo. The significant performance impact of caches and their limited sizes call for efficient cache management in data analytics clusters. However, prevalent data analytics systems employ rather simple cache management policies, notably Least Recently Used (LRU) and Least Frequently Used (LFU), that are oblivious to the application semantics of data dependency, expressed as directed acyclic graphs (DAGs). Without this knowledge, cache management can, at best, be performed by “guessing” future data access patterns based on history, which frequently results in inefficient, erroneous caching with a low hit rate and a long response time. Worse still, the lack of data dependency knowledge makes it impossible to retain the all-or-nothing cache property of cluster applications: a compute task cannot be sped up unless all of its dependent data has been kept in main memory. In this paper, we propose a novel cache replacement policy, named Least Reference Count (LRC), which exploits the application's data dependency information to optimize cache management. LRC keeps track of the reference count of each data block, defined as the number of dependent child blocks that have not been computed yet, and always evicts the block with the smallest reference count. Furthermore, we incorporate the all-or-nothing requirement into LRC by coordinately managing the reference counts of all the input data blocks for the same computation. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that the proposed policies address the all-or-nothing requirement well and significantly improve cache performance. Compared with LRU and a recently proposed caching policy called MEMTUNE, LRC improves the caching performance of typical workloads in production clusters by 22 and 284 percent, respectively.
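The eviction rule described in the abstract can be illustrated with a minimal Python sketch. This is a toy model written for this summary, not the authors' Spark implementation: the class name `LRCCache`, its methods, the dictionary-based DAG representation, and the alphabetical tie-breaking are all assumptions made for illustration, and the all-or-nothing coordination across input blocks is left out.

```python
from collections import defaultdict


class LRCCache:
    """Toy Least Reference Count cache (hypothetical, for illustration only)."""

    def __init__(self, capacity, dag):
        # dag maps each block name to the list of parent blocks it is computed from.
        self.capacity = capacity
        self.dag = dag
        self.cached = set()
        # Reference count of a block = number of its child blocks not yet computed.
        self.ref_count = defaultdict(int)
        for parents in dag.values():
            for parent in parents:
                self.ref_count[parent] += 1

    def mark_computed(self, child):
        # Once a child block is computed, each parent loses one pending reference.
        for parent in self.dag.get(child, []):
            self.ref_count[parent] -= 1

    def insert(self, block):
        # Cache a block; if the cache is full, evict the cached block with the
        # smallest reference count. Ties are broken by name (an assumption).
        if block in self.cached:
            return None
        evicted = None
        if len(self.cached) >= self.capacity:
            evicted = min(self.cached, key=lambda b: (self.ref_count[b], b))
            self.cached.remove(evicted)
        self.cached.add(block)
        return evicted
```

For example, with a DAG in which block C depends on A and B, and block D depends on A, computing C leaves A with one pending child and B with none, so B is the block evicted when the cache fills. An LRU policy would instead evict A if it was touched less recently, even though D still needs it.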
Computer Science