SAC: Dynamic Caching Upon Sketch for In-Memory Big Data Analytics

Mingtao Ji, Mingxian Zhou, Haodong Zou, Ming Tang, Zhuzhong Qian, Xiaoliang Wang
DOI: https://doi.org/10.1109/bigcom61073.2023.00032
2023-01-01
Abstract: Caching intermediate results in memory, instead of flushing them to disk, shortens the completion time of big data analytics, because follow-up computations do not need to reload them. Constrained by limited memory, traditional approaches cache only part of the results, chosen by explicit user triggers or simply by their access history, and fail to capture instantaneous system dynamics, including the execution order of parallel stages and the currently uncompleted dependencies of each stage. The data to be cached must reflect these system dynamics to minimize job completion time. We therefore design a dynamic caching mechanism (SAC) for big data analytics that combines a static sketch of the stage dependencies, built during job preparation, with dynamic adjustment of that sketch during job execution. The static sketch determines a minimal subset of stages in each job that maximizes the caching benefit, while the dynamic adjustment changes the caching priority among pending stages. We implement SAC in Spark, and extensive experiments on real-world workloads show that SAC reduces the average job completion time by at least 24.6% compared with state-of-the-art alternatives.
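The abstract describes the static sketch and its dynamic adjustment only at a high level. The following is a minimal Python sketch of a dependency-aware caching policy in the same spirit; it is not SAC's actual algorithm. It assumes that caching a stage's output is worth more when more not-yet-completed downstream stages read it, normalized by the memory the output occupies, and that outputs are chosen greedily under a fixed memory budget. All names (`StageOutput`, `caching_priority`, `choose_cache`, `on_stage_complete`), the benefit-per-megabyte heuristic, and the greedy selection are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StageOutput:
    """Hypothetical model of one stage's cacheable output (size assumed > 0)."""
    name: str
    size_mb: float
    # Downstream stages that read this output and have not completed yet.
    pending_consumers: set[str] = field(default_factory=set)


def caching_priority(out: StageOutput) -> float:
    # Assumed heuristic: more pending readers means more reloads avoided,
    # divided by the memory the output would occupy.
    return len(out.pending_consumers) / out.size_mb


def choose_cache(outputs: list[StageOutput], capacity_mb: float) -> list[str]:
    """Greedily keep the highest-priority outputs that fit in memory."""
    chosen: list[str] = []
    used = 0.0
    for out in sorted(outputs, key=caching_priority, reverse=True):
        if not out.pending_consumers:
            continue  # nobody will read it again, so caching gives no benefit
        if used + out.size_mb <= capacity_mb:
            chosen.append(out.name)
            used += out.size_mb
    return chosen


def on_stage_complete(outputs: list[StageOutput], stage: str) -> None:
    """Dynamic adjustment: a finished stage no longer depends on any output."""
    for out in outputs:
        out.pending_consumers.discard(stage)


if __name__ == "__main__":
    outputs = [
        StageOutput("s1", 40.0, {"s3"}),  # read only by stage s3
        StageOutput("s2", 50.0, {"s4"}),  # read only by stage s4
    ]
    print(choose_cache(outputs, capacity_mb=50.0))  # ['s1']
    on_stage_complete(outputs, "s3")
    print(choose_cache(outputs, capacity_mb=50.0))  # ['s2']
```

In the usage example, `s1` is cached first because it has the higher pending-reader density; once stage `s3` finishes, `s1` has no remaining readers and the freed budget goes to `s2`, which mirrors the idea of re-prioritizing caching among pending stages as execution progresses.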