Software-Defined Far Memory in Warehouse-Scale Computers

Andres Lagar-Cavilla, Junwhan Ahn, Suleiman Souhlal, Neha Agarwal, Radoslaw Burny, Shakeel Butt, Jichuan Chang, Ashwin Chaugule, Nan Deng, Junaid Shahid, Greg Thelen, Kamil Adam Yurtsever, Yu Zhao, Parthasarathy Ranganathan
DOI: https://doi.org/10.1145/3297858.3304053
2019-04-04
Abstract: Increasing memory demand and slowdown in technology scaling pose important challenges to total cost of ownership (TCO) of warehouse-scale computers (WSCs). One promising idea to reduce the memory TCO is to add a cheaper, but slower, "far memory" tier and use it to store infrequently accessed (or cold) data. However, introducing a far memory tier brings new challenges around dynamically responding to workload diversity and churn, minimizing stranding of capacity, and addressing brownfield (legacy) deployments. We present a novel software-defined approach to far memory that proactively compresses cold memory pages to effectively create a far memory tier in software. Our end-to-end system design encompasses new methods to define performance service-level objectives (SLOs), a mechanism to identify cold memory pages while meeting the SLO, and our implementation in the OS kernel and node agent. Additionally, we design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop. Our system has been successfully deployed across Google's WSC since 2016, serving thousands of production services. Our software-defined far memory is significantly cheaper (67% or higher memory cost reduction) at relatively good access speeds (6 µs) and allows us to store a significant fraction of infrequently accessed data (on average, 20%), translating to significant TCO savings at warehouse scale.
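The core idea in the abstract is to find pages that have gone cold, compress them into a software far tier, and choose how aggressively to do so without breaking a performance SLO. The toy Python model below is a minimal sketch of that idea under stated assumptions. It is not Google's kernel or node-agent implementation. `FarMemorySim`, `cold_age_s`, `pick_cold_age`, and `PAGE_SIZE` are hypothetical names. The SLO is assumed, for illustration only, to cap "promotions" (decompressions of cold pages) per minute at a fraction of the working-set size. zlib stands in for whatever compressor a production system would use.

```python
import zlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # bytes; a common base page size, assumed for illustration


@dataclass
class FarMemorySim:
    """Hypothetical toy model of a compressed, software-defined far memory tier.

    A page counts as cold once it has been idle for at least cold_age_s
    seconds. reclaim() zlib-compresses cold pages, and reading a compressed
    page decompresses it back into near memory (a "promotion").
    """
    cold_age_s: float
    near: dict = field(default_factory=dict)         # page_id -> raw bytes
    far: dict = field(default_factory=dict)          # page_id -> compressed bytes
    last_access: dict = field(default_factory=dict)  # page_id -> last access time (s)
    promotions: int = 0

    def write(self, page_id: int, data: bytes, now: float) -> None:
        self.far.pop(page_id, None)  # a write replaces any compressed copy
        self.near[page_id] = data
        self.last_access[page_id] = now

    def read(self, page_id: int, now: float) -> bytes:
        if page_id in self.far:
            # Touching a compressed page pays the decompression cost.
            self.near[page_id] = zlib.decompress(self.far.pop(page_id))
            self.promotions += 1
        self.last_access[page_id] = now
        return self.near[page_id]

    def reclaim(self, now: float) -> None:
        """Compress every near-memory page idle for at least cold_age_s."""
        for page_id in list(self.near):
            if now - self.last_access[page_id] >= self.cold_age_s:
                self.far[page_id] = zlib.compress(self.near.pop(page_id))

    def bytes_used(self) -> int:
        """Total bytes held across both tiers."""
        return sum(map(len, self.near.values())) + sum(map(len, self.far.values()))


def pick_cold_age(access_gaps_s, candidates_s, working_set_pages,
                  window_min, slo_fraction):
    """Return the smallest candidate threshold whose promotion rate meets the SLO.

    access_gaps_s holds the idle time (s) seen before each re-access during a
    window of window_min minutes. A re-access after a gap >= T would have hit
    a compressed page under threshold T, so it counts as one promotion.
    Assumed SLO: promotions per minute <= slo_fraction * working_set_pages.
    Returns None if no candidate satisfies it.
    """
    budget_per_min = slo_fraction * working_set_pages
    for t in sorted(candidates_s):
        promotions = sum(1 for gap in access_gaps_s if gap >= t)
        if promotions / window_min <= budget_per_min:
            return t
    return None


# Demo: four zero-filled pages, one of which is re-read before reclaim.
sim = FarMemorySim(cold_age_s=120)
for pid in range(4):
    sim.write(pid, bytes(PAGE_SIZE), now=0)
sim.read(0, now=100)    # page 0 stays warm
sim.reclaim(now=150)    # pages 1-3 have been idle >= 120 s, so they are compressed
print(sorted(sim.far))  # [1, 2, 3]

# Thresholds of 60 s and 120 s cause 4 and 3 promotions (over a budget of 2.5/min);
# 300 s causes 2, so it is the smallest threshold that meets the assumed SLO.
print(pick_cold_age([30, 90, 150, 400, 1000], [60, 120, 300, 600],
                    working_set_pages=1000, window_min=1, slo_fraction=0.0025))  # 300
```

The design choice the sketch illustrates is a trade-off: a lower cold-age threshold moves more memory into the compressed tier but causes more promotions. Picking the smallest threshold that still meets the SLO therefore maximizes cold-memory coverage within the performance budget. The abstract additionally describes learning-based autotuning of such policies across the fleet, which this sketch does not attempt.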
What problem does this paper attempt to address?