DmRPC: Disaggregated Memory-aware Datacenter RPC for Data-intensive Applications

Jie Zhang, Xuzheng Chen, Yin Zhang, Zeke Wang
DOI: https://doi.org/10.1109/icde60146.2024.00291
2024-01-01
Abstract: Modern datacenter applications are increasingly built on a microservices architecture, in which microservices communicate with each other through datacenter RPCs. The pass-by-value semantics of RPC incur redundant data movement across the network, especially for data-intensive applications. Naively introducing a shared global address space to datacenter RPC does not work, because it would couple microservices and require them to handle data consistency, significantly complicating application development and deployment. Fortunately, modern datacenters are embracing disaggregated memory (DM). In a DM-enabled datacenter, the servers running the microservices can all be connected to one global disaggregated memory pool, so pass-by-value semantics can be replaced by pass-by-reference. However, prior work on DM requires complicated synchronization primitives to share data across physical machines, so naively adopting it for datacenter RPC would harm the agility and modularity of microservices. To this end, we present DmRPC, to our knowledge the first DM-aware datacenter RPC for data-intensive datacenter applications. First, DmRPC introduces a DM-aware shared global address space that gives datacenter RPC pass-by-reference semantics, alleviating the redundant data movement. Second, DmRPC adopts a copy-on-write mechanism so that application logic does not have to handle data consistency, while still guaranteeing high performance. We have applied DmRPC to two implementations of DM: one network-based (DmRPC-net) and one CXL-based (DmRPC-CXL). Our evaluations on synthetic 7-tier microservice workloads show that, compared with the baseline, DmRPC-net achieves 4.2× higher throughput and 1.1× lower average latency, and DmRPC-CXL achieves 8.3× higher throughput and 1.7× lower average latency. On DeathStarBench, a widely used microservice benchmark, DmRPC-net achieves 3.1× higher throughput and 2.5× lower average latency than the baseline.
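The combination of pass-by-reference and copy-on-write described in the abstract can be illustrated with a minimal single-process sketch. It is not DmRPC's API or implementation: `MemoryPool`, `Ref`, `alloc`, `share`, `read`, and `write` are hypothetical names, and an in-memory dictionary stands in for the disaggregated memory pool. The sketch only shows the idea that an RPC can carry a small handle instead of the payload, and that a write to shared data triggers a private copy so the caller's view stays unchanged.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Small handle that would travel in the RPC message instead of the payload."""
    page_id: int


class MemoryPool:
    """Toy stand-in for a global disaggregated memory pool (hypothetical)."""

    def __init__(self):
        self._pages = {}     # page id -> bytearray holding the data
        self._refcount = {}  # page id -> number of handles pointing at it
        self._next_id = 0

    def _new_page(self, data):
        page_id = self._next_id
        self._next_id += 1
        self._pages[page_id] = bytearray(data)
        self._refcount[page_id] = 1
        return page_id

    def alloc(self, data: bytes) -> Ref:
        """Place data in the pool once and return a handle to it."""
        return Ref(self._new_page(data))

    def share(self, ref: Ref) -> Ref:
        """Pass by reference: hand out a second handle, no data is copied."""
        self._refcount[ref.page_id] += 1
        return Ref(ref.page_id)

    def read(self, ref: Ref) -> bytes:
        return bytes(self._pages[ref.page_id])

    def write(self, ref: Ref, offset: int, data: bytes) -> None:
        """Copy-on-write: copy the page first if another handle still uses it."""
        if self._refcount[ref.page_id] > 1:
            self._refcount[ref.page_id] -= 1
            ref.page_id = self._new_page(self._pages[ref.page_id])
        self._pages[ref.page_id][offset:offset + len(data)] = data


pool = MemoryPool()
caller_ref = pool.alloc(b"hello world")
callee_ref = pool.share(caller_ref)    # the "RPC" passes only the handle
shared_before_write = callee_ref.page_id == caller_ref.page_id
pool.write(callee_ref, 0, b"HELLO")    # callee modifies its own private copy
```

After the write, the callee's handle points at a private copy, while the caller still reads the original bytes. Neither side needs explicit synchronization in this toy model, which is the property the abstract attributes to copy-on-write.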