A Generic Service to Provide In-Network Aggregation for Key-Value Streams

Yongchao He, Wenfei Wu, Yanfang Le, Ming Liu, ChonLam Lao
DOI: https://doi.org/10.1145/3575693.3575708
2023-01-01
Abstract: Key-value stream aggregation is a common operation in distributed systems, and it requires intensive computation and network resources. We propose a generic in-network aggregation service for key-value streams, ASK, to accelerate aggregation operations in diverse distributed applications. ASK is a switch-host co-designed system in which the programmable switch provides a best-effort aggregation service and the host runs a daemon to interact with applications. ASK makes in-depth optimizations tailored to traffic characteristics, hardware restrictions, and the unreliable nature of networks: it vectorizes the aggregation of multiple key-value tuples from one packet in a single switch pipeline pass, which improves per-host goodput; it develops a lightweight reliability mechanism for the asynchronous aggregation of key-value streams, which guarantees computation correctness; and it designs a hot-key-agnostic prioritization for key-skewed workloads, which improves switch memory utilization. We prototype ASK and use it to support Spark and BytePS. The evaluation shows that ASK can accelerate pure key-value aggregation tasks by up to 155 times and big data jobs by 3-5 times, and that it is backward compatible with existing INA-empowered distributed training solutions, with the same speedup.
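The split between a best-effort switch and a host that handles whatever the switch cannot can be illustrated with a toy model. The sketch below is not ASK's implementation: the class `BestEffortSwitch`, the function `aggregate`, the `num_slots` parameter, and the choice of summation as the aggregation operator are all assumptions made for illustration. It also leaves out packet loss, the reliability mechanism, and hot-key prioritization. It only shows that partial aggregation in a memory-limited switch, combined with host-side aggregation of the forwarded tuples, gives the same result as aggregating everything on the host.

```python
from collections import defaultdict


class BestEffortSwitch:
    """Toy switch with a fixed number of aggregation slots (hypothetical model)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.slots = {}  # key -> running sum held in switch memory

    def process_packet(self, tuples):
        """Aggregate the tuples that fit; return the ones forwarded to the host."""
        forwarded = []
        for key, value in tuples:
            if key in self.slots:
                # Key already owns a slot: aggregate in the switch.
                self.slots[key] += value
            elif len(self.slots) < self.num_slots:
                # Free slot available: claim it for this key.
                self.slots[key] = value
            else:
                # No memory left: pass the tuple through unaggregated.
                forwarded.append((key, value))
        return forwarded


def aggregate(packets, num_slots):
    """Sum values per key using the switch first and the host for the rest."""
    switch = BestEffortSwitch(num_slots)
    host = defaultdict(int)
    for packet in packets:
        for key, value in switch.process_packet(packet):
            host[key] += value
    # At the end of the stream, merge the switch's partial sums into the host's.
    for key, value in switch.slots.items():
        host[key] += value
    return dict(host)


# Each packet carries several key-value tuples.
packets = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("d", 6)],
]
result = aggregate(packets, num_slots=2)
```

With `num_slots=2`, keys `a` and `b` are aggregated in the switch while `c` and `d` fall through to the host. Setting `num_slots=0` forces everything onto the host and yields the same totals, which is the sense in which the switch service is best-effort in this model.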