Leveraging Apache Arrow for Zero-copy, Zero-serialization Cluster Shared Memory

Philip Groet, Joost Hoozemans, Andreas Grapentin, Felix Eberhardt, Zaid Al-Ars, H. Peter Hofstee
2024-04-04
Abstract: This paper describes a distributed implementation of Apache Arrow that can leverage cluster-shared, load-store-addressable memory that is hardware-coherent only within each node. The implementation is built on the ThymesisFlow prototype, which uses the OpenCAPI interface to create a shared address space across a cluster. Apache Arrow structures are immutable, which simplifies their use in cluster shared memory; this paper creates distributed Apache Arrow tables and makes them accessible in each node.
Emerging Technologies