Object Proxy Patterns for Accelerating Distributed Applications

J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster
2024-07-02
Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area references that can resolve to data regardless of location, has been demonstrated as an effective low-level building block in such situations. Here we propose three high-level proxy-based programming patterns -- distributed futures, streaming, and ownership -- that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the performance and optimization problems that distributed applications encounter when handling large-scale data. Existing workflow and serverless frameworks make distributed application design flexible and scalable, but their support for complex data flow patterns is either limited or one-size-fits-all. Application developers must therefore optimize data movement themselves, and that work becomes harder as the amount of data grows. The paper proposes three high-level proxy-based programming patterns that accelerate and simplify the development of complex distributed program structures:

1. **Distributed Futures**: This pattern allows data dependencies to be seamlessly injected into any computational task, so that computation and communication overlap and efficiency improves.
2. **Object Streaming**: This pattern decouples event notifications from bulk data transfer, enabling data producers to unilaterally determine the optimal transfer method and reducing bottlenecks in intermediate scheduling.
3. **Ownership Model**: This pattern provides a client-side mechanism to manage the object life cycle and prevent data races in distributed task workflows.

Used together, these patterns can significantly improve runtime, throughput, and memory usage, especially in large-scale scientific applications. For example, they reduce the workflow completion time by 36% in the 1000 Genomes project, improve the inference latency by 32% in the DeepDriveMD project, and optimize proxy life-cycle management in MOF Generation. Through these improvements, the paper shows how the proxy pattern can be used to manage and optimize data flow in a distributed computing environment, thereby achieving more efficient task execution and resource utilization.
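To make the distributed futures pattern more concrete, the following is a minimal single-process sketch in Python. It is not the paper's implementation or any library's API: `ToyStore`, `ToyProxy`, `ToyFuture`, `consumer`, and `run_demo` are hypothetical names invented for illustration, and an in-memory dictionary stands in for a remote object store. The sketch only shows the core idea that a consumer can start with a proxy for data that does not exist yet and blocks only at the point of first use.

```python
import threading
import time


class ToyStore:
    """Hypothetical in-memory stand-in for a remote object store."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def put(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def get_blocking(self, key, timeout=5.0):
        # Wait until some producer has stored a value under `key`.
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._data, timeout):
                raise TimeoutError(key)
            return self._data[key]


class ToyProxy:
    """Forwards operations to a target that is fetched lazily on first use."""

    def __init__(self, store, key):
        self._store = store
        self._key = key
        self._target = None
        self._resolved = False

    def _resolve(self):
        if not self._resolved:
            self._target = self._store.get_blocking(self._key)
            self._resolved = True
        return self._target

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)

    # Special methods are looked up on the type, so forward them explicitly.
    def __len__(self):
        return len(self._resolve())

    def __iter__(self):
        return iter(self._resolve())


class ToyFuture:
    """Pairs a write-once result slot with proxies that resolve to it."""

    def __init__(self, store, key):
        self._store = store
        self._key = key

    def set_result(self, value):
        self._store.put(self._key, value)

    def proxy(self):
        return ToyProxy(self._store, self._key)


def consumer(data, log):
    log.append("consumer started")   # runs before the data exists
    time.sleep(0.05)                 # stand-in for setup work
    log.append(f"sum={sum(data)}")   # first use blocks until the result is set


def run_demo():
    store = ToyStore()
    future = ToyFuture(store, "x")
    log = []
    worker = threading.Thread(target=consumer, args=(future.proxy(), log))
    worker.start()                   # consumer is launched first
    time.sleep(0.1)                  # stand-in for the producer's computation
    future.set_result([1, 2, 3])
    worker.join()
    return log


if __name__ == "__main__":
    print(run_demo())
```

In this toy version the consumer's setup work overlaps with the producer's computation, which is the kind of overlap the pattern is meant to enable. A real system would additionally have to handle serialization, remote transfer, and failure cases, none of which are modeled here.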