LMStream: When Distributed Micro-Batch Stream Processing Systems Meet GPU

Suyeon Lee, Sungyong Park
DOI: https://doi.org/10.48550/arXiv.2111.04289
2021-11-08
Abstract: This paper presents LMStream, which ensures bounded latency while maximizing the throughput on the GPU-enabled micro-batch streaming systems. The main ideas behind LMStream's design can be summarized as two novel mechanisms: (1) dynamic batching and (2) dynamic operation-level query planning. By controlling the micro-batch size, LMStream significantly reduces the latency of individual dataset because it does not perform unconditional buffering only for improving GPU utilization. LMStream bounds the latency to an optimal value according to the characteristics of the window operation used in the streaming application. Dynamic mapping between a query to an execution device based on the data size and dynamic device preference improves both the throughput and latency as much as possible. In addition, LMStream proposes a low-overhead online cost model parameter optimization method without interrupting the real-time stream processing. We implemented LMStream on Apache Spark, which supports micro-batch stream processing. Compared to the previous throughput-oriented method, LMStream showed an average latency improvement up to a maximum of 70.7%, while improving average throughput up to 1.74x.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The main problem this paper addresses is how to maximize throughput while keeping latency bounded in a GPU-accelerated distributed micro-batch stream processing system. Existing micro-batch stream processing models improve GPU utilization by buffering data unconditionally, so the latency of an individual dataset can grow without bound, which degrades real-time streaming applications. The paper proposes LMStream, which bounds latency and improves overall system throughput by dynamically adjusting the micro-batch size and the execution device (CPU or GPU).

### Core Problems of the Paper

1. **Trade-off between Latency and Throughput**: In a stream processing system, latency and throughput usually pull in opposite directions. To increase throughput, existing methods buffer data unconditionally, which significantly increases latency. The goal of the paper is to raise throughput as much as possible without sacrificing latency.
2. **Dynamic Micro-batch Control**: The traditional micro-batch model uses a fixed trigger time to decide when to process data, which lets latency grow uncontrollably. The paper proposes a dynamic micro-batch control mechanism that adjusts the micro-batch size according to the characteristics of the window operation, so that latency stays within a bound.
3. **Effective Query Plan**: To further optimize latency and throughput, the paper proposes an operation-level query planning mechanism that dynamically selects an execution device (CPU or GPU) according to the size of the data. This reduces the total processing time and improves system performance.
4. **Online Parameter Optimization**: When a stream processing application starts running, the system has no prior information about the characteristics of the workload. The paper proposes a low-overhead online method for optimizing the cost model parameters, which adapts the system to different workload types without interrupting real-time stream processing.

### Main Contributions

- **Dynamic Micro-batch Mechanism**: LMStream keeps latency bounded by dynamically adjusting the size of the micro-batch.
- **Effective Operation-level Query Plan**: LMStream reduces processing time, and thereby improves both throughput and latency, by dynamically selecting an execution device (CPU or GPU) for each operation.
- **Online Parameter Optimization**: LMStream implements a low-overhead online parameter optimization method that adjusts system parameters without interrupting real-time stream processing.
- **Implementation in a Real System**: The paper implements LMStream on Apache Spark and evaluates its effectiveness and performance improvement on a variety of real-world stream processing benchmarks.

### Specific Formulas

1. **Objective Function**:
   - Maximize the average throughput:
     \[ \max_{i} \text{AvgThPut}_i \]
   - Constraints:
     \[ \text{MaxLat}_i < \text{SlideTime} \quad (\text{when } \text{SlideTime} > 0) \]
     \[ \text{MaxLat}_i \leq \sum_{k = 0}^{i - 1} \text{MaxLat}_k \quad (\text{when } \text{SlideTime} = 0) \]
2. **Definitions of Throughput and Latency**:
   - Average throughput:
     \[ \text{AvgThPut}_i = \frac{\sum_{k = 0}^{i} \sum_{j = 0}^{\text{NumCores}} \text{Part}(k, j)}{\sum_{k = 0}^{i} \text{Proc}_k} \]
   - Maximum latency:
     \[ \text{MaxLat}_i = \max_{j \in \text{NumDS}_i} (\text{Buff}(i, j)) \]
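The average-throughput definition and the two latency constraints under Specific Formulas can be checked numerically. The following is a minimal Python sketch, not LMStream's implementation: the function names `avg_throughput` and `latency_bound_ok`, the list-based inputs, and the units (data sizes in arbitrary units, times in seconds) are all assumptions made for illustration. The per-batch maximum latencies are taken as given inputs instead of being derived from buffering times.

```python
from typing import Sequence


def avg_throughput(partition_sizes: Sequence[Sequence[float]],
                   proc_times: Sequence[float]) -> float:
    """AvgThPut_i: data processed in micro-batches 0..i over total processing time.

    partition_sizes[k][j] plays the role of Part(k, j) (hypothetical layout:
    one inner list per micro-batch, one entry per core/partition).
    proc_times[k] plays the role of Proc_k.
    """
    if len(partition_sizes) != len(proc_times):
        raise ValueError("need one processing time per micro-batch")
    total_time = sum(proc_times)
    if total_time <= 0:
        raise ValueError("total processing time must be positive")
    total_data = sum(sum(parts) for parts in partition_sizes)
    return total_data / total_time


def latency_bound_ok(max_lats: Sequence[float], slide_time: float) -> bool:
    """Check the latency constraint for the latest micro-batch i.

    max_lats[k] plays the role of MaxLat_k for k = 0..i.
    SlideTime > 0: require MaxLat_i <  SlideTime.
    SlideTime = 0: require MaxLat_i <= sum of MaxLat_k for k < i,
    following the constraint exactly as it is written in the summary.
    """
    if not max_lats:
        raise ValueError("need at least one micro-batch")
    *history, current = max_lats
    if slide_time > 0:
        return current < slide_time
    # For the very first batch the history is empty, so the bound is 0.
    return current <= sum(history)
```

For example, two micro-batches with partitions `[[100, 200], [300, 400]]` and processing times `[2.0, 3.0]` give an average throughput of 1000 / 5.0 = 200.0 units per second, and a batch with maximum latency 4.9 s satisfies a 5.0 s slide time while one with exactly 5.0 s does not, because that constraint is strict.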