TriJoin: A Time-Efficient and Scalable Three-Way Distributed Stream Join System

Shuiying Yu, Yinting Zheng, Fan Zhang, Hanhua Chen, Hai Jin
DOI: https://doi.org/10.53106/160792642023032402024
2023-01-01
Abstract: Stream join is one of the most fundamental operations in data stream processing applications. Existing distributed stream join systems support efficient two-way join, that is, a join operation between two streams. Implementing a three-way join on top of two-way join requires splitting it into two two-way joins, where the second two-way join must wait for the join result transmitted from the first. We show through experiments that such a design incurs prohibitively high processing latency. To solve this problem, we propose TriJoin, a time-efficient three-way distributed stream join system. We design a symmetric wait-free structure based on symmetric tuple partitioning and reused join. TriJoin uses reused join to join each new tuple with the intermediate result of the other two streams and with locally stored tuples. For a new tuple, TriJoin only joins it with the intermediate result to generate the final result without waiting, greatly reducing the processing latency. In TriJoin, we design two partitioning and storage schemes according to two different forms of three-way stream join. We implement TriJoin and conduct comprehensive experiments to evaluate its performance using real-world traces. Results show that TriJoin reduces the processing latency by up to 68% compared to existing designs.
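
The idea of joining each new tuple against a maintained intermediate result of the other two streams can be illustrated with a small single-process sketch. This is not the TriJoin implementation: the class name `ThreeWayJoinSketch`, the stream labels `R`, `S`, `T`, and the `insert` method are hypothetical, and the sketch assumes that all three streams are equi-joined on one shared key, with no windowing, partitioning, or distribution.

```python
from collections import defaultdict

STREAMS = ("R", "S", "T")  # hypothetical stream labels


class ThreeWayJoinSketch:
    """Illustrative symmetric three-way equi-join on a single shared key."""

    def __init__(self):
        # stored[stream][key] -> payloads seen so far on that stream
        self.stored = {s: defaultdict(list) for s in STREAMS}
        # pairs[(a, b)][key] -> intermediate results (payload_a, payload_b),
        # where a precedes b in STREAMS order
        self.pairs = {
            ("R", "S"): defaultdict(list),
            ("R", "T"): defaultdict(list),
            ("S", "T"): defaultdict(list),
        }

    def insert(self, stream, key, payload):
        """Process one new tuple and return the final results it completes."""
        a, b = [s for s in STREAMS if s != stream]

        # 1. Final results: join the new tuple with the intermediate
        #    result of the other two streams. No second join stage to wait for.
        results = []
        for pa, pb in self.pairs[(a, b)][key]:
            row = {a: pa, b: pb, stream: payload}
            results.append(tuple(row[s] for s in STREAMS))

        # 2. Refresh the two intermediate results that involve this stream
        #    by joining the new tuple with the stored tuples of each other stream.
        for other in (a, b):
            pair = tuple(sorted((stream, other), key=STREAMS.index))
            for p in self.stored[other][key]:
                row = {stream: payload, other: p}
                self.pairs[pair][key].append((row[pair[0]], row[pair[1]]))

        # 3. Store the new tuple for future arrivals.
        self.stored[stream][key].append(payload)
        return results


if __name__ == "__main__":
    j = ThreeWayJoinSketch()
    j.insert("R", 1, "r1")
    j.insert("S", 1, "s1")
    print(j.insert("T", 1, "t1"))  # the tuple that completes a match emits it
```

In this sketch each final result is emitted exactly once, by whichever of its three tuples arrives last, and every stream is handled by the same code path. The cost is the extra memory for the pairwise intermediate results.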