X-TED: Massive Parallelization of Tree Edit Distance
Dayi Fan,Rubao Lee,Xiaodong Zhang
DOI: https://doi.org/10.14778/3654621.3654634
IF: 2.5
2024-03-01
Proceedings of the VLDB Endowment
Abstract:The tree edit distance (TED) has been found in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas, which serves as a metric to quantify the dissimilarity between two trees. As applications continue to scale in data size, with a growing demand for fast response time, TED has become even more increasingly data- and computing-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging due to its dynamic programming nature involving non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in the existing parallel TED algorithms, we develop a massive parallel computation framework for TED and its implementation on GPU, which is called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. Subsequently, it adopts a dynamic parallel strategy to handle various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our intensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED, and outperforming the existing multicore parallel MC-TED by an average speedup of 31x.
computer science, information systems, theory & methods