GEqO: ML-Accelerated Semantic Equivalence Detection

Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
DOI: https://doi.org/10.1145/3626710
2024-01-03
Abstract: Large-scale analytics engines have become a core dependency for modern data-driven enterprises, which use them to derive business insights and drive actions. These engines run a large number of analytic jobs that process huge volumes of data every day, and their workloads are often inundated with computations that overlap across multiple jobs. Reusing common computation is crucial for using cluster resources efficiently and reducing job execution time. Detecting common computation is the first and key step toward reducing this computational redundancy. However, detecting equivalence on large-scale analytics engines requires efficient, scalable, and fully automated solutions. In addition, to maximize computation reuse, equivalence needs to be detected at the semantic level rather than only the syntactic level (i.e., recognizing that queries which look different can nonetheless be semantically equivalent). Unfortunately, existing solutions fall short of satisfying these requirements.
Databases, Machine Learning
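To make the abstract's syntactic-versus-semantic distinction concrete, the following is a toy Python sketch. It is not GEqO's technique (the title describes an ML-accelerated approach); it only shows why comparing query text fails on queries that look different but mean the same thing. The tuple-based query representation and the helpers `is_column`, `syntactic_key`, and `canonical_key` are hypothetical, introduced purely for illustration.

```python
# Hypothetical toy query representation: a table name plus a conjunction of
# comparisons, each written as (left, op, right) exactly as a user typed it.
Query = tuple[str, list[tuple[str, str, str]]]

# Mirrored operators, so that "100 < price" can be rewritten as "price > 100".
MIRROR = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "=": "="}


def is_column(term: str) -> bool:
    # Toy assumption: integer literals are constants, everything else is a column.
    return not term.lstrip("-").isdigit()


def syntactic_key(q: Query) -> str:
    # Purely syntactic signature: the predicates as written, in their given order.
    table, preds = q
    return table + ":" + " AND ".join(f"{l} {op} {r}" for l, op, r in preds)


def canonical_key(q: Query) -> tuple[str, frozenset]:
    # Normalize two harmless rewrites: put the column on the left of each
    # comparison, and ignore the order of the conjuncts (AND is commutative).
    table, preds = q
    normalized = set()
    for left, op, right in preds:
        if not is_column(left) and is_column(right):
            left, op, right = right, MIRROR[op], left
        normalized.add((left, op, right))
    return table, frozenset(normalized)


q1: Query = ("orders", [("price", ">", "100"), ("qty", "=", "3")])
q2: Query = ("orders", [("3", "=", "qty"), ("100", "<", "price")])

print(syntactic_key(q1) == syntactic_key(q2))  # False: the text differs
print(canonical_key(q1) == canonical_key(q2))  # True: same predicates after normalization
```

This normalization catches only reordered conjuncts and mirrored comparisons. Semantic equivalence detection, as the abstract frames it, has to recognize far richer rewrites than any fixed set of syntactic rules like these.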