Coral: Federated Query Join Order Optimization Based on Deep Reinforcement Learning

Rong Gu, Yi Zhang, Liangliang Yin, Lingyi Song, Wenjie Huang, Chunfeng Yuan, Zhaokang Wang, Guanghui Zhu, Yihua Huang
DOI: https://doi.org/10.1007/s11280-023-01156-0
2023-01-01
World Wide Web
Abstract: The rise of diversified data engines has created the need for federated queries. A federated query analyzes data drawn from multiple data engines. Because the data originates from several engines, federated queries usually rely on join operations and data migration, and therefore take a long time to complete. The challenges of optimizing federated queries lie in join order selection and data migration coordination. However, enumerating all join orders is impractical because the set of join orders grows exponentially with the number of relations to be joined. To improve the performance of federated queries, we present a deep reinforcement learning-based approach to optimizing join order and join engine selection for federated queries, and design a deep Q-network-based (DQN-based) optimizer. The DQN-based optimizer can generate join search policies that optimize join order selection for datasets with a given cost model. Based on the DQN-based optimizer, we implement a federated query system, Coral, which optimizes join order selection for federated queries. With the optimized join order, Coral transforms a federated query into a set of subqueries that are assigned to and executed on different data engines. We also propose a subquery cache optimization to reduce data migration during query execution. Extensive experimental evaluation demonstrates that Coral can significantly reduce the latency of federated queries, achieving a speedup of up to 5.03× compared with cutting-edge federated query systems.
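To make the growth of the join order search space concrete, the following is a minimal sketch that counts candidate join orders for n relations. It is not taken from the paper: the function names are hypothetical, and the counts assume the standard combinatorial setting in which cross products are allowed and the left and right inputs of a join are distinguished. Under those assumptions there are n! left-deep orders and (2(n-1))!/(n-1)! bushy join trees.

```python
from math import factorial


def left_deep_orders(n: int) -> int:
    """Count left-deep join orders for n relations (n!).

    Assumption: any permutation of the relations is a valid order,
    i.e. cross products are allowed.
    """
    return factorial(n)


def bushy_orders(n: int) -> int:
    """Count bushy join trees for n relations: (2(n-1))! / (n-1)!.

    Assumption: left and right children of a join are distinguished
    and cross products are allowed.
    """
    return factorial(2 * (n - 1)) // factorial(n - 1)


if __name__ == "__main__":
    # Print how quickly both search spaces grow with the number of relations.
    for n in (2, 4, 8, 12):
        print(f"n={n:2d}  left-deep={left_deep_orders(n):,}  bushy={bushy_orders(n):,}")
```

Even for a modest number of relations, both counts grow far too quickly for exhaustive enumeration, which is the motivation the abstract gives for a learned search policy.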