Abstract: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans in a DBMS query optimizer. In the last decade, an increasing number of advanced CardEst methods (especially ML-based ones) have been proposed with outstanding estimation accuracy and inference latency. However, no study has systematically evaluated the quality of these methods and answered the fundamental question: to what extent can these methods improve the performance of a query optimizer in real-world settings, which is the ultimate goal of a CardEst method? In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset, STATS, and a diverse query workload, STATS-CEB. We integrate several of the most representative CardEst methods into the open-source DBMS PostgreSQL and comprehensively evaluate their true effectiveness in improving query plan quality, along with other important aspects affecting their applicability. We obtain a number of key findings under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the quality of the generated query plan. Therefore, we propose a new metric, P-Error, to evaluate the performance of CardEst methods; it overcomes the limitation of Q-Error and reflects the overall end-to-end performance of CardEst methods. It could serve as a better optimization objective for future CardEst methods.
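To make the Q-Error limitation concrete, the following is a minimal sketch using the commonly cited definition of Q-Error (the larger of estimate/true and true/estimate). The abstract does not spell out the formula, so this definition, the function name `q_error`, the clamping to one row, and the cardinalities below are all assumptions chosen for illustration, not taken from the paper.

```python
def q_error(estimated: float, true: float) -> float:
    """Factor by which an estimate is off; 1.0 means a perfect estimate."""
    # Clamp both values to at least 1 row to avoid division by zero
    # (a common convention, assumed here rather than taken from the paper).
    est = max(estimated, 1.0)
    tru = max(true, 1.0)
    return max(est / tru, tru / est)


# Two hypothetical sub-plan queries with made-up cardinalities:
# one produces a tiny intermediate result, the other a very large one.
small_subplan = q_error(estimated=10, true=100)
large_subplan = q_error(estimated=1_000_000, true=10_000_000)

# Both estimates are off by the same factor, so the metric cannot tell them apart.
print(small_subplan, large_subplan)
```

Both sub-plans receive the same score even though a misestimate on the large one is far more likely to change the chosen join order or operator, which is the gap an end-to-end, plan-based metric such as P-Error is meant to close.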

Study and improvement on equivalence classes of PostgreSQL query optimization

Optimization Factor Analysis Of Large-Scale Join Queries On Different Platforms

Schema-Driven Performance Evaluation for Highly Concurrent Scenarios

Comparison of Methods for the Query Plan Selection Problem in a PostgreSQL Relational Database

Comparing Oracle and PostgreSQL, Performance and Optimization

Optimizing Window Aggregate Functions in Relational Database Systems

Expressing And Optimizing Similarity-Based Queries In SQL

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Efficient Query Re-optimization with Judicious Subquery Selections

Analysis of the possibilities of optimizing SQL queries

New Distributed Spatial Query Optimization Approach by Using Query Analyzer

Query optimization for massively parallel data processing

Cache Associativity on the Performance of Database Systems: Problems, and Optimization Strategies

First Past the Post: Evaluating Query Optimization in MongoDB

Cardinality Estimation in DBMS

Studies on Some Modern Optimization Algorithms

Design and Implementation of OSCAR Query Optimizer

Learned Query Optimizers: Evaluation and Improvement

Joint Optimization of Cost and Coverage of Query Plans in Data Integration

Cost-based or Learning-based? A Hybrid Query Optimizer for Query Plan Selection

Query optimization mechanisms in the cloud environments: A systematic study