First Past the Post: Evaluating Query Optimization in MongoDB

Dawei Tao,Enqi Liu,Sidath Randeni Kadupitige,Michael Cahill,Alan Fekete,Uwe Röhm
2024-09-25
Abstract:Query optimization is crucial for every database management system (DBMS) to enable fast execution of declarative queries. Most DBMS designs include cost-based query optimization. However, MongoDB implements a different approach to choose an execution plan that we call "first past the post" (FPTP) query optimization. FPTP does not estimate costs for each execution plan, but rather partially executes the alternative plans in a round-robin race and observes the work done by each relative to the number of records returned. In this paper, we analyze the effectiveness of MongoDB's FPTP query optimizer. We see whether the optimizer chooses the best execution plan among the alternatives and measure how the chosen plan compares to the optimal plan. We also show how to visualize the effectiveness and identify situations where the MongoDB 7.0.1 query optimizer chooses suboptimal query plans. Through experiments, we conclude that FPTP has a preference bias, choosing index scans even in many cases where collection scans would run faster. We identify the reasons for the preference bias, which can lead MongoDB to choose a plan with more than twice the runtime compared to the optimal plan for the query.
Databases
What problem does this paper attempt to address?
The paper attempts to address the issue of evaluating the effectiveness of the "First Past the Post" (FPTP) query optimizer in MongoDB. Specifically, the authors aim to experimentally analyze whether the FPTP optimizer can select the best execution plan and measure the performance gap between the chosen plan and the optimal plan. Additionally, the paper explores the reasons why the FPTP optimizer might choose suboptimal query plans in certain situations. ### Main Research Questions: 1. **Effectiveness of FPTP Optimizer**: Can the FPTP optimizer select the best execution plan among multiple candidate plans? 2. **Performance Comparison**: How does the performance of the execution plan chosen by the FPTP optimizer compare to the optimal plan? 3. **Preference Bias**: Does the FPTP optimizer exhibit preference bias, such as favoring index scans over collection scans in certain situations? 4. **Cause Analysis**: What are the reasons behind the FPTP optimizer choosing suboptimal plans? ### Research Methodology: - **Experimental Design**: The authors designed a series of experiments using queries with different selectivities to test the performance of the FPTP optimizer. - **Dataset**: The dataset used in the experiments contains 100,000 documents, each with two fields A and B, with uniformly distributed and non-repetitive field values. - **Query Template**: Conjunctive range queries with two range predicates were used to control the selectivity of the queries. - **Performance Metrics**: The effectiveness of the optimizer's choices was evaluated by recording the actual execution time of each query, including accuracy and performance impact. ### Experimental Results: - **Accuracy**: The FPTP optimizer failed to select the optimal execution plan in certain cases. - **Performance Impact**: The plans chosen by the FPTP optimizer were more than twice as slow as the optimal plans in some cases. - **Preference Bias**: The FPTP optimizer exhibited a clear preference bias, tending to choose index scans even when collection scans were faster in some situations. ### Conclusion: - **Limitations of the FPTP Optimizer**: Although the FPTP optimizer performs well for some simple queries, its chosen plans can be far from optimal for complex queries and specific data distributions. - **Improvement Suggestions**: The authors suggest further improvements to the FPTP optimizer to reduce preference bias and enhance query performance. This paper provides a detailed experimental evaluation of MongoDB's FPTP query optimizer, revealing its limitations in selecting the optimal execution plan and offering suggestions for improvement. These findings are significant for the design and optimization of database management systems.