RIOS: Runtime Integrated Optimizer for Spark.

Youfu Li, Mingda Li, Ling Ding, Matteo Interlandi
DOI: https://doi.org/10.1145/3267809.3267814
2018-01-01
Abstract: Many Data-Intensive Scalable Computing (DISC) systems do not support sophisticated cost-based query optimizers because they lack the necessary data statistics. Consequently, many crucial optimizations, such as join order and plan selection, are not well supported in DISC systems. RIOS is a Runtime Integrated Optimizer for Spark that lazily binds to execution plans at runtime, after collecting the statistics needed to make more optimal decisions. We evaluate the efficacy of our approach and show that better plans can be derived at runtime, achieving more than an order-of-magnitude performance improvement compared to the compile-time-generated plans produced by the Apache Spark rule-based optimizer.
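As a rough illustration of why runtime statistics matter for join order, the sketch below picks a left-deep join order from row counts and join selectivities. It is not the RIOS algorithm and uses no Spark API: the cost model (sum of estimated intermediate result sizes), the function names `plan_cost` and `best_join_order`, and all table names and numbers are assumptions made for this example.

```python
from itertools import permutations


def plan_cost(order, rows, selectivity):
    """Estimated cost of a left-deep join order: the sum of the
    estimated sizes of every intermediate join result."""
    joined = [order[0]]
    size = float(rows[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Multiply the selectivities of all predicates that link `table`
        # to tables already joined; a missing predicate counts as 1.0,
        # which models a cross product.
        factor = 1.0
        for prev in joined:
            factor *= selectivity.get(frozenset((prev, table)), 1.0)
        size = size * rows[table] * factor
        cost += size
        joined.append(table)
    return cost


def best_join_order(rows, selectivity):
    """Brute-force search over all left-deep orders (fine for a few tables)."""
    return min(
        permutations(sorted(rows)),
        key=lambda order: plan_cost(order, rows, selectivity),
    )


# Hypothetical statistics, standing in for values gathered at runtime.
rows = {"orders": 1_000_000, "customers": 50_000, "nations": 25}
selectivity = {
    frozenset(("orders", "customers")): 1 / 50_000,
    frozenset(("customers", "nations")): 1 / 25,
}

if __name__ == "__main__":
    order = best_join_order(rows, selectivity)
    print(order, plan_cost(order, rows, selectivity))
```

With these made-up numbers, any order that joins `orders` directly with `nations` forms a cross product and is estimated to cost more than ten times as much as the cheapest order. A rule-based optimizer with no cardinality information has no way to tell these plans apart.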