ROVEC: Runtime Optimization of Vectorized Expression Evaluation for Column Store.

Meng Li, Zheyu Miao, Di Wu, Feifei Li, Sheng Wang, Wei Cao, Zhi Qiao, Yubin Ruan, Yukun Liang, Jimmy Yang, Haipeng Dai, Guihai Chen
DOI: https://doi.org/10.1109/tkde.2021.3124669
IF: 9.235
2021-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Due to the increasing demand for scalable and interactive data analytics, column stores have become the de facto choice in many analytical databases. As a common and fundamental operation in column stores, expression evaluation has a remarkable effect on many queries. To speed up expression evaluation, vectorized techniques such as Single-Instruction-Multiple-Data (SIMD) instructions are widely used. However, few works address dedicated optimizations for SIMD-based expression evaluation in column stores. In this paper, we propose a runtime optimization framework named ROVEC that enables effective optimizations for SIMD-based expression evaluation. The key idea is to optimize logical expressions at execution time by leveraging lightweight compression and fine-grained statistics associated with the compressed data. ROVEC removes unnecessary type casting and finds the tightest type during evaluation, which maximizes the number of operands processed concurrently by each SIMD instruction. ROVEC can be applied to many expression-evaluation-intensive operators (e.g., table scan and theta join) and to different data types (e.g., numeric, time, and string). To validate the effectiveness of ROVEC, we integrate it into the columnar database PolarDB-C. Our evaluation results show that ROVEC improves table scan throughput by up to 125% (60% on average) and theta join latency by up to 50% (30% on average).
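The "tightest type" idea in the abstract can be illustrated with a minimal sketch. It is not ROVEC's implementation: the 256-bit register width, the candidate widths, and the helper names `tightest_width`, `add_range`, and `lanes` are all assumptions made for illustration. Given per-block minimum and maximum statistics for two integer columns, the sketch bounds the result of their sum and picks the narrowest signed integer width that holds it, which determines how many operands fit in one SIMD register.

```python
# Hypothetical sketch, not taken from the paper: pick the narrowest signed
# integer width for a + b from per-block min/max statistics.

SIMD_BITS = 256            # assumed register width; the abstract does not state one
WIDTHS = (8, 16, 32, 64)   # assumed candidate signed integer widths, in bits


def tightest_width(lo, hi):
    """Return the smallest width in WIDTHS whose signed range covers [lo, hi]."""
    for bits in WIDTHS:
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return bits
    raise OverflowError("range does not fit in 64 bits")


def add_range(a, b):
    """Bounds of a + b, given (min, max) bounds of each operand."""
    return (a[0] + b[0], a[1] + b[1])


def lanes(bits):
    """Number of operands of the given width that fit in one SIMD register."""
    return SIMD_BITS // bits


# Block statistics for two columns, both known to lie in [0, 100].
col_a = (0, 100)
col_b = (0, 100)

result_bounds = add_range(col_a, col_b)        # (0, 200)
result_bits = tightest_width(*result_bounds)   # 200 exceeds the 8-bit maximum of 127

print(result_bits, lanes(result_bits), lanes(64))
```

Under these assumptions the sum needs 16 bits, so one register holds 16 operands instead of the 4 it would hold if both columns were cast to 64-bit integers before the addition.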