DeFiHap

Yuetian Mao, Shuai Yuan, Nan Cui, Tianjiao Du, Beijun Shen, Yuting Chen
DOI: https://doi.org/10.14778/3476311.3476316
IF: 2.5
2021-01-01
Proceedings of the VLDB Endowment
Abstract: The emergence of Hive greatly facilitates the management of massive data stored in various places. Meanwhile, data scientists face challenges during HiveQL programming: they may not use correct and/or efficient HiveQL statements in their programs, and developers may inadvertently introduce anti-patterns into HiveQL programs, leading to poor performance, low maintainability, and/or program crashes. This paper presents an empirical study on HiveQL programming, in which 38 HiveQL anti-patterns are revealed. We then design and implement DeFiHap, the first tool for automatically detecting and fixing HiveQL anti-patterns. DeFiHap detects HiveQL anti-patterns by analyzing the abstract syntax trees of HiveQL statements and Hive configurations, and generates fix suggestions using rule-based rewriting and performance tuning techniques. The experimental results show that DeFiHap is effective. In particular, DeFiHap detects 25 anti-patterns and generates fix suggestions for 17 of them.
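To make the idea of rule-based anti-pattern detection concrete, the following is a minimal sketch and not DeFiHap's implementation. DeFiHap analyzes abstract syntax trees; this sketch substitutes simple regular-expression checks so that it stays self-contained. The two rules (`select-star` and `order-by-without-limit`), the `Rule` class, and the `check` function are hypothetical illustrations and are not claimed to be among the 38 anti-patterns reported in the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """One illustrative detection rule (hypothetical, not from the paper)."""
    name: str
    detect: Callable[[str], bool]
    suggestion: str


def _normalize(sql: str) -> str:
    """Collapse runs of whitespace so each check sees a one-line statement."""
    return re.sub(r"\s+", " ", sql).strip()


def _has_select_star(stmt: str) -> bool:
    # Matches "SELECT *" but not "SELECT COUNT(*)".
    return re.search(r"\bselect\s+\*", stmt, re.IGNORECASE) is not None


def _has_unbounded_order_by(stmt: str) -> bool:
    # ORDER BY present, but no "LIMIT <n>" anywhere in the statement.
    has_order = re.search(r"\border\s+by\b", stmt, re.IGNORECASE) is not None
    has_limit = re.search(r"\blimit\s+\d+", stmt, re.IGNORECASE) is not None
    return has_order and not has_limit


# Assumed example rules; a real tool would walk the AST instead of using regexes.
RULES: List[Rule] = [
    Rule("select-star", _has_select_star,
         "List only the columns that are needed."),
    Rule("order-by-without-limit", _has_unbounded_order_by,
         "Add a LIMIT clause, or consider SORT BY if a total order is not required."),
]


def check(sql: str) -> List[str]:
    """Return the names of all rules that fire on the given statement."""
    stmt = _normalize(sql)
    return [rule.name for rule in RULES if rule.detect(stmt)]


if __name__ == "__main__":
    for name in check("SELECT * FROM logs ORDER BY ts"):
        print(name)
```

A regex check like this cannot tell a keyword inside a string literal or subquery from one in the outer query, which is one reason an AST-based analysis, as described in the abstract, is the more robust design.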