Towards Unified Ad-Hoc Data Processing

Xiaogang Shi, Bin Cui, Gillian Dobbie, Beng Chin Ooi
DOI: https://doi.org/10.1145/2588555.2610492
2014-01-01
Abstract: It is important to provide efficient execution for ad-hoc data processing programs. Rather than constructing complex declarative queries, many users prefer to write their programs in procedural code that issues simple queries. Because many of these users are not expert programmers, their programs usually exhibit poor performance in practice, and automatically optimizing and efficiently executing such programs is challenging. In this paper, we present UniAD, a system designed to simplify the programming of data processing tasks and to provide efficient execution for user programs. We propose a novel intermediate representation named UniQL, which utilizes HOQs to describe the operations performed in programs. By combining procedural and declarative logic in one representation, we can perform various optimizations across the boundary between procedural and declarative code. We describe these optimizations and conduct extensive empirical studies using UniAD. The experimental results on four benchmarks demonstrate that our techniques can significantly improve the performance of a wide range of data processing programs.
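As a rough illustration of the kind of program the abstract describes (procedural code issuing simple queries) and of an optimization that crosses the procedural/declarative boundary, the sketch below contrasts a loop that runs one query per customer with a single equivalent declarative query. It uses Python's standard `sqlite3` module. The schema, data, and function names (`make_db`, `totals_procedural`, `totals_declarative`) are hypothetical. The rewrite shown, fusing a loop of per-row lookups into one `LEFT JOIN` with `GROUP BY`, is a generic, well-known transformation. It is not UniAD's or UniQL's actual optimization.

```python
import sqlite3


def make_db():
    # Hypothetical schema and data for illustration only (not from the paper).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
    """)
    return conn


def totals_procedural(conn):
    # Procedural style: an outer loop that issues one simple query per customer.
    # This costs one round trip per row and hides the join from the query optimizer.
    result = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for cid, name in customers:
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        result[name] = total
    return result


def totals_declarative(conn):
    # Same computation with the loop folded into a single query: the per-row
    # lookup becomes a LEFT JOIN (so customers with no orders are kept), and
    # the per-row SUM becomes a GROUP BY aggregate.
    rows = conn.execute("""
        SELECT c.name, COALESCE(SUM(o.amount), 0)
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.id
    """).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = make_db()
    # Both versions should produce identical results.
    print(totals_procedural(conn))
    print(totals_declarative(conn))
```

The `LEFT JOIN` combined with `COALESCE(..., 0)` matters here: a plain inner join would silently drop customers without orders, changing the program's result. Rewrites that merge procedural loops into declarative queries must preserve semantics in edge cases like this one.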