Challenging SQL-on-Hadoop Performance with Apache Druid

José Correia, Carlos Costa, Maribel Yasmina Santos
DOI: https://doi.org/10.1007/978-3-030-20485-3_12
2019-01-01
Business Information Systems
Abstract: In Big Data, SQL-on-Hadoop tools usually provide satisfactory performance for processing vast amounts of data, although newly emerging tools may offer an alternative. This paper evaluates whether Apache Druid, an innovative column-oriented data store suited to online analytical processing (OLAP) workloads, is an alternative to some of the well-known SQL-on-Hadoop technologies, and assesses its potential in this role. In this evaluation, Druid, Hive, and Presto are benchmarked with increasing data volumes. The results point to Druid as a strong alternative, achieving better performance than Hive and Presto, and show the potential of integrating Hive and Druid to enhance the capabilities of both tools.