Network Log Analysis with SQL-on-Hadoop

ZHANG Si-yu, JIANG Kai-da, WEI Jian-wen, LUO Xuan, WANG Hai-yang
DOI: https://doi.org/10.3969/j.issn.1000-436x.2014.z1.004
2014-01-01
Abstract: With the rapid expansion of network bandwidth, devices and applications, log management faces the challenge of exploding data volumes. A log analysis platform built on SQL-on-Hadoop can store and query hundreds of billions of log entries effectively. Columnar and compressed data formats for Hadoop are benchmarked on a real-world multi-TB dataset, and the efficiency of conditional and statistical queries is tested on Hive and Impala. With the gzipped Parquet format, log data can be compressed by 80%, and querying with Impala is 5 times faster. Six security incident analysis and detection applications are already deployed on this platform.
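The two query classes named in the abstract, conditional and statistical, can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in SQL engine, not Hive or Impala, and the `netlog` table, its columns and the sample rows are hypothetical; the abstract does not give the paper's actual schema or queries.

```python
import sqlite3

# Stand-in engine: in-memory SQLite, NOT Hive or Impala.
conn = sqlite3.connect(":memory:")

# Hypothetical log schema; the paper's real tables are not given in the abstract.
conn.execute(
    "CREATE TABLE netlog (ts TEXT, src_ip TEXT, dst_port INTEGER, bytes INTEGER)"
)

# Made-up sample rows, for illustration only.
rows = [
    ("2014-01-01 00:00:01", "10.0.0.1", 22, 120),
    ("2014-01-01 00:00:02", "10.0.0.2", 80, 900),
    ("2014-01-01 00:00:03", "10.0.0.1", 22, 80),
    ("2014-01-01 00:00:04", "10.0.0.3", 443, 1500),
]
conn.executemany("INSERT INTO netlog VALUES (?, ?, ?, ?)", rows)

# Conditional query: select the entries that match a predicate.
conditional = conn.execute(
    "SELECT ts, src_ip FROM netlog WHERE dst_port = 22 ORDER BY ts"
).fetchall()

# Statistical query: aggregate entry counts and byte totals per source address.
statistical = conn.execute(
    "SELECT src_ip, COUNT(*), SUM(bytes) FROM netlog "
    "GROUP BY src_ip ORDER BY src_ip"
).fetchall()

print(conditional)
print(statistical)
```

A conditional query of this kind reads only the columns it names, and a statistical query typically aggregates over a few columns, which is one reason columnar formats are a natural fit for such workloads.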