Log analysis in cloud computing environment with Hadoop and Spark

Xiuqin Lin, Peng Wang, Bin Wu
DOI: https://doi.org/10.1109/icbnmt.2013.6823956
2013-11-01
Abstract: Logs are the main source of information about system operation status and user behavior. A log analysis system needs not only massive, stable data processing capacity but also the ability to adapt efficiently to a variety of scenarios, which cannot be achieved with standalone analysis tools or even a single cloud computing framework. We present a unified cloud platform for batch log data analysis that combines Hadoop and Spark. Hadoop provides a distributed file system and an offline batch computing framework, while Spark's computing model is based on distributed memory. Combining Hadoop and Spark with the data warehouse and analysis tools Hive and Shark makes it possible to provide a unified cloud platform with both batch analysis and in-memory computing capacity, so that logs can be processed in a highly available, stable, and efficient way.