Analysis farm: A cloud-based scalable aggregation and query platform for network log analysis
Jianwen Wei, Yusu Zhao, Kaida Jiang, Rui Xie, Yaohui Jin
DOI: https://doi.org/10.1109/CSC.2011.6138547
2011-01-01
Abstract: Network monitoring data provides insight into the operational status of a network. With increasingly sophisticated ways of probing, sampling and recording network activity, the huge volume of monitoring data presents both an opportunity and a challenge for network data analysis. We aim to build a scalable platform, named Analysis Farm, for analyzing network logs. Analysis Farm targets fast log aggregation and agile log query. Achieving these goals requires addressing storage scalability, computation scalability and query agility. Cloud computing and NoSQL technologies meet these needs by providing manageable, on-demand hardware resources and novel data storage models. We choose OpenStack, an open-source cloud tool set, for resource provisioning, and MongoDB, an RDBMS-like document-oriented NoSQL system, for log storage and analysis. By combining the scalability of OpenStack and MongoDB, we build Analysis Farm to support storage scale-out, computation scale-out and agile query. The Analysis Farm prototype currently in use, consisting of 10 MongoDB servers, aggregates about 3 million log records in each 10-minute interval and handles ad hoc queries effectively on a log database that grows by more than 400 million records per day. In this paper, we describe Analysis Farm's background, targets, architecture and some experimental results. We believe Analysis Farm will benefit those who work on big-log-style data analysis.
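To make the idea of interval-based log aggregation concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation, which runs on MongoDB. The record fields `ts` and `src` and the helpers `bucket_start` and `aggregate` are hypothetical names chosen for illustration. The sketch rounds each timestamp down to its 10-minute interval and counts records per interval and source, then checks that 400 million records per day is consistent with roughly 3 million records per interval.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 600  # 10-minute aggregation interval, as stated in the abstract


def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 10-minute interval."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def aggregate(records):
    """Count records per (interval start, source) pair."""
    counts = Counter()
    for rec in records:
        counts[(bucket_start(rec["ts"]), rec["src"])] += 1
    return counts


# Hypothetical log records; the field names "ts" and "src" are assumptions.
records = [
    {"ts": datetime(2011, 1, 1, 0, 3, tzinfo=timezone.utc), "src": "10.0.0.1"},
    {"ts": datetime(2011, 1, 1, 0, 9, tzinfo=timezone.utc), "src": "10.0.0.1"},
    {"ts": datetime(2011, 1, 1, 0, 12, tzinfo=timezone.utc), "src": "10.0.0.2"},
]
summary = aggregate(records)

# Rate check: a day holds 144 ten-minute intervals, so 400 million records
# per day averages a little under 2.8 million records per interval.
intervals_per_day = 24 * 60 // 10
per_interval = 400_000_000 / intervals_per_day
```

In a document store, the same grouping would typically be expressed as a server-side aggregation over the interval key rather than a client-side loop; the loop here only illustrates the bucketing logic.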