Utilizing a NoSQL Data Store for Scalable Log Analysis

Khalid Mahmood,Tore Risch,Minpeng Zhu
DOI: https://doi.org/10.1145/2790755.2790772
2014-01-01
Abstract:A potential problem for persisting large volume of data logs with a conventional relational database is that loading massive logs produced at high rates is not fast enough due to the strong consistency model and high cost of indexing. As a possible alternative, a modern NoSQL data store, which sacrifices transactional consistency to achieve higher performance and scalability, can be utilized. In this paper, we investigate to what degree a state-of-the-art NoSQL database can achieve high performance persisting and fundamental analyses of large-scale data logs from real world applications. For the evaluation, a state-of-the-art NoSQL database, MongoDB, is compared with a relational DBMS from a major commercial vendor and with a popular open source relational DBMS. MongoDB is chosen as it provides both primary and secondary indexing compared to other popular NoSQL systems. These indexing techniques are essential for scalable processing of queries over large scale data logs. To explore the impact of parallelism on query execution, sharding was investigated for MongoDB. Our results revealed that relaxing the consistency did not provide substantial performance enhancement in persisting large-scale data logs for any of the systems. However, for high-performance loading and analysis of data logs, MongoDB is shown to be a viable alternative compared to relational databases for queries where the choice of an optimal execution plan is not critical.
What problem does this paper attempt to address?