Anomaly Detection for Network Connection Logs

Swapneel Mehta, Prasanth Kothuri, Daniel Lanza Garcia
DOI: https://doi.org/10.48550/arXiv.1812.01941
2018-12-01
Abstract: We leverage a streaming architecture based on ELK, Spark, and Hadoop to collect, store, and analyse database connection logs in near real-time. The proposed system investigates outliers using unsupervised learning, applying widely adopted clustering and classification algorithms to log data and highlighting the subtle variances in each model by visualising the outliers. Arriving at a novel solution for evaluating untagged, unfiltered connection logs, we propose an approach that can be extrapolated to a generalised system for analysing connection logs across a large infrastructure that comprises thousands of individual nodes and generates hundreds of log lines per second.
Machine Learning; Distributed, Parallel, and Cluster Computing