Efficient tracking of significant communication patterns in computer networks
Dah Ming Chiu, Xingang Shi
2011-01-01
Abstract: The scale and complexity of today's networks are increasing at a staggering pace, and the characteristics of data traffic and of the diverse applications and services in these networks are changing just as quickly. Their interdependencies are also becoming increasingly complicated, which calls for advanced network traffic measurement and analysis techniques. Besides packet-level and flow-level statistics, it is also important to monitor and understand the behavior of network users and applications from the perspective of how they communicate with each other. For example, a popular server may attract many connections from interested users; P2P peers often form clusters that communicate intensively with each other; botnet zombies receive regular commands from their botmaster and may later join a malicious campaign to send out masses of spam email or launch a DDoS attack. High-level communication patterns such as massive concurrent connections or causality between events are often useful behavior signatures of certain applications, or indications of anomalies. They can be very helpful in network management, traffic engineering, application behavior analysis, and anomaly detection. In this thesis, we study three interesting and useful communication patterns: top spreaders, top scanners, and flow correlations. They have practical uses, especially in network management and anomaly detection. However, the network itself offers very little support for high-quality measurement of such non-trivial statistics, and ever-increasing link speeds and traffic volumes make measurement and analysis even more challenging. We take the approach of data streaming algorithms. First, we propose a general scheme called multiplexed sketches to efficiently estimate statistics of a large number of streams. Then we design algorithms that work alongside the multiplexed sketches to efficiently track each of the three communication patterns.
In particular, we design a general "filter-tracker-digester" framework, in which the filter provides a rough estimate of the statistics, the tracker records the IDs of potential candidate spreaders or scanners, and the digester is implemented as multiplexed sketches for accurate statistics estimation. Our design addresses several challenges, including traffic scale, skewness, speed, memory usage, and result accuracy. The performance of our algorithms is analyzed both mathematically and experimentally. We show that they can achieve accuracy and speed at least an order of magnitude higher than alternative approaches.
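
To make the division of labor among filter, tracker, and digester concrete, the following is a minimal Python sketch for the top-spreader case (sources that contact many distinct destinations). It is an illustration under stated assumptions, not the thesis's algorithm: the digester here is a plain per-candidate linear-counting bitmap rather than multiplexed sketches, and all names and parameters (`SpreaderMonitor`, `Bitmap`, `filter_buckets`, `threshold`, and so on) are hypothetical.

```python
import hashlib
import math


def _hash(key: str, salt: str, mod: int) -> int:
    """Deterministic hash of key into range(mod)."""
    digest = hashlib.blake2b(f"{salt}|{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


class Bitmap:
    """Linear-counting estimator for the number of distinct items seen."""

    def __init__(self, bits: int):
        self.bits = bits
        self.word = 0  # Python int used as a bit array

    def add(self, item: str) -> None:
        self.word |= 1 << _hash(item, "bitmap", self.bits)

    def estimate(self) -> float:
        zeros = self.bits - bin(self.word).count("1")
        if zeros == 0:  # saturated: return the largest meaningful estimate
            return self.bits * math.log(self.bits)
        return -self.bits * math.log(zeros / self.bits)


class SpreaderMonitor:
    """Hypothetical filter -> tracker -> digester pipeline (illustrative only)."""

    def __init__(self, filter_buckets=1024, filter_bits=32,
                 threshold=8, digest_bits=4096):
        # Filter: small bitmaps shared by all sources hashing to a bucket,
        # giving only a rough fan-out estimate.
        self.filter = [Bitmap(filter_bits) for _ in range(filter_buckets)]
        self.threshold = threshold
        self.digest_bits = digest_bits
        # Tracker: IDs of candidate spreaders, each mapped to its digester.
        # Assumption: one larger bitmap per candidate stands in for the
        # multiplexed sketches described in the abstract.
        self.tracked = {}

    def update(self, src: str, dst: str) -> None:
        digest = self.tracked.get(src)
        if digest is None:
            bucket = self.filter[_hash(src, "filter", len(self.filter))]
            bucket.add(dst)
            if bucket.estimate() < self.threshold:
                return  # not yet a candidate
            # Promote the source; destinations seen before promotion are
            # not replayed, so the digest slightly undercounts.
            digest = self.tracked[src] = Bitmap(self.digest_bits)
        digest.add(dst)

    def top(self, k: int):
        ranked = sorted(self.tracked.items(),
                        key=lambda kv: kv[1].estimate(), reverse=True)
        return [(src, round(d.estimate())) for src, d in ranked[:k]]


if __name__ == "__main__":
    monitor = SpreaderMonitor()
    for i in range(500):            # one heavy spreader
        monitor.update("10.0.0.1", f"192.168.{i // 250}.{i % 250}")
    for s in range(200):            # many light sources
        for j in range(2):
            monitor.update(f"172.16.0.{s}", f"host-{s}-{j}")
    print(monitor.top(3))
```

Because the filter buckets are shared, a light source that hashes into the same bucket as a heavy one can be promoted by mistake; in this sketch the per-candidate digest then reports its true (small) fan-out, so it does not displace genuine spreaders in the ranking.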