Abstract: Given a directed graph of millions of nodes, how can we automatically spot anomalous, suspicious nodes judging only from their connectivity patterns? Suspicious graph patterns show up in many applications, from Twitter users who buy fake followers, manipulating the social network, to botnet members performing distributed denial-of-service attacks, disturbing the network traffic graph. We propose a fast and effective method, CatchSync, which exploits two of the tell-tale signs left in graphs by fraudsters: (a) synchronized behavior: suspicious nodes have extremely similar behavior patterns because they are often required to perform some task together (such as following the same user); and (b) rare behavior: their connectivity patterns are very different from those of the majority. We introduce novel measures to quantify both concepts (“synchronicity” and “normality”), and we propose a parameter-free algorithm that works on the resulting synchronicity-normality plots. Thanks to careful design, CatchSync has the following desirable properties: (a) it is scalable to large datasets, being linear in the graph size; (b) it is parameter-free; and (c) it is side-information-oblivious: it can operate using only the topology, without needing labeled data, timing information, and the like, while still being capable of using side information if available. We applied CatchSync to three large, real datasets (a 1-billion-edge Twitter social graph and 3-billion-edge and 12-billion-edge Tencent Weibo social graphs) and to several synthetic ones. CatchSync consistently outperforms existing competitors, both in detection accuracy (by 36% on Twitter and 20% on Tencent Weibo) and in speed.
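The abstract does not spell out how “synchronicity” and “normality” are computed, so the sketch below only illustrates one plausible instantiation under stated assumptions. It assumes each target node is placed in a cell of a log-binned (in-degree, out-degree) grid; a source's synchronicity is then the probability that two of its targets (drawn with replacement) share a cell, and its normality is the probability that one of its targets and a uniformly random node of the graph share a cell. The grid, the names `log_cell` and `sync_and_normality`, and these exact formulas are hypothetical, and the parameter-free detection step on the synchronicity-normality plot is not reproduced.

```python
from collections import Counter, defaultdict


def log_cell(in_deg: int, out_deg: int) -> tuple[int, int]:
    # Hypothetical feature grid: floor(log2(d + 1)) of each degree, computed
    # exactly with bit_length to avoid floating-point rounding at powers of 2.
    return ((in_deg + 1).bit_length() - 1, (out_deg + 1).bit_length() - 1)


def sync_and_normality(edges):
    """Return {source: (synchronicity, normality)} for a directed edge list.

    Assumed definitions (illustrative, not taken from the abstract):
      p_i = fraction of the source's targets that fall in grid cell i
      q_i = fraction of all nodes that fall in grid cell i
      synchronicity = sum_i p_i ** 2    (two of its targets share a cell)
      normality     = sum_i p_i * q_i   (a target and a random node share a cell)
    """
    edges = list(edges)
    in_deg = Counter(t for _, t in edges)
    out_deg = Counter(s for s, _ in edges)
    nodes = set(in_deg) | set(out_deg)
    n = len(nodes)

    # One pass over nodes: assign every node to a cell and count cell sizes.
    cell = {v: log_cell(in_deg[v], out_deg[v]) for v in nodes}
    cell_size = Counter(cell.values())

    # One pass over edges: per-source histogram of its targets' cells.
    target_cells = defaultdict(Counter)
    for s, t in edges:
        target_cells[s][cell[t]] += 1

    scores = {}
    for s, counts in target_cells.items():
        d = sum(counts.values())  # out-degree of s
        sync = sum((c / d) ** 2 for c in counts.values())
        norm = sum((c / d) * (cell_size[i] / n) for i, c in counts.items())
        scores[s] = (sync, norm)
    return scores
```

Because both quantities are built from per-cell counts, a single pass over the nodes and one over the edges suffice, which is in line with the linear-time claim. For example, on the toy edge list `[("a","x"), ("b","x"), ("c","x"), ("a","y"), ("b","y"), ("c","y"), ("d","x"), ("d","z")]`, sources `a`, `b` and `c`, which follow exactly the same two targets, get synchronicity 1.0, while `d`, whose two targets land in different cells, gets 0.5.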

Assessing and ranking structural correlations in graphs

Static and Dynamic Structural Correlations in Graphs

The Correlation Properties of Urban Traffic Networks

Correlation Analysis of Nodes Identifies Real Communities in Networks

Correlation of Centralities: A Study Through Distinct Graph Robustness

Scalable Community Discovery of Large Networks

Event Detection in Scientific Mapping Based on a Novel Structural Community Similarity Algorithm

How to Measure Significance of Community Structure in Complex Networks

E-rank: A Structural-Based Similarity Measure in Social Networks

A novel method based on node correlation to evaluate the important nodes in complex networks

Catching Synchronized Behaviors in Large Networks: A Graph Mining Approach

Discovering Organizational Correlations from Twitter

Correlation-Based Community Detection

A cross-correlation-based method for spatial-temporal traffic analysis

Exploring spatio-temporal correlation and complexity of safety monitoring data by complex networks

Finding Structural Hole Spanners Based on Community Forest Model and Diminishing Marginal Utility in Large Scale Social Networks

Inferring Geographic Coincidence in Ephemeral Social Networks

Uncovering Community Structure In Social Networks By Clique Correlation

LogMaster: Mining Event Correlations in Logs of Large scale Cluster Systems

CatchSync

Social significance of community structure: Statistical view