GAD: General Activity Detection for Fast Clustering on Large Data

Xin Jin, Sangkyum Kim, Jiawei Han, Liangliang Cao, Zhijun Yin
DOI: https://doi.org/10.1137/1.9781611972795.1
2009-01-01
Abstract: In this paper, we propose GAD (General Activity Detection) for fast clustering on large-scale data. Within this framework we design a set of algorithms for different scenarios: (1) the exact GAD algorithm E-GAD, which is much faster than K-Means and gets the same clustering result; (2) approximate GAD algorithms with different assumptions, which are faster than E-GAD while achieving different degrees of approximation; (3) GAD-based algorithms to handle the "large clusters" problem, which appears in many large-scale clustering applications. Two existing activity detection algorithms, GT and CGAUTC, are special cases under the framework. The most important contribution of our work is that the framework is the general solution to exploit activity detection for fast clustering in both exact and approximate scenarios, and our proposed algorithms within the framework can achieve very high speed. Extensive experiments have been conducted on several large datasets from various real-world applications; the results show that our proposed algorithms are effective and efficient.