Learning Distilled Graph for Large-Scale Social Network Data Clustering
Wenhe Liu, Dong Gong, Mingkui Tan, Javen Qinfeng Shi, Yi Yang, Alexander G. Hauptmann
DOI: https://doi.org/10.1109/tkde.2019.2904068
IF: 9.235
2020-07-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Spectral analysis is critical in social network analysis. As a vital step of spectral analysis, graph construction in many existing works utilizes content data only. Unfortunately, content data often consist of noisy, sparse, and redundant features, which makes the resulting graph unstable and unreliable. In practice, besides the content data, social network data also contain link information, which provides additional information for graph construction. Some previous works utilize the link data. However, the link data are often incomplete, which makes the resulting graph incomplete. To address these issues, we propose a novel Distilled Graph Clustering (DGC) method. It pursues a distilled graph based on both the content data and the link data. The proposed algorithm alternates between two steps: in the feature selection step, it finds the most representative feature subset w.r.t. an intermediate graph initialized with the link data; in the graph distillation step, it updates and refines the graph based only on the selected features. The final graph, referred to as the distilled graph, is then used for spectral clustering on large-scale social network data. Extensive experiments demonstrate the superiority of the proposed method.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
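
The alternating procedure outlined in the abstract can be illustrated with a small sketch. This is not the authors' DGC objective or optimizer, which the abstract does not specify. As stand-ins, the sketch assumes the Laplacian score for the feature selection step, a dense Gaussian affinity for the graph distillation step, and a two-way cut on the second eigenvector of the normalized Laplacian for the clustering step. All function names (`laplacian_scores`, `gaussian_graph`, `distill_graph`, `two_way_spectral_cut`) and the synthetic data are hypothetical.

```python
import numpy as np

def laplacian_scores(X, W):
    """Laplacian score per feature (column of X) w.r.t. affinity W; lower = smoother on the graph."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    Xc = X - (d @ X) / d.sum()                 # remove the degree-weighted mean
    num = (Xc * (L @ Xc)).sum(axis=0)          # f^T L f for each feature
    den = (Xc ** 2 * d[:, None]).sum(axis=0)   # f^T D f for each feature
    return num / (den + 1e-12)

def gaussian_graph(X):
    """Dense Gaussian affinity; bandwidth set from the mean squared distance (an assumption)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / sq.mean())
    np.fill_diagonal(W, 0.0)
    return W

def distill_graph(X, A_link, n_selected, n_iter=3):
    """Alternate feature selection and graph refinement, starting from the link graph."""
    W = A_link.astype(float)
    for _ in range(n_iter):
        scores = laplacian_scores(X, W)                      # feature selection step
        selected = np.sort(np.argsort(scores)[:n_selected])
        W = gaussian_graph(X[:, selected])                   # graph refinement step
    return W, selected

def two_way_spectral_cut(W):
    """Split nodes by the sign of the second eigenvector of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Synthetic data: two communities, 2 informative features, 8 noisy features.
rng = np.random.default_rng(0)
n = 20
signal = np.vstack([rng.normal(-3, 0.5, (n, 2)), rng.normal(3, 0.5, (n, 2))])
X = np.hstack([signal, rng.normal(0, 1, (2 * n, 8))])

# Incomplete link data: only a chain of links inside each community.
A_link = np.zeros((2 * n, 2 * n))
for i in range(2 * n - 1):
    if i != n - 1:                             # no link across the community boundary
        A_link[i, i + 1] = A_link[i + 1, i] = 1.0

W, selected = distill_graph(X, A_link, n_selected=2)
labels = two_way_spectral_cut(W)
print("selected features:", selected)
print("cluster labels:", labels)
```

The sparse link graph is used only to initialize the loop. Each later pass rebuilds the graph from the selected features alone, which mirrors the alternation described in the abstract. A dense affinity matrix such as this one would not scale to large networks, so a sparse construction would be needed in practice.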