LiteWSEC: A Lightweight Framework for Web-Scale Spectral Ensemble Clustering
Geping Yang, Sucheng Deng, Can Chen, Yiyang Yang, Zhiguo Gong, Xiang Chen, Zhifeng Hao
DOI: https://doi.org/10.1109/tkde.2023.3267167
IF: 9.235
2023-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Spectral Clustering (SC) is an effective clustering method, valued for its excellent performance in partitioning non-linearly distributed data. Ensemble Clustering (EC), a different clustering technique, can improve cluster quality by combining the results of multiple base clusterings. In this work, we concentrate on an EC framework that uses SC as the base method. SC, however, scales poorly because of the high computational complexity of constructing the Laplacian graph and computing the corresponding eigendecomposition. Although many efforts have been made over the past decades to address this, SC still struggles to process extensive data, especially in web-scale scenarios. Additionally, EC requires multiple clustering results as the ensemble bases, which further aggravates resource consumption. To address these issues, LiteWSEC, a simple yet efficient Lightweight Framework for Web-scale Spectral Ensemble Clustering, is proposed to cluster web-scale data with limited resource requirements. It adopts Web-scale Spectral Clustering (WSC) as the base method, which has minimal space overhead because it does not compute the overall embedding explicitly. The memory requirement of LiteWSEC is highly flexible and adapts to the available resources. It can partition web-scale data (e.g., $n = 8,000~k$) on a resource-limited host (e.g., with memory restricted to 1 GB). Experiments on real-world, large-scale, and web-scale datasets demonstrate both the efficiency and effectiveness of LiteWSEC over state-of-the-art SC and EC methods.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
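
To put the abstract's scalability concern in numbers, the following back-of-the-envelope sketch estimates the memory a naive spectral clustering would need at the quoted scale. It assumes a dense n x n affinity matrix stored as 8-byte floats and reads "1 GB" as 1 GiB; both are illustrative assumptions, not details taken from the paper, and the function name is hypothetical.

```python
import math


def dense_matrix_bytes(n: int, bytes_per_entry: int = 8) -> int:
    """Bytes needed to hold a dense n x n matrix (assumed float64 entries)."""
    return n * n * bytes_per_entry


# Scale quoted in the abstract: n = 8,000k points.
n = 8_000_000

# Memory budget quoted in the abstract, interpreted here as 1 GiB (assumption).
budget_bytes = 1 * 1024**3

# Memory for a naive dense affinity matrix at that scale.
needed_bytes = dense_matrix_bytes(n)

# Largest n whose dense affinity matrix would still fit in the budget.
largest_n = math.isqrt(budget_bytes // 8)

print(f"dense affinity matrix: {needed_bytes / 1e12:.0f} TB")
print(f"budget: {budget_bytes / 1e9:.2f} GB")
print(f"largest n that fits densely: {largest_n}")
```

Under these assumptions the dense matrix alone is several orders of magnitude larger than the stated budget, which is why an approach that avoids materializing the full graph or embedding is needed at this scale.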