Large-Scale Subspace Clustering by Independent Distributed and Parallel Coding
Jun Li, Zhiqiang Tao, Yue Wu, Bineng Zhong, Yun Fu
DOI: https://doi.org/10.1109/tcyb.2021.3052056
IF: 11.8
2021-01-01
IEEE Transactions on Cybernetics
Abstract: Subspace clustering is a popular method to discover underlying low-dimensional structures of high-dimensional multimedia data (e.g., images, videos, and texts). In this article, we consider a large-scale subspace clustering (LS<sup>2</sup>C) problem, that is, partitioning a million data points with a million dimensions. To address this, we explore an independent distributed and parallel framework by dividing big data/variable matrices and regularization by both columns and rows. Specifically, LS<sup>2</sup>C is independently decomposed into many subproblems by distributing those matrices across different machines by columns, since the regularization of the code matrix equals the sum of the regularization of its submatrices (e.g., the squared Frobenius norm or the l<sub>1</sub>-norm). Consensus optimization is designed to solve these subproblems in parallel to save communication costs. Moreover, we provide theoretical guarantees that LS<sup>2</sup>C can recover consensus subspace representations of high-dimensional data points under broad conditions. Compared with the state-of-the-art LS<sup>2</sup>C methods, our approach achieves better clustering results on public datasets, including a million images and videos.
automation & control systems, computer science, cybernetics, artificial intelligence
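
The column-wise decomposition described in the abstract rests on the fact that the squared Frobenius norm and the l1-norm of a matrix are sums over its entries, so each equals the sum of the same quantity over any partition of its columns. The following is a minimal numerical check of that property only, not the authors' algorithm; the matrix `C`, its size, the number of blocks, and the helper names are assumptions made for illustration.

```python
import numpy as np


def split_columns(matrix, num_blocks):
    """Split a matrix into contiguous column blocks (one per hypothetical machine)."""
    return np.array_split(matrix, num_blocks, axis=1)


def squared_frobenius(matrix):
    """Squared Frobenius norm: sum of squared entries."""
    return float(np.sum(matrix ** 2))


def l1_norm(matrix):
    """Entrywise l1-norm: sum of absolute values of the entries."""
    return float(np.sum(np.abs(matrix)))


# Hypothetical small code matrix standing in for the large one in the abstract.
rng = np.random.default_rng(0)
C = rng.standard_normal((6, 10))
blocks = split_columns(C, 3)

# Regularizer on the full matrix versus the sum over its column blocks.
frob_full = squared_frobenius(C)
frob_sum = sum(squared_frobenius(b) for b in blocks)
l1_full = l1_norm(C)
l1_sum = sum(l1_norm(b) for b in blocks)

print(np.isclose(frob_full, frob_sum), np.isclose(l1_full, l1_sum))
```

Because both regularizers separate in this way, each column block can be penalized on its own machine without reference to the others. Regularizers that couple columns, such as the nuclear norm, do not split like this in general.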