Abstract: The rapid growth in data volume poses great challenges for clustering, and the introduction of multi-view data, which are collected from multiple sources or represented by multiple features, makes these challenges more arduous. How to cluster large-scale data efficiently has become one of the most active topics in large-scale clustering. Although several accelerated multi-view methods have been proposed to improve the efficiency of clustering large-scale data, their high computational complexity still prevents their use in scenarios that require high efficiency. To address this issue, a fast multi-view clustering model via nonnegative and orthogonal factorization (FMCNOF) is proposed in this paper. Instead of constraining the factor matrices to be nonnegative as in traditional nonnegative and orthogonal factorization (NOF), we constrain one factor matrix of the model to be a cluster indicator matrix, which assigns cluster labels to data directly, without an extra post-processing step to extract cluster structures from the factor matrix. Meanwhile, the F-norm instead of the L2-norm is used in the FMCNOF model, which makes the model easy to optimize. Furthermore, an efficient optimization algorithm is proposed to solve the FMCNOF model. Unlike the traditional NOF optimization algorithm, which requires dense matrix multiplications, our algorithm divides the optimization problem into three decoupled small subproblems that can be solved with far fewer matrix multiplications. By combining the FMCNOF model with the corresponding fast optimization method, the efficiency of the clustering process is significantly improved, and the computational complexity is nearly $O(n)$. Extensive experiments on various benchmark data sets validate that our approach greatly improves efficiency while achieving acceptable performance.
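
To illustrate why an indicator-constrained factor removes the post-processing step, the following is a minimal single-view sketch, not the FMCNOF algorithm itself: the abstract does not give the multi-view objective or its three subproblems. It assumes the simplified objective $\min \|X - FG^T\|_F^2$ with $G$ restricted to a cluster indicator matrix (exactly one 1 per row). The function name `indicator_factorization` and all parameter names are hypothetical. Under this assumption the updates reduce to k-means-style alternation, and each iteration costs $O(dnc)$, which is linear in the number of samples $n$.

```python
import numpy as np

def indicator_factorization(X, c, n_iter=20, seed=0):
    """Approximately minimise ||X - F @ G.T||_F^2 with G a cluster indicator matrix.

    X: (d, n) data matrix, one sample per column (assumed layout).
    Returns F (d, c), G (n, c) and the labels read directly from G.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    # Initialise the columns of F with c distinct samples chosen at random.
    F = X[:, rng.choice(n, size=c, replace=False)].astype(float)
    labels = np.zeros(n, dtype=int)
    for it in range(n_iter):
        # G-step: each sample picks the nearest column of F (one 1 per row of G).
        dist = ((X[:, :, None] - F[:, None, :]) ** 2).sum(axis=0)  # shape (n, c)
        new_labels = dist.argmin(axis=1)
        G = np.zeros((n, c))
        G[np.arange(n), new_labels] = 1.0
        # F-step: closed-form least squares; G.T @ G is diagonal (cluster sizes).
        counts = G.sum(axis=0)
        nonempty = counts > 0
        F[:, nonempty] = (X @ G)[:, nonempty] / counts[nonempty]
        # Stop once the assignments no longer change.
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return F, G, labels

# Toy data: two well-separated groups of 50 points each in 2-D.
rng = np.random.default_rng(1)
X = np.hstack([rng.normal(0.0, 0.1, (2, 50)), rng.normal(5.0, 0.1, (2, 50))])
F, G, labels = indicator_factorization(X, c=2)
```

Because every row of `G` contains a single 1, the cluster label of sample `i` is simply the position of that 1 (`G[i].argmax()`); no separate k-means or spectral rounding step on a real-valued factor is needed.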

A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition

CRD: Fast Co-Clustering on Large Datasets Utilizing Sampling-Based Matrix Decomposition

A Novel Kernel Possibilistic Fuzzy C-Means Clustering Algorithm For Large Scale Data Sets

Scalable Co-Clustering for Large-Scale Data through Dynamic Partitioning and Hierarchical Merging

A Fast Algorithm for Density-Based Clustering in Large Database

Multimodal Co-clustering Analysis of Big Data Based on Matrix and Tensor Decomposition

An Easy-to-Implement Framework of Fast Subspace Clustering for Big Data Sets.

A framework for simultaneous co-clustering and learning from complex data

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Discovering Multiple Co-Clusterings With Matrix Factorization

Single multiplicatively updated matrix factorization for co-clustering.

Penalized Nonnegative Matrix Tri-Factorization For Co-Clustering

Distributed structural clustering on large graph

Parallel Non-negative Matrix Tri-Factorization for Text Data Co-clustering

Fast Multi-View Clustering via Nonnegative and Orthogonal Factorization

Auto-weighted multi-view co-clustering via fast matrix factorization

Bilateral k-Means Algorithm for Fast Co-Clustering.

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

HICC: an Entropy Splitting-Based Framework for Hierarchical Co-Clustering

Efficient Matrix Sketching over Distributed Data

Adaptive Voronoi-based Column Selection Methods for Interpretable Dimensionality Reduction