Scalable Multi-view Clustering via Explicit Kernel Features Maps

Chakib Fettal, Lazhar Labiod, Mohamed Nadif
2024-02-08
Abstract: A growing awareness of multi-view learning as an important component in data science and machine learning is a consequence of the increasing prevalence of multiple views in real-world applications, especially in the context of networks. In this paper we introduce a new scalability framework for multi-view subspace clustering. An efficient optimization strategy is proposed, leveraging kernel feature maps to reduce the computational burden while maintaining good clustering performance. The scalability of the algorithm means that it can be applied to large-scale datasets, including those with millions of data points, using a standard machine, in a few minutes. We conduct extensive experiments on real-world benchmark networks of various sizes in order to evaluate the performance of our algorithm against state-of-the-art multi-view subspace clustering methods and attributed-network multi-view approaches.
Machine Learning
### What problems does this paper attempt to solve?

This paper addresses the computational efficiency and clustering performance of multi-view subspace clustering on large-scale datasets. Specifically:

1. **Challenges of multi-view subspace clustering**:
   - The explosive growth of data from different sources, such as social media, sensor networks and online platforms, has produced complex high-dimensional datasets.
   - These datasets usually contain multiple views of the same underlying data, with each view capturing a different aspect or perspective.
   - Traditional clustering algorithms are designed for single-view data and struggle to capture the complex relationships and structures that span multiple views.
2. **Limitations of existing methods**:
   - Some efficient methods have been proposed, such as anchor-based techniques that reduce the size of the matrices used in optimization, but tests show that existing methods often fail to achieve the required efficiency and clustering performance on large-scale multi-view datasets.
   - Computational complexity limits the use of multi-view subspace clustering in practical applications.
3. **Solutions proposed in this paper**:
   - A new scalable framework for multi-view subspace clustering is introduced.
   - An efficient optimization strategy is proposed, which uses kernel feature maps to reduce the computational burden while maintaining good clustering performance.
   - The algorithm can process large-scale datasets containing millions of data points within a few minutes on a standard machine.
   - Extensive experiments on real-world benchmark networks of different sizes evaluate the algorithm against existing multi-view subspace clustering methods and multi-view methods for attributed networks.
4. **Specific contributions**:
   - **Unified consensus subspace structure graph**: The subspace structure graphs of the individual views are integrated into a unified consensus subspace structure graph that captures the complementary information between views.
   - **Efficient optimization strategy**: The properties of kernel feature maps are exploited to significantly reduce the computational burden while maintaining state-of-the-art clustering performance.
   - **Extensive experimental verification**: Extensive experiments, including statistical significance tests, are carried out, and the results are reported transparently so that others can reproduce and verify them.

In conclusion, this paper targets the computational efficiency and clustering performance of multi-view subspace clustering on large-scale datasets by introducing a new scalable framework and optimization strategy, with the goal of making the approach usable in practical applications.
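
To illustrate why explicit kernel feature maps reduce the computational burden mentioned above, here is a minimal sketch in Python with NumPy. It is not the authors' algorithm. It assumes random Fourier features as the explicit map for an RBF kernel, a plain average of the per-view kernel graphs as the consensus, and a basic Lloyd's k-means on the spectral embedding. The names `rff_map`, `consensus_embedding` and `lloyd_kmeans`, and the parameters `n_features` and `gamma`, are hypothetical and chosen only for this example.

```python
import numpy as np


def rff_map(X, n_features, gamma, rng):
    """Explicit map Z such that Z @ Z.T approximates exp(-gamma * ||x - y||^2).

    Random Fourier features are an assumption of this sketch, not
    necessarily the feature map used in the paper.
    """
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def consensus_embedding(views, n_clusters, n_features=256, gamma=0.5, seed=0):
    """Spectral embedding of the averaged per-view kernel graph, kept implicit."""
    rng = np.random.default_rng(seed)
    # Stack the per-view maps side by side. Scaling by 1/sqrt(V) makes
    # Z @ Z.T equal to the mean of the per-view approximate kernel matrices.
    Z = np.hstack([rff_map(X, n_features, gamma, rng) for X in views])
    Z /= np.sqrt(len(views))
    # The top left singular vectors of Z are the top eigenvectors of Z @ Z.T,
    # so the n x n graph is never formed.
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    E = U[:, :n_clusters]
    norms = np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    return E / norms, Z


def lloyd_kmeans(E, k, n_iter=50):
    """Plain Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [E[0]]
    for _ in range(1, k):
        d2 = np.min([((E - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(E[np.argmax(d2)])
    centers = np.array(centers)
    labels = np.zeros(len(E), dtype=int)
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            E[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Given a list `views` of arrays with one row per data point, `consensus_embedding(views, n_clusters)` returns the row-normalised embedding together with the stacked feature matrix, and `lloyd_kmeans` assigns cluster labels from that embedding. The point of the design is that every step works on an n × m feature matrix, where m is the total number of features, rather than on an n × n affinity graph, so memory grows linearly with the number of data points.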