Abstract: Document clustering, a fundamental task in natural language processing, aims to divide large collections of documents into meaningful groups based on their similarities. Multi-view document clustering (MvDC) has emerged as a promising approach, leveraging information from diverse views to improve clustering accuracy and robustness. However, existing multi-view clustering methods suffer from two issues: (1) they ignore inter-relations across documents during consensus semantic learning; (2) they neglect consensus structure mining in multi-view document clustering. To address these issues, we propose a Hierarchical Consensus Learning model for Multi-view Document Clustering, termed MvDC-HCL. Our model incorporates two key modules. The Data-oriented Consensus Semantic Learning (CSeL) module learns consensus semantics across views by leveraging a hybrid contrastive consensus objective. The Task-oriented Consensus Structure Clustering (CStC) module employs a gated fusion network and clustering-driven structure contrastive learning to mine consensus structures effectively. Specifically, the CSeL module constructs a contrastive consensus learning objective based on intra-sample and inter-sample relationships in multi-view data, aiming to optimize the view semantic representations obtained by the semantic learner. This facilitates consistent semantic learning across the views of the same sample and consistent relationship learning among samples from different views. The learned view semantic representations are then fed into the fusion network of CStC to obtain fused sample semantic representations. From the fused representations and the view semantic representations, sample-level and view-level clustering structures are derived for consensus structure mining. Additionally, CStC introduces clustering-driven objectives to guide consensus structure mining and achieve consistent clustering results.
By hierarchically extracting implicit consensus semantics and structures within multi-view document data and tasks, MvDC-HCL significantly enhances clustering performance. Through comprehensive experiments, we demonstrate that the proposed model consistently outperforms state-of-the-art methods. Our code is publicly available at https://github.com/m22453/MvDC_HCRL.
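
The two mechanisms named in the abstract, cross-view contrastive alignment and gated fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact form of the hybrid objective or the gate network, so the example assumes a standard InfoNCE-style loss for the intra-sample part (the same document in two views forms the positive pair, other documents act as negatives) and a softmax over fixed gate scores for fusion. The function names `cross_view_info_nce` and `gated_fusion`, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_view_info_nce(view_a, view_b, temperature=0.5):
    """InfoNCE-style loss (an assumed stand-in for the intra-sample term).

    view_a[i] and view_b[i] are representations of the same document in
    two views (positive pair); view_b[j] with j != i serve as negatives.
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / n


def gated_fusion(view_reps, gate_scores):
    """Fuse the per-view vectors of one sample with softmax gate weights.

    gate_scores are fixed numbers here; in a trained model they would be
    produced by a learned gate network (assumption, not from the source).
    """
    peak = max(gate_scores)
    exps = [math.exp(g - peak) for g in gate_scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    dim = len(view_reps[0])
    return [sum(w * rep[d] for w, rep in zip(weights, view_reps)) for d in range(dim)]


# Toy data: two documents, two views, 2-dimensional representations.
aligned_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[1.0, 0.0], [0.0, 1.0]]
mismatched_b = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = cross_view_info_nce(aligned_a, aligned_b)
loss_mismatched = cross_view_info_nce(aligned_a, mismatched_b)
fused = gated_fusion([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this toy setting the loss is lower when the two views of each document agree than when they are swapped, which is the behavior a consensus semantic objective is meant to reward; equal gate scores reduce the fusion to a plain average of the view vectors.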

Deep Multi-View Document Clustering with Enhanced Semantic Embedding

A Hierarchical Consensus Learning Model for Deep Multi-View Document Clustering

Deep embedded multi-view clustering with collaborative training

Deep Multi-View Semi-Supervised Clustering with Sample Pairwise Constraints

One-step Multi-View Clustering Via Deep-Level Semantics Exploiting

Deep Multiview Adaptive Clustering With Semantic Invariance

Deep Multi-View Clustering via View-Specific Representation and Global Graph

Deep Multi-View Spectral Clustering Via Ensemble

Structural deep multi-view clustering with integrated abstraction and detail

Deep Multi-View Subspace Clustering With Unified and Discriminative Learning

Deep Multiview Collaborative Clustering

Deep Incomplete Multi-View Clustering Via Mining Cluster Complementarity

Auto-attention Mechanism for Multi-view Deep Embedding Clustering

Double embedding-transfer-based multi-view spectral clustering

Multi-View Maximum Entropy Clustering by Jointly Leveraging Inter-View Collaborations and Intra-View-Weighted Attributes

Concept-Enhanced Multi-view Co-clustering of Document Data

Relaxed multi-view clustering in latent embedding space

Self-Supervised Discriminative Feature Learning for Deep Multi-View Clustering

Self-Weighted Contrastive Fusion for Deep Multi-View Clustering

Jointly Deep Multi-View Learning for Clustering Analysis

Multi-view deep subspace clustering via level-by-level guided multi-level features learning