Multi-view Granular-ball Contrastive Clustering

Peng Su, Shudong Huang, Weihong Ma, Deng Xiong, Jiancheng Lv
2024-12-18
Abstract: Previous multi-view contrastive learning methods typically operate at two scales: instance-level and cluster-level. Instance-level approaches construct positive and negative pairs based on sample correspondences, aiming to bring positive pairs closer and push negative pairs further apart in the latent space. Cluster-level methods focus on calculating cluster assignments for samples under each view and maximize view consensus by reducing distribution discrepancies, e.g., minimizing KL divergence or maximizing mutual information. However, these two types of methods either introduce false negatives, leading to reduced model discriminability, or overlook local structures and cannot measure relationships between clusters across views explicitly. To this end, we propose a method named Multi-view Granular-ball Contrastive Clustering (MGBCC). MGBCC segments the sample set into coarse-grained granular balls, and establishes associations between intra-view and cross-view granular balls. These associations are reinforced in a shared latent space, thereby achieving multi-granularity contrastive learning. Granular balls lie between instances and clusters, naturally preserving the local topological structure of the sample set. We conduct extensive experiments to validate the effectiveness of the proposed method.
Machine Learning
What problem does this paper attempt to address?
This paper addresses two main problems in multi-view contrastive learning:

1. **Limitations of instance-level contrastive learning**: Instance-level methods construct positive and negative sample pairs, aiming to pull positive pairs closer and push negative pairs farther apart in the latent space. However, because labels are unknown in the unsupervised setting, negative pairs can be constructed improperly (for example, by pairing two samples that actually belong to the same class). This introduces false negatives and reduces the discriminative ability of the model.
2. **Limitations of cluster-level contrastive learning**: Cluster-level methods calculate cluster assignments under each view and achieve view consistency by reducing distribution differences (such as minimizing KL divergence or maximizing mutual information). However, these methods often ignore local structure information and cannot explicitly measure the relationships between clusters across views.

To solve these problems, the authors propose a method named **Multi-view Granular-ball Contrastive Clustering (MGBCC)**. The main innovations of MGBCC are:

- **Granular-ball modeling**: MGBCC divides the sample set into coarse-grained granular balls and establishes intra-view and inter-view granular-ball associations. Granular balls sit between instances and clusters in scale and naturally preserve the local topological structure of the sample set.
- **Multi-granularity contrastive learning**: By strengthening these associations in the shared latent space, MGBCC achieves multi-granularity contrastive learning. This avoids using neighboring samples directly as negative pairs while preserving the local structure information of the sample set.

Specifically, the MGBCC method includes the following key steps:

1. **Intra-view reconstruction**: Extract low-dimensional embedding representations from the original features through a deep auto-encoder.
2. **Intra-view granular-ball generation**: Divide the sample set into multiple granular balls according to the granularity parameter and calculate the center and radius of each granular ball.
3. **Inter-view granular-ball association**: Establish associations between granular balls based on the size of their overlaps and intersections.
4. **Granular-ball contrastive learning**: Optimize associated granular-ball pairs through a contrastive loss function, pulling them as close as possible in the latent space while pushing unassociated granular-ball pairs as far apart as possible.

According to this summary, the method addresses the deficiencies of both instance-level and cluster-level contrastive learning and shows superior performance on multiple multi-view datasets.
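To make steps 2 to 4 more concrete, the following is a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made purely for illustration: balls are produced by repeated 2-means splitting until each holds at most `max_size` samples (a hypothetical stand-in for the granularity parameter), the radius is taken as the mean distance to the center, two cross-view balls count as associated when they share at least `min_shared` sample indices, and the loss is an InfoNCE-style objective over ball centers. All function names are hypothetical, and the toy inputs stand in for the auto-encoder embeddings of step 1.

```python
import numpy as np

def split_in_two(X, idx, rng, n_iter=10):
    """Split the samples X[idx] into two groups with a few 2-means iterations."""
    pts = X[idx]
    centers = pts[rng.choice(len(idx), size=2, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        if labels.min() == labels.max():  # one side is empty: stop early
            break
        centers = np.stack([pts[labels == k].mean(axis=0) for k in (0, 1)])
    return idx[labels == 0], idx[labels == 1]

def generate_granular_balls(X, max_size, seed=0):
    """Assumed scheme: keep splitting until every ball has <= max_size samples."""
    rng = np.random.default_rng(seed)
    queue, balls = [np.arange(len(X))], []
    while queue:
        idx = queue.pop()
        if len(idx) <= max_size:
            center = X[idx].mean(axis=0)
            # Assumption: radius = mean distance of members to the center.
            radius = np.linalg.norm(X[idx] - center, axis=1).mean()
            balls.append({"members": idx, "center": center, "radius": radius})
            continue
        a, b = split_in_two(X, idx, rng)
        if len(a) == 0 or len(b) == 0:  # degenerate split: cut in half
            a, b = idx[: len(idx) // 2], idx[len(idx) // 2:]
        queue.extend([a, b])
    return balls

def associate_across_views(balls_a, balls_b, min_shared=1):
    """Assumed rule: balls are associated if they share >= min_shared samples."""
    assoc = np.zeros((len(balls_a), len(balls_b)), dtype=bool)
    for i, ba in enumerate(balls_a):
        for j, bb in enumerate(balls_b):
            shared = np.intersect1d(ba["members"], bb["members"]).size
            assoc[i, j] = shared >= min_shared
    return assoc

def ball_contrastive_loss(centers_a, centers_b, assoc, temperature=0.5):
    """InfoNCE-style loss: associated balls are positives, the rest negatives."""
    eps = 1e-12
    a = centers_a / (np.linalg.norm(centers_a, axis=1, keepdims=True) + eps)
    b = centers_b / (np.linalg.norm(centers_b, axis=1, keepdims=True) + eps)
    logits = a @ b.T / temperature
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n_pos = assoc.sum(axis=1)
    keep = n_pos > 0  # skip balls that have no associated partner
    return -((log_prob * assoc).sum(axis=1)[keep] / n_pos[keep]).mean()

# Toy data: two aligned "views" of the same 60 samples in a shared 4-d space.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(60, 4))
view2 = view1 + 0.1 * rng.normal(size=(60, 4))

balls1 = generate_granular_balls(view1, max_size=8)
balls2 = generate_granular_balls(view2, max_size=8)
assoc = associate_across_views(balls1, balls2)
loss = ball_contrastive_loss(
    np.stack([b["center"] for b in balls1]),
    np.stack([b["center"] for b in balls2]),
    assoc,
)
print(len(balls1), len(balls2), float(loss))
```

Because the contrast happens between ball centers rather than individual samples, nearby samples that fall into the same or associated balls are never pushed apart as negatives, which is the intuition behind avoiding false negatives. In a trainable version, the centers would be recomputed from encoder outputs and the loss would be written in an autodiff framework; NumPy is used here only to keep the sketch self-contained.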