Abstract: Correlation clustering seeks a partition of the vertex set of a given graph/network into groups of closely related, or just close enough, vertices so that elements of different groups are not close to each other. The problem has previously been modeled and studied as a graph editing problem, namely Cluster Editing, which assumes that closely related data elements must be adjacent. As such, the main objective of the Cluster Editing problem is to turn clusters into cliques as a way to identify them. This is achieved via two edge editing operations: additions and deletions. There are two problems with the Cluster Editing model that we seek to address in this paper. First, "closely" related does not necessarily mean "directly" related, so closeness should be measured by relatively short distance. As such, we seek to turn clusters into (sub)graphs of small diameter. Second, in real applications, a data element can belong to, or have roles in, multiple groups. In some cases, requiring each data element to belong to exactly one cluster makes it hard to achieve any clustering via classical partition-based methods. We address this latter problem by allowing vertex cloning, also known as vertex splitting. Heuristic methods for the introduced problem are presented along with experimental results showing the effectiveness of the proposed model and algorithmic approach.

What problem does this paper attempt to address?

The paper addresses two limitations of the traditional Cluster Editing model of correlation clustering:

1. "Close" does not mean "directly" related: The Cluster Editing model requires closely related data elements to be adjacent. In practical applications, however, closely related elements need not be directly connected; it is enough that they are a relatively short distance apart. The authors therefore propose turning clusters into subgraphs of small diameter instead of strict cliques.
2. Data elements are not allowed to belong to multiple clusters: In many real-world scenarios, a data element may belong to several clusters simultaneously. For example, in a protein-protein interaction (PPI) network, a protein can have multiple biological functions. Traditional partition-based methods place each data point in exactly one cluster, which makes effective clustering difficult in some cases. To address this, the authors introduce vertex cloning, also known as vertex splitting, which allows a vertex to be split into multiple vertices, each belonging to a different cluster.

Specifically, the paper proposes a new model, **2-Club Cluster Edge Deletion with Vertex Splitting (2CCEDVS)**, and presents heuristic algorithms for it. The model transforms the graph into a disjoint union of 2-clubs through edge deletion and vertex splitting operations, thus allowing overlap between clusters and relaxing the requirement that elements within a cluster be directly connected.

### Formulas and Definitions

Some key concepts and formulas involved in the paper are as follows:

- **s-club**: A set of vertices whose induced subgraph has diameter at most \( s \), that is, the distance between any two of its vertices, measured inside that subgraph, does not exceed \( s \).
  - For \( s = 2 \), that is, a 2-club, the shortest path between any two vertices has length at most 2.
- **Cluster Editing Problem**:
  - The goal is to transform a graph into a disjoint union of cliques by adding and deleting edges.
  - Formally defined as: Given a graph \( G=(V, E) \) and a positive integer \( k \), can \( G \) be transformed into a disjoint union of cliques by at most \( k \) edge-editing operations (adding or deleting edges)?
- **2-Club Cluster Edge Deletion with Vertex Splitting (2CCEDVS)**:
  - The goal is to transform a graph into a disjoint union of 2-clubs by at most \( k \) edge-deletion and vertex-splitting operations.
  - Formally defined as: Given a graph \( G=(V, E) \) and a positive integer \( k \), can \( G \) be transformed into a disjoint union of 2-clubs by at most \( k \) edge-deletion and vertex-splitting operations?

### Conclusion

By introducing 2-clubs and vertex splitting, the paper aims to provide a more flexible and more practical correlation clustering method. The experimental results show that the proposed 2CCEDVS model exhibits good clustering quality and efficiency on both synthetic data and real-life biological network data.
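
To make the definitions above concrete, the following Python sketch checks the s-club property and applies a single vertex split. It is not the paper's heuristic. The function names (`is_s_club`, `split_vertex`), the adjacency-dictionary representation, the five-vertex example graph, and the simplification that the two copies of a split vertex receive disjoint neighborhoods are all assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source, allowed):
    """Shortest-path distances from source, walking only through `allowed` vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_s_club(adj, vertices, s=2):
    """True if the subgraph induced by `vertices` has diameter at most s."""
    vertices = set(vertices)
    for u in vertices:
        dist = bfs_distances(adj, u, vertices)
        # Unreachable vertices, or vertices farther than s, violate the property.
        if any(v not in dist or dist[v] > s for v in vertices):
            return False
    return True


def split_vertex(adj, v, clone, moved_neighbors):
    """Return a copy of the graph in which `clone` takes over `moved_neighbors` from v.

    Assumption: the two copies get disjoint neighborhoods (a simplified split).
    """
    new_adj = {u: set(nbrs) for u, nbrs in adj.items()}
    new_adj[clone] = set()
    for u in set(moved_neighbors):
        new_adj[v].discard(u)
        new_adj[u].discard(v)
        new_adj[clone].add(u)
        new_adj[u].add(clone)
    return new_adj


# Hypothetical example: the path a1 - a2 - c - b2 - b1, where c sits in two groups.
adj = {
    "a1": {"a2"},
    "a2": {"a1", "c"},
    "c": {"a2", "b2"},
    "b2": {"c", "b1"},
    "b1": {"b2"},
}

whole_is_2club = is_s_club(adj, adj.keys())          # False: a1 and b1 are 4 apart
split = split_vertex(adj, "c", "c'", {"b2"})         # one vertex-splitting operation
left_is_2club = is_s_club(split, {"a1", "a2", "c"})  # True
right_is_2club = is_s_club(split, {"c'", "b2", "b1"})  # True
```

In this toy graph a single split of `c` yields two disjoint 2-clubs that overlap in the original vertex, with no edge deleted. A partition-based model would have to assign `c` to only one side.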

Correlation Clustering with Overlap: a Heuristic Graph Editing Approach

Correlation Clustering with Vertex Splitting

Cluster Editing with Vertex Splitting

On the Complexity of 2-club Cluster Editing with Vertex Splitting

CoHomo: A Cluster-Attribute Correlation Aware Graph Clustering Framework

Modification-Fair Cluster Editing

Correlation Clustering with Sherali-Adams

The Complexity of Cluster Vertex Splitting and Company

Efficient Enumeration of the Optimal Solutions to the Correlation Clustering problem

Handling Correlated Rounding Error Via Preclustering: A 1.73-Approximation for Correlation Clustering

Combinatorial Correlation Clustering

Hierarchical Overlapping Clustering of Network Data Using Cut Metrics

Overlapping and Robust Edge-Colored Clustering in Hypergraphs

A Scalable Approach for General Correlation Clustering

Online Correlation Clustering for Dynamic Complete Signed Graphs

Parameterized Correlation Clustering in Hypergraphs and Bipartite Graphs

Understanding the Cluster LP for Correlation Clustering

Editing Graphs into Disjoint Unions of Dense Clusters

Parameterized Dynamic Cluster Editing

Understanding the Cluster Linear Program for Correlation Clustering

Fully Dynamic Correlation Clustering: Breaking 3-Approximation