Star-based learning correlation clustering

Jialin Hua, Jian Yu, Miin-Shen Yang
DOI: https://doi.org/10.1016/j.patcog.2021.107966
IF: 8
2021-08-01
Pattern Recognition
Abstract: Correlation clustering (CC) is a clustering method that takes a signed graph as input and does not require the number of clusters to be specified a priori. It has been widely used in real applications such as social network analysis and text mining. However, exact optimization and approximation algorithms for CC often give unsatisfactory results, especially on large-scale signed graphs. This paper tackles this problem and proposes a novel CC algorithm, termed star-based learning correlation clustering (SL-CC). The proposed SL-CC contains two phases. The first is a scale reduction for signed graphs. We propose a special motif, called a star structure, for reducing the scale of signed graphs. We assign the same cluster label to all vertices within a star structure and then merge these vertices into a single new vertex, so that a large-scale graph shrinks to a much smaller one. The second is a learning scheme for local search on the reduced graph. It discovers some important stars as seeds of clusters according to the graph structure, and then judges whether the other stars should be merged with the seeds. We also construct a new integer linear programming (ILP) model based on cycle inequalities to perform the local search and obtain the final clustering results. Experiments and comparisons of the proposed SL-CC with some existing CC methods, on synthetic and real data sets containing signed graphs of varying scale and structure, demonstrate the efficiency and usefulness of the SL-CC algorithm.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
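
The two ingredients the abstract relies on, the correlation clustering objective and the merging of a vertex group into one vertex, can be illustrated with a minimal Python sketch. This is not the SL-CC algorithm: the abstract does not define the star structure, the seed-selection rule, or the ILP model, so the group to merge is chosen by hand here. The function names `disagreements` and `contract`, the toy graph, and the choice to sum the weights of parallel edges after merging are all assumptions made for illustration.

```python
from collections import defaultdict


def disagreements(edges, labels):
    """Standard CC cost: weight of positive edges cut between clusters
    plus weight of negative edges kept inside a cluster."""
    cost = 0
    for (u, v), w in edges.items():
        same = labels[u] == labels[v]
        if w > 0 and not same:
            cost += w
        elif w < 0 and same:
            cost += -w
    return cost


def contract(edges, group, new_vertex):
    """Merge every vertex in `group` into `new_vertex`.

    Edges internal to the group are dropped; parallel edges that arise
    are combined by summing their signed weights (an assumption, not
    necessarily the rule used in the paper).
    """
    merged = defaultdict(int)
    for (u, v), w in edges.items():
        u2 = new_vertex if u in group else u
        v2 = new_vertex if v in group else v
        if u2 == v2:
            continue  # edge lies inside the merged group
        key = (min(u2, v2), max(u2, v2))  # canonical undirected key
        merged[key] += w
    return dict(merged)


# Hypothetical signed graph: +1 means "similar", -1 means "dissimilar".
edges = {
    ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
    ("c", "d"): -1, ("a", "d"): -1, ("b", "e"): -1,
    ("d", "e"): 1,
}

split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
single = {v: 0 for v in "abcde"}
print(disagreements(edges, split), disagreements(edges, single))

# Hand-picked group standing in for a detected star structure.
reduced = contract(edges, {"a", "b", "c"}, "s")
print(reduced)
```

On the reduced graph, any clustering of the three remaining vertices corresponds to a clustering of the original five in which `a`, `b`, and `c` share a label, which is the sense in which merging shrinks the search space before a local search is run.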