Axioms for Distanceless Graph Partitioning

James Willson, Tandy Warnow
2024-06-18
Abstract: In 2002, Kleinberg proposed three axioms for distance-based clustering, and proved that it was impossible for a clustering method to satisfy all three. While there has been much subsequent work examining and modifying these axioms for distance-based clustering, little work has been done to explore axioms relevant to the graph partitioning problem when the graph is unweighted and given without a distance matrix. Here, we propose and explore axioms for graph partitioning for this case, including modifications of Kleinberg's axioms and three others: two axioms relevant to the "Resolution Limit" and one addressing well-connectedness. We prove that clustering under the Constant Potts Model satisfies all the axioms, while Modularity clustering and iterative k-core both fail many axioms we pose. These theoretical properties of the clustering methods are relevant both for theoretical investigation as well as to practitioners considering which methods to use for their domain science studies.
Discrete Mathematics
What problem does this paper attempt to address?
This paper asks how to define and evaluate reasonable clustering criteria (axioms) for simple unweighted graphs. Specifically, it examines which clustering methods satisfy the proposed axioms when the input network is a simple unweighted graph given without a distance matrix, and it exposes the limitations of existing clustering methods.

### Background and Motivation

In 2002, Kleinberg proposed three axioms (richness, consistency, and scale-invariance) for distance-based clustering and proved that no clustering method can satisfy all three simultaneously. Although subsequent research has modified and extended these axioms, little work has proposed corresponding axioms for partitioning unweighted graphs. This paper aims to fill that gap by proposing clustering axioms for simple unweighted graphs and evaluating how different clustering methods fare under them.

### Main Contributions

1. **Proposing New Axioms**: The paper proposes seven clustering axioms applicable to simple unweighted graphs:
   - **Richness**: For any clustering Γ of a node set V, there exists an edge set E such that the clustering method M, applied to the network N = (V, E), returns Γ; that is, M(N) = Γ.
   - **Standard Consistency**: If edges are only added within clusters or deleted between clusters, the clustering result remains unchanged.
   - **Refinement Consistency**: When edges are added within clusters, existing clusters may be split, but no other change to the clustering is allowed.
   - **Inter-edge Consistency**: Deleting edges between clusters should not change the clustering result.
   - **Connectivity**: Each non-singleton cluster should have a sufficiently large minimum edge cut.
   - **Pair-of-Cliques**: For sufficiently large n, the edges connecting two n-cliques do not cause them to be merged into one cluster.
   - **Fixed Point**: Repeated application of the clustering method does not change the clustering result.
2. **Evaluating Existing Clustering Methods**: The paper evaluates several common clustering methods (modularity optimization, Constant Potts Model optimization, and iterative k-core) against the axioms above. The results show that:
   - **Constant Potts Model optimization (CPM)** satisfies all the proposed axioms.
   - **Modularity** and **iterative k-core (IKC)** fail many of the axioms.

### Conclusion

By introducing a new axiom system, the paper reveals the limitations of existing clustering methods on simple unweighted graphs and provides a theoretical basis for choosing a clustering method for a given application. In particular, CPM optimization is proven to satisfy every proposed axiom, which makes it an important reference point for method selection in practice.

### Formula Examples

- **Modularity Score**:
  \[ H=\sum_{c \in C}\left(\frac{e_c}{|E|}-\left(\frac{d_c}{2|E|}\right)^2\right) \]
  where \(e_c\) is the number of edges inside cluster \(c\), and \(d_c\) is the sum of the degrees of the nodes in cluster \(c\).
- **Constant Potts Model Score**:
  \[ H=\sum_{c \in C}\left(e_c-\gamma\binom{n_c}{2}\right) \]
  where \(e_c\) is the number of edges inside cluster \(c\), \(n_c\) is the number of nodes in cluster \(c\), and \(\gamma\) is the resolution parameter.

Together, these formulas and axiom definitions give a systematic framework for understanding and evaluating graph clustering methods.
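
As a minimal sketch of how the two scores above can be evaluated, the following Python computes the modularity and Constant Potts Model scores of a given clustering directly from an edge list. The helper names (`cluster_stats`, `modularity_score`, `cpm_score`, `two_cliques`) and the test graph (two n-cliques joined by a single edge, in the spirit of the Pair-of-Cliques axiom) are assumptions made for illustration and are not taken from the paper. The code only scores clusterings that are handed to it; it does not search for an optimal one.

```python
from itertools import combinations
from math import comb


def cluster_stats(edges, cluster):
    """Return (e_c, d_c, n_c) for one cluster of an undirected simple graph."""
    members = set(cluster)
    # e_c: edges with both endpoints inside the cluster.
    e_c = sum(1 for u, v in edges if u in members and v in members)
    # d_c: sum of node degrees, counted as one per edge endpoint in the cluster.
    d_c = sum((u in members) + (v in members) for u, v in edges)
    return e_c, d_c, len(members)


def modularity_score(edges, clustering):
    """Sum over clusters of e_c/|E| - (d_c / (2|E|))^2."""
    m = len(edges)
    total = 0.0
    for cluster in clustering:
        e_c, d_c, _ = cluster_stats(edges, cluster)
        total += e_c / m - (d_c / (2 * m)) ** 2
    return total


def cpm_score(edges, clustering, gamma):
    """Sum over clusters of e_c - gamma * C(n_c, 2)."""
    total = 0.0
    for cluster in clustering:
        e_c, _, n_c = cluster_stats(edges, cluster)
        total += e_c - gamma * comb(n_c, 2)
    return total


def two_cliques(n):
    """Hypothetical test graph: two n-cliques joined by a single edge."""
    left = list(range(n))
    right = list(range(n, 2 * n))
    edges = list(combinations(left, 2)) + list(combinations(right, 2))
    edges.append((left[-1], right[0]))  # the one connecting edge (assumption)
    return left, right, edges


left, right, edges = two_cliques(5)
split = [left, right]      # each clique is its own cluster
merged = [left + right]    # both cliques in a single cluster

print("modularity split :", modularity_score(edges, split))
print("modularity merged:", modularity_score(edges, merged))
print("CPM split  (gamma=0.5):", cpm_score(edges, split, 0.5))
print("CPM merged (gamma=0.5):", cpm_score(edges, merged, 0.5))
```

For this particular construction, the CPM score of the split clustering minus that of the merged clustering works out to \(\gamma n^2 - 1\), so the split clustering scores higher whenever \(\gamma n^2 > 1\). This compares only two candidate clusterings and says nothing about other partitions of the same graph.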