A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization

Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li
DOI: https://doi.org/10.1109/TPAMI.2023.3279394
2023-11-26
Abstract: Normalized-Cut (N-Cut) is a famous model of spectral clustering. The traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. However, this paradigm brings two vital problems: 1) two-stage methods solve a relaxed version of the original problem, so they cannot obtain good solutions for the original N-Cut problem; 2) solving the relaxed problem requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes). To address the problems, we propose a novel N-Cut solver designed based on the famous coordinate descent method. Since the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time complexity, we design various accelerating strategies to reduce the time complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges). To avoid reliance on random initialization which brings uncertainties to clustering, we propose an efficient initialization method that gives deterministic outputs. Extensive experiments on several benchmark datasets demonstrate that the proposed solver can obtain larger objective values of N-Cut, meanwhile achieving better clustering performance compared to traditional solvers.
Machine Learning, Optimization and Control
What problem does this paper attempt to address?
This paper addresses two key problems with the traditional two-stage solvers for the Normalized-Cut (N-Cut) model in spectral clustering:

1. **Two-stage methods solve a relaxed version of the original problem**: Traditional solvers compute the continuous spectral embedding of the normalized Laplacian matrix and then discretize it via K-means or spectral rotation. Because they solve a relaxation, they cannot obtain good solutions to the original N-Cut problem.
2. **Solving the relaxed problem is expensive**: The relaxed problem requires eigenvalue decomposition (EVD), which has \(O(n^3)\) time complexity, where \(n\) is the number of nodes. This makes computation on large-scale datasets very costly.

To address these problems, the authors propose a novel N-Cut solver based on the coordinate descent method and design several acceleration strategies that reduce the time complexity to \(O(|E|)\), where \(|E|\) is the number of edges. To avoid the uncertainty introduced by random initialization, they also propose an efficient initialization method, Nearest Neighbor Hierarchical Initialization (N2HI), which gives a deterministic output.

### Specific solutions

1. **Novel N-Cut solver**:
   - The authors propose a fast heuristic solver (Fast-CD) based on the coordinate descent method, which directly optimizes the original N-Cut objective function without any relaxation or approximation.
   - Through equivalent transformations and acceleration techniques, the computational cost of Fast-CD per iteration is reduced to \(O(|E|)\), which is more efficient than the EVD-based relaxed solver.
2. **Nearest Neighbor Hierarchical Initialization (N2HI)**:
   - N2HI uses the first-neighbor relationship of the input graph to generate a deterministic initial clustering assignment, avoiding the uncertainty caused by random initialization in traditional methods.
   - N2HI consists of three steps: clustering based on the first-neighbor relationship, graph coarsening, and a refinement process.

### Experimental results

Extensive experiments on multiple benchmark datasets show that the proposed Fast-CD solver obtains larger N-Cut objective values and better clustering performance than traditional solvers. On some benchmarks, the N-Cut model solved with Fast-CD achieves state-of-the-art performance.

### Main contributions

1. Unlike previous methods, which solve the N-Cut model through relaxation, this paper proposes a fast heuristic solver based on the coordinate descent method that directly optimizes the original objective function with a time complexity of \(O(|E|)\).
2. The proposed Fast-CD solver is more efficient, both theoretically and practically, than the EVD-based relaxed solver.
3. The Fast-CD solver obtains higher N-Cut objective values and better clustering performance.
4. An efficient initialization method, N2HI, is proposed. It outputs a deterministic initial clustering assignment and thereby avoids randomness in the clustering process.
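
To make the two-stage paradigm described above concrete, here is a minimal sketch for the two-cluster case. It is an illustration under stated assumptions, not any solver from the paper: the function name `two_stage_bipartition` is hypothetical, the graph is assumed to be a dense symmetric affinity matrix with no isolated nodes, and a simple sign split stands in for the K-means or spectral-rotation discretization.

```python
import numpy as np

def two_stage_bipartition(A):
    """Relax-then-discretize baseline for two clusters (illustrative only)."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                      # node degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized affinity D^{-1/2} A D^{-1/2}; the normalized Laplacian is I minus this.
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Stage 1: dense eigendecomposition, the O(n^3) step the summary refers to.
    _, eigvecs = np.linalg.eigh(S)           # eigenvalues come back in ascending order
    second = eigvecs[:, -2]                  # eigenvector of the second-largest eigenvalue
    # Stage 2: discretize the continuous embedding.
    # ASSUMPTION: a sign split replaces K-means / spectral rotation.
    return (second > 0).astype(int)
```

The discrete labels come out of a separate second step, so nothing guarantees that they optimize the original discrete objective; that gap is the first problem listed above.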
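
The idea of optimizing the discrete objective directly, one node at a time, can be sketched as follows. This is a naive coordinate descent, not the authors' Fast-CD: it assumes the common maximization form of N-Cut, \(\sum_k y_k^\top A y_k / y_k^\top D y_k\) (consistent with the "larger objective values" wording above), a dense symmetric affinity matrix with positive degrees, and hypothetical function names. With a dense matrix each sweep costs \(O(n^2 k)\), so it does not reproduce the \(O(|E|)\) cost claimed for Fast-CD.

```python
import numpy as np

def ncut_objective(A, labels, k):
    """Sum over clusters of within-cluster weight divided by cluster volume."""
    deg = A.sum(axis=1)
    return sum(
        A[np.ix_(labels == c, labels == c)].sum() / deg[labels == c].sum()
        for c in range(k)
    )

def naive_coordinate_descent(A, labels, k, max_sweeps=100, tol=1e-12):
    """Greedily move single nodes while the objective strictly improves."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels).copy()
    deg = A.sum(axis=1)
    Y = np.eye(k)[labels]                        # n x k cluster indicator matrix
    within = np.einsum("ic,ij,jc->c", Y, A, Y)   # y_c^T A y_c for every cluster
    vol = Y.T @ deg                              # y_c^T D y_c for every cluster
    count = Y.sum(axis=0)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(labels)):
            old = labels[i]
            if count[old] == 1:                  # never empty a cluster
                continue
            link = A[i] @ Y                      # weight from node i to each cluster
            # Cluster statistics after removing i from its current cluster.
            w_out = within[old] - 2 * link[old] + A[i, i]
            v_out = vol[old] - deg[i]
            # Cluster statistics after adding i to each candidate cluster.
            w_in = within + 2 * link + A[i, i]
            v_in = vol + deg[i]
            gain = (w_out / v_out - within[old] / vol[old]) + (w_in / v_in - within / vol)
            gain[old] = 0.0                      # staying put changes nothing
            new = int(np.argmax(gain))
            if gain[new] > tol:
                within[old], vol[old] = w_out, v_out
                within[new], vol[new] = w_in[new], v_in[new]
                Y[i, old], Y[i, new] = 0.0, 1.0
                count[old] -= 1
                count[new] += 1
                labels[i] = new
                moved = True
        if not moved:                            # a full sweep with no move: local optimum
            break
    return labels

# Two triangles joined by one edge, started from a deliberately poor assignment.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
init = np.array([0, 0, 0, 0, 1, 1])
result = naive_coordinate_descent(A, init, k=2)
```

Only the within-cluster weight and volume of the two affected clusters change when a node moves, which is why the sketch updates those statistics incrementally instead of recomputing the objective from scratch.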
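
The first N2HI step, clustering by first-neighbor relationship, might look roughly like the sketch below. It is an assumption-laden illustration rather than the paper's procedure: each node is linked to its highest-weight neighbor, ties go to the lowest index, and the connected components of those links become the initial clusters. The function name `first_neighbor_clusters` is hypothetical, every node is assumed to have at least one neighbor, and the graph coarsening and refinement steps are not shown.

```python
import numpy as np

def first_neighbor_clusters(A):
    """Group nodes by connected components of nearest-neighbor links."""
    n = A.shape[0]
    W = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(W, -np.inf)        # a node is not its own neighbor
    nearest = W.argmax(axis=1)          # ties resolve to the lowest index

    parent = list(range(n))             # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i in range(n):
        ri, rj = find(i), find(int(nearest[i]))
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)   # smaller index becomes the root

    roots = [find(i) for i in range(n)]
    relabel = {r: c for c, r in enumerate(sorted(set(roots)))}
    return np.array([relabel[r] for r in roots])
```

No random numbers are involved, so the same graph always yields the same assignment, which matches the determinism the summary attributes to N2HI.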