An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering

Xin Zhou, Chunlei Du, Xiaodong Cai
DOI: https://doi.org/10.48550/arXiv.2006.12592
2020-06-23
Abstract: Cluster analysis organizes data into sensible groupings and is one of the fundamental modes of understanding and learning. The widely used K-means and hierarchical clustering methods can be dramatically suboptimal due to local minima. The recently introduced convex clustering approach formulates clustering as a convex optimization problem and ensures a globally optimal solution. However, the state-of-the-art convex clustering algorithms, based on the alternating direction method of multipliers (ADMM) or the alternating minimization algorithm (AMA), require large amounts of computation and memory, which limits their applications. In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering. Our Sproga is faster than ADMM- or AMA-based convex clustering algorithms by one to two orders of magnitude. The memory space required by Sproga is less than that required by ADMM and AMA by at least one order of magnitude. Computer simulations and real data analysis show that Sproga outperforms several well-known clustering algorithms including K-means and hierarchical clustering. The efficiency and superior performance of our algorithm will help convex clustering find wide application.
Machine Learning, Methodology
What problem does this paper attempt to address?
The main problem this paper addresses is the high computational cost and large memory consumption of existing convex clustering algorithms on large-scale data. Specifically:

1. **Limitations of existing methods**:
   - Traditional K-means and hierarchical clustering methods are prone to getting stuck in local optima, which can give unsatisfactory results.
   - Existing convex clustering algorithms, based on the alternating direction method of multipliers (ADMM) or the alternating minimization algorithm (AMA), guarantee a globally optimal solution, but their computation and memory requirements are large, which limits their use on large-scale data sets.
2. **Proposed new method**:
   - The paper proposes an efficient smoothing proximal gradient algorithm (Sproga) for the convex clustering problem.
   - Sproga replaces the non-smooth terms of the objective with a smooth approximation and applies the proximal gradient method, which significantly improves computational efficiency and reduces memory consumption while maintaining globally optimal solutions.
3. **Specific improvements**:
   - **Speed**: Sproga is one to two orders of magnitude faster than existing ADMM- and AMA-based algorithms.
   - **Memory**: Sproga requires at least one order of magnitude less memory than ADMM and AMA.
   - **Clustering performance**: Computer simulations and real-data analysis show that Sproga outperforms well-known clustering algorithms, including K-means, hierarchical clustering, DBSCAN, spectral clustering (SPECC), and graph-based Louvain, on a variety of data sets.
4. **Application scenarios**:
   - The efficiency and performance of Sproga should help convex clustering find use in more fields, especially those involving large-scale data sets.

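
The smoothing idea described above can be illustrated with a minimal sketch. It is not the authors' Sproga implementation: it assumes the standard convex clustering objective, 0.5 * sum_i ||x_i - u_i||^2 + gamma * sum_{(i,j)} w_ij * ||u_i - u_j||_2, replaces each Euclidean norm with a Huber-type (Nesterov) smoothing controlled by a parameter `mu`, and runs plain, unaccelerated gradient descent. The function name, the `edges`/`weights` inputs, and the step-size bound are assumptions made for illustration.

```python
import numpy as np


def smoothed_convex_clustering(X, edges, weights, gamma, mu=1e-2, n_iter=500):
    """Gradient descent on a smoothed convex clustering objective (illustrative).

    Minimizes 0.5*||X - U||_F^2 + gamma * sum_e w_e * h_mu(U[i_e] - U[j_e]),
    where h_mu(v) = ||v||^2 / (2*mu) if ||v|| <= mu, else ||v|| - mu/2.
    """
    X = np.asarray(X, dtype=float)
    i_idx, j_idx = np.asarray(edges, dtype=int).T
    w = np.asarray(weights, dtype=float)
    n = X.shape[0]

    # Step size 1/L. The gradient of the smoothed penalty is Lipschitz with
    # constant at most gamma * lambda_max(weighted Laplacian) / mu, and
    # lambda_max is bounded by twice the largest weighted degree.
    degree = (np.bincount(i_idx, weights=w, minlength=n)
              + np.bincount(j_idx, weights=w, minlength=n))
    step = 1.0 / (1.0 + 2.0 * gamma * degree.max() / mu)

    U = X.copy()
    for _ in range(n_iter):
        V = U[i_idx] - U[j_idx]                       # centroid differences per edge
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        A = V / np.maximum(norms, mu)                 # gradient of h_mu at each edge
        G = U - X                                     # gradient of the data-fit term
        np.add.at(G, i_idx, gamma * w[:, None] * A)   # scatter edge gradients to nodes
        np.add.at(G, j_idx, -gamma * w[:, None] * A)
        U = U - step * G
    return U
```

Rows of the returned `U` that lie within a small tolerance of each other can be read as belonging to the same cluster. A smaller `mu` tracks the original non-smooth objective more closely but forces a smaller step, which is the trade-off that accelerated schemes are meant to ease.
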
In summary, this paper aims to overcome the computational and memory bottlenecks of existing convex clustering algorithms by developing a new, efficient algorithm (Sproga), thereby improving the efficiency and applicability of cluster analysis.