Robust and compact maximum margin clustering for high-dimensional data

Hakan Cevikalp, Edward Chome
DOI: https://doi.org/10.1007/s00521-023-09388-x
2024-01-17
Neural Computing and Applications
Abstract: In the field of machine learning, clustering has become an increasingly popular research topic due to its critical importance. Many clustering algorithms have been proposed utilizing a variety of approaches. This study focuses on clustering of high-dimensional data using the maximum margin clustering approach. In this paper, two methods are introduced: The first method employs the classical maximum margin clustering approach, which separates data into two clusters with the greatest margin between them. The second method takes cluster compactness into account and searches for two parallel hyperplanes that best fit the cluster samples while also being as far apart from each other as possible. Additionally, robust variants of these clustering methods are introduced to handle outliers and noise within the data samples. The stochastic gradient algorithm is used to solve the resulting optimization problems, enabling all proposed clustering methods to scale well with large-scale data. Experimental results demonstrate that the proposed methods are more effective than existing maximum margin clustering methods, particularly in high-dimensional clustering problems, highlighting the efficacy of the proposed methods.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper addresses effective clustering of high-dimensional data using the Maximum Margin Clustering (MMC) approach. It proposes two new binary clustering methods, along with a robust version of each:

1. **Classic Maximum Margin Clustering Method**:
   - This method uses the classic MMC objective function. The goal is to divide the data into two clusters with a hyperplane and to maximize the margin between the two clusters.
   - Unlike existing MMC methods, it uses the Stochastic Gradient (SG) algorithm to solve the primal problem directly instead of the dual problem, which makes the method more efficient on large-scale data sets.
2. **Compact and Maximum Margin Clustering Method**:
   - This method considers both the margin between clusters and the compactness within clusters. It finds two parallel hyperplanes that fit the cluster samples while keeping the distance between the two hyperplanes as large as possible.
   - It can be regarded as a combination of maximum margin clustering and subspace clustering, because it both maximizes the margin between clusters and minimizes the variance within clusters.

The paper also introduces robust versions of these clustering methods to deal with noise and outliers in the data. The robust versions use a more stable Symmetric Ramp Loss function, and the resulting optimization problem is solved with the Concave-Convex Procedure (CCP), which improves the stability and convergence of the algorithm.

In summary, the main contribution of the paper is a set of efficient and robust clustering methods for high-dimensional data that improve clustering accuracy and robustness while maintaining computational efficiency.
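To make the idea of solving an MMC primal problem with stochastic gradient steps more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the commonly used symmetric hinge objective `0.5 * ||w||^2 + C * sum_i max(0, 1 - |w . x_i|)`, and it replaces an explicit cluster-balance constraint with a simplification: the data are centered and the bias is fixed at zero. The function names `mmc_sgd` and `cluster_labels`, and all hyperparameter defaults, are hypothetical choices made for illustration.

```python
import numpy as np


def mmc_sgd(X, C=1.0, lr=0.01, epochs=20, seed=0):
    """Stochastic subgradient descent on an assumed MMC primal objective:

        0.5 * ||w||^2 + C * sum_i max(0, 1 - |w . x_i|)

    Assumption: X is centered and the bias is fixed at zero, which keeps
    sum_i (w . x_i) = 0 and stands in for an explicit balance constraint.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Small random start: at w = 0 every score is zero and its sign is undefined.
    w = rng.normal(scale=0.01, size=d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            score = X[i] @ w
            grad = w / n  # regularizer term, split evenly across the n samples
            if abs(score) < 1.0:
                # Sample lies inside the margin: push it away from the hyperplane.
                grad = grad - C * np.sign(score) * X[i]
            w -= lr * grad
    return w


def cluster_labels(X, w):
    """Assign each sample to cluster +1 or -1 by the side of the hyperplane."""
    return np.where(X @ w >= 0.0, 1, -1)
```

A typical call would center the data first, for example `Xc = X - X.mean(axis=0)`, then run `w = mmc_sgd(Xc)` and `labels = cluster_labels(Xc, w)`. Because the objective is non-convex, the result can depend on the random start and the step size. The compact variant and the ramp-loss robust variants described above are not covered by this sketch.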