Abstract: Model-based clustering is an exploratory data analysis paradigm that relies on the finite mixture model to automatically find a latent structure governing observed data. It is one of the most popular and successful approaches in cluster analysis. The mixture density is generally estimated by maximizing the observed-data log-likelihood with the expectation-maximization (EM) algorithm. However, it is well known that the EM algorithm is sensitive to its initialization. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering with Gaussian mixture models for multivariate data. In this paper we focus on model-based curve clustering, where the data are curves rather than vectorial data, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models to the case of curve clustering by regression mixtures, including polynomial regression mixtures as well as spline and B-spline regression mixtures. Our approach handles both the initialization problem and the choice of the optimal number of clusters as the EM learning proceeds, rather than in a two-stage scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness to initialization and finding the actual number of clusters.
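To make the baseline concrete, the following is a minimal sketch of the standard EM algorithm for a mixture of polynomial regressions applied to curves. It is not the robust algorithm proposed in the paper: the number of clusters is fixed in advance, the initialization is random, and no penalty term is used, which are exactly the limitations the abstract describes. The function name, its parameters, and the assumptions that all curves share one sampling grid and that each component has isotropic Gaussian noise are illustrative choices, not taken from the paper.

```python
import numpy as np

def em_polynomial_regression_mixture(x, Y, n_clusters, degree=2, n_iter=100, seed=0):
    """Plain EM for a mixture of polynomial regressions (illustrative sketch).

    x: (m,) common sampling grid; Y: (n, m) array with one curve per row.
    Returns mixing proportions, regression coefficients, noise variances,
    and posterior cluster probabilities (responsibilities).
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    X = np.vander(x, degree + 1, increasing=True)        # (m, p) design matrix
    # Random soft initialization; plain EM is known to be sensitive to this.
    tau = rng.dirichlet(np.ones(n_clusters), size=n)     # (n, K)
    for _ in range(n_iter):
        # M-step: update proportions, coefficients, and variances.
        nk = tau.sum(axis=0) + 1e-12                     # (K,) guarded against zero
        pi = nk / n
        mean_curves = (tau.T @ Y) / nk[:, None]          # (K, m) weighted mean curves
        # Weighted least squares reduces to fitting each weighted mean curve.
        beta = np.linalg.lstsq(X, mean_curves.T, rcond=None)[0].T   # (K, p)
        resid = Y[:, None, :] - (beta @ X.T)[None, :, :] # (n, K, m)
        sq = (resid ** 2).sum(axis=2)                    # (n, K) squared errors
        sigma2 = (tau * sq).sum(axis=0) / (m * nk) + 1e-8
        # E-step: posterior probabilities, computed in the log domain.
        log_p = np.log(pi) - 0.5 * m * np.log(2 * np.pi * sigma2) - 0.5 * sq / sigma2
        log_p -= log_p.max(axis=1, keepdims=True)
        tau = np.exp(log_p)
        tau /= tau.sum(axis=1, keepdims=True)
    return pi, beta, sigma2, tau

# Hypothetical usage on synthetic curves from two groups.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 30)
Y = np.vstack([
    0.1 * rng.standard_normal((20, 30)),                 # flat curves near 0
    5.0 + x + 0.1 * rng.standard_normal((20, 30)),       # rising curves near 5
])
pi, beta, sigma2, tau = em_polynomial_regression_mixture(x, Y, n_clusters=2, degree=1)
labels = tau.argmax(axis=1)                              # hard cluster assignment
```

Because the sketch needs `n_clusters` up front and starts from a random partition, different seeds can lead to different local optima on harder data, which is the behavior a penalized criterion such as the one mentioned in the abstract is meant to address.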

Counting Clusters Using R-NN Curves

R-NN Curves: an Intuitive Approach to Outlier Detection Using a Distance Based Method

Data Clustering Based on the Modified Relaxation Cheeger Cut Model

Estimating the number of clusters in multivariate data by various fittings of the L-curve

CNAK : Cluster Number Assisted K-means

A robust clustering method with noise identification based on directed K-nearest neighbor graph

A Novel Clustering Algorithm Based on the Natural Reverse Nearest Neighbor Structure

CURE-NS: a hierarchical clustering algorithm with new shrinking scheme

RDCI: A novel method of cluster analysis and applications thereof in sample molecular simulations

A novel data clustering algorithm using heuristic rules based on k-nearest neighbors chain

A Novel Density Peaks Clustering Algorithm Based on K Nearest Neighbors with Adaptive Merging Strategy

A Graph-based Approach to Estimating the Number of Clusters

Robust EM algorithm for model-based curve clustering

k is the Magic Number -- Inferring the Number of Clusters Through Nonparametric Concentration Inequalities

Centerless Clustering: An Efficient Variant of K-means Based on K-NN Graph

How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis

RECOME: a New Density-Based Clustering Algorithm Using Relative KNN Kernel Density

r-Reference points based k-means algorithm

Cluster Analysis of Medical Research Data using R

Kernel K-Nearest Neighbor Algorithm As a Flexible SAR Modeling Tool

Bayesian cluster analysis for registration and clustering homogeneous subgroups in multidimensional functional data