Clustering large mixed-type data with ordinal variables

Gero Szepannek, Rabea Aschenbruck, Adalbert Wilhelm
DOI: https://doi.org/10.1007/s11634-024-00595-5
2024-05-29
Advances in Data Analysis and Classification
Abstract: One of the most frequently used algorithms for clustering data with both numeric and categorical variables is the k-prototypes algorithm, an extension of the well-known k-means clustering. Gower's distance is another popular approach for dealing with mixed-type data and is suitable not only for numeric and categorical but also for ordinal variables. This paper proposes a modification of the k-prototypes algorithm that uses Gower's distance and ensures convergence. The result is a tool that takes ordinal information into account for clustering and can also be used for large data. A simulation study demonstrates convergence, good clustering results and short runtimes.
statistics & probability
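The abstract does not spell out the algorithm. The following is therefore a minimal Python sketch under our own assumptions, not the authors' implementation. It computes Gower's distance over numeric, nominal and ordinal columns: ordinal levels are mapped to integer ranks, and numeric and rank differences are scaled by the observed column range. It then alternates k-prototypes-style assignment and update steps. We assume that prototypes take the median for numeric and ordinal variables and the mode for nominal ones. Those choices minimize the within-cluster sum of the per-variable Gower dissimilarities, so neither step can increase the objective, which is one way to obtain the kind of convergence the abstract refers to. All function names (`encode`, `column_ranges`, `gower`, `update_prototype`, `kproto_gower`), the first-k-rows initialization and the toy data are hypothetical.

```python
from statistics import median, median_low, mode

# Column types (illustrative convention): "num" numeric, "cat" nominal, "ord" ordinal.
# For "ord" columns, levels[j] lists the categories from lowest to highest.

def encode(data, types, levels):
    """Replace ordinal labels by their integer rank (0 = lowest level)."""
    return [[levels[j].index(v) if t == "ord" else v
             for j, (v, t) in enumerate(zip(row, types))] for row in data]

def column_ranges(rows, types):
    """Observed range R_j of each numeric/ordinal column, fixed on the full data."""
    ranges = []
    for j, t in enumerate(types):
        if t == "cat":
            ranges.append(None)
        else:
            col = [r[j] for r in rows]
            ranges.append((max(col) - min(col)) or 1.0)  # avoid division by zero
    return ranges

def gower(x, y, types, ranges):
    """Gower distance: mean of per-variable dissimilarities, each in [0, 1]."""
    total = 0.0
    for xj, yj, t, rj in zip(x, y, types, ranges):
        if t == "cat":
            total += 0.0 if xj == yj else 1.0   # simple matching
        else:
            total += abs(xj - yj) / rj          # range-scaled absolute difference
    return total / len(types)

def update_prototype(members, types):
    """Per-variable minimiser of the summed Gower dissimilarity within a cluster."""
    proto = []
    for j, t in enumerate(types):
        col = [m[j] for m in members]
        if t == "cat":
            proto.append(mode(col))          # most frequent category
        elif t == "ord":
            proto.append(median_low(col))    # a median that is an existing level
        else:
            proto.append(median(col))        # median minimises sum of |x - c|
    return proto

def kproto_gower(data, types, levels, k, max_iter=100):
    """Alternate assignment and prototype updates until no label changes."""
    rows = encode(data, types, levels)
    ranges = column_ranges(rows, types)
    protos = [list(r) for r in rows[:k]]     # naive init: first k rows (assumption)
    labels = [None] * len(rows)
    for _ in range(max_iter):
        changed = False
        for i, r in enumerate(rows):
            dists = [gower(r, p, types, ranges) for p in protos]
            best = min(range(k), key=dists.__getitem__)
            # On ties keep the current cluster, so the loop cannot cycle.
            if labels[i] is not None and dists[labels[i]] <= dists[best]:
                best = labels[i]
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:
            break
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:                      # keep old prototype if cluster is empty
                protos[c] = update_prototype(members, types)
    return labels, protos

# Toy mixed-type data: one numeric, one nominal and one ordinal column.
data = [
    [1.0, "red", "low"], [1.2, "red", "low"], [0.9, "red", "medium"],
    [8.0, "blue", "high"], [8.5, "blue", "high"], [7.9, "blue", "medium"],
]
types = ["num", "cat", "ord"]
levels = [None, None, ["low", "medium", "high"]]
labels, protos = kproto_gower(data, types, levels, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Because the column ranges are computed once and held fixed, the objective splits variable by variable. This lets each prototype component be optimized independently. Using the mean for numeric columns, as in classical k-prototypes, would not minimize the range-scaled absolute differences and could break the monotone decrease.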
What problem does this paper attempt to address?