Abstract: Missing data poses a challenge for clustering algorithms, because traditional methods typically impute (fill in) the incomplete data first and cluster afterwards. To combine imputation and clustering into a single process and improve clustering accuracy, a generalized fuzzy clustering framework is proposed based on the optimal completion strategy (OCS) and the nearest prototype strategy (NPS), and four improved algorithms are developed within it. Feature weights are introduced to reduce the influence of outliers on the cluster centers, and kernel functions are used to handle data that are not linearly separable. The proposed algorithms are evaluated in terms of correct clustering rate, number of iterations, and external evaluation indexes on nine datasets from the UCI (University of California, Irvine) Machine Learning Repository. The experimental results indicate that, under varying missing rates, the feature-weighted kernel fuzzy C-means algorithm with NPS (NPS-WKFCM) and the feature-weighted kernel fuzzy C-means algorithm with OCS (OCS-WKFCM) achieve higher clustering accuracy than seven conventional algorithms.
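
To make the idea of interleaving completion and clustering concrete, the following is a minimal sketch of plain fuzzy C-means with an OCS or NPS completion step in each iteration. It is not the paper's WKFCM algorithms: feature weights and kernel functions are left out, and the function name `fcm_incomplete`, its parameters, the column-mean starting values, and the maximin center initialisation are all assumptions made for illustration. It also assumes every feature has at least one observed value and every row has at least one observed feature.

```python
def fcm_incomplete(data, c, m=2.0, strategy="ocs", iters=200, tol=1e-8):
    """Fuzzy C-means on rows whose missing entries are given as None.

    strategy="ocs": a missing entry becomes the membership-weighted
    average of the cluster centers (optimal completion strategy).
    strategy="nps": a missing entry is copied from the nearest center,
    where "nearest" is measured on the observed features only.
    Hypothetical helper for illustration, not the paper's method.
    """
    n, d = len(data), len(data[0])
    miss = [[v is None for v in row] for row in data]

    # Assumed starting point: fill gaps with observed column means.
    means = []
    for j in range(d):
        obs = [row[j] for row in data if row[j] is not None]
        means.append(sum(obs) / len(obs))
    x = [[means[j] if miss[k][j] else row[j] for j in range(d)]
         for k, row in enumerate(data)]

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Assumed deterministic maximin initialisation of the centers.
    centers = [list(x[0])]
    while len(centers) < c:
        far = max(x, key=lambda p: min(sqdist(p, v) for v in centers))
        centers.append(list(far))

    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is proportional to D_ik^(-1/(m-1)),
        # with D_ik the squared distance from point k to center i.
        u = []
        for p in x:
            inv = [max(sqdist(p, v), 1e-12) ** (-1.0 / (m - 1.0))
                   for v in centers]
            total = sum(inv)
            u.append([w / total for w in inv])

        # Center update: weighted mean of the completed data.
        new_centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            sw = sum(w)
            new_centers.append(
                [sum(w[k] * x[k][j] for k in range(n)) / sw for j in range(d)])
        shift = max(sqdist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers

        # Completion step: only the missing entries are re-estimated.
        for k in range(n):
            if not any(miss[k]):
                continue
            if strategy == "nps":
                near = min(range(c), key=lambda i: sum(
                    (x[k][j] - centers[i][j]) ** 2
                    for j in range(d) if not miss[k][j]))
            w = [u[k][i] ** m for i in range(c)]
            for j in range(d):
                if miss[k][j]:
                    if strategy == "nps":
                        x[k][j] = centers[near][j]
                    else:
                        x[k][j] = sum(wi * v[j]
                                      for wi, v in zip(w, centers)) / sum(w)
        if shift < tol:
            break
    return centers, u, x
```

A call such as `fcm_incomplete(rows, c=2, strategy="nps")` returns the centers, the membership matrix, and the completed data, so the imputed values come out of the same loop that produces the clustering rather than from a separate preprocessing pass.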

Research on Incomplete Data Clustering

K-Means Clustering With Incomplete Data

Incomplete Big Data Distributed Clustering

Distributed Clustering and Filling Algorithm of Incomplete Big Data

An Improved Mean Imputation Clustering Algorithm for Incomplete Data

A Survey on Incomplete Multi-view Clustering

Effective Density-Based Clustering Algorithms for Incomplete Data

Incomplete Big Data Clustering Algorithm Using Feature Selection and Partial Distance

Fuzzy C-Means Clustering of Incomplete Data Based on Probabilistic Information Granules of Missing Values

Fuzzy C-Means Clustering Algorithm Based On Incomplete Data

A Three-Way Decisions Clustering Algorithm for Incomplete Data

Interval Kernel Fuzzy C-Means Clustering of Incomplete Data

Interval Fuzzy C-means Approach for Incomplete Data Clustering Based on Neural Networks

Robust K-Median and K-Means Clustering Algorithms for Incomplete Data

A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value

A Robust Fuzzy C-Means Clustering Algorithm for Incomplete Data

An efficient $k$-means-type algorithm for clustering datasets with incomplete records

The particle swarm and fuzzy c-means hybrid method for incomplete data clustering

Incomplete High-Dimensional Data Imputation Algorithm Using Feature Selection and Clustering Analysis on Cloud

A generalized fuzzy clustering framework for incomplete data by integrating feature weighted and kernel learning