Abstract: With the rapid development of information technology and network technology, data has become abundant, yet the shortage of useful knowledge is becoming increasingly serious. Data mining has developed rapidly in this environment and continues to gain importance. Based on the Spark programming model, this paper designs a parallel extension of the fuzzy c-means (FCM) algorithm. To improve the performance of this parallel extension, the paper borrows the initialization-phase improvement of k-means and extends k-means|| to fuzzy c-means to obtain better clustering performance. Combining this with Spark's programming model yields a scalable parallel fuzzy c-means algorithm. Several experiments on the proposed algorithm show good scalability and parallelism, effectively extending fuzzy c-means clustering to distributed applications and greatly increasing the scale of data the algorithm can process. The approach also improves the robustness of the algorithm and its adaptability to the shape and structure of the data, so that the parallel, scalable clustering algorithm can perform cluster analysis on big data more effectively. Three algorithms were simulated on the MATLAB platform using simple data sets and complex two-dimensional data sets, comparing the proposed algorithm with the traditional fuzzy c-means algorithm and a fuzzy c-means algorithm based on fuzzy entropy. The experiments show that the scalable parallel fuzzy c-means algorithm greatly improves noise resistance, converges faster, and can automatically determine the optimal number of clusters.
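The core idea of the abstract is to seed fuzzy c-means with a k-means||-style initialization instead of random initial centers. The Python below is a minimal, single-machine sketch of that idea under stated assumptions, not the authors' Spark implementation. The function names (`kmeans_parallel_init`, `fuzzy_c_means`, `sq_dist`) are hypothetical. The fuzzifier m = 2, the 5 sampling rounds, the default oversampling factor of 2k and the stopping tolerance are illustrative choices. The per-point loops marked "map-like" and the sums marked "reduce-like" are the parts a Spark version would plausibly distribute, but that mapping is our assumption. The sketch does not cover the paper's automatic selection of the number of clusters or the fuzzy-entropy variant.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_parallel_init(points, k, rounds=5, oversample=None, seed=0):
    """k-means||-style seeding: oversample candidate centers over a few
    rounds, then reduce them to at most k centers with weighted k-means++."""
    rng = random.Random(seed)
    ell = oversample if oversample is not None else 2 * k  # assumed default
    candidates = [rng.choice(points)]
    for _ in range(rounds):
        # Distance from each point to its nearest candidate (per-point, map-like).
        d2 = [min(sq_dist(p, c) for c in candidates) for p in points]
        phi = sum(d2)  # total clustering cost (reduce-like)
        if phi == 0:
            break
        # Sample each point independently with probability min(1, ell * d2 / phi).
        candidates += [p for p, d in zip(points, d2) if rng.random() < ell * d / phi]
    # Weight each candidate by how many points are closest to it.
    weights = [0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)), key=lambda i: sq_dist(p, candidates[i]))
        weights[nearest] += 1
    # Weighted k-means++ on the small candidate set.
    centers = [rng.choices(candidates, weights=weights)[0]]
    while len(centers) < k:
        scores = [w * min(sq_dist(c, z) for z in centers)
                  for c, w in zip(candidates, weights)]
        if sum(scores) == 0:  # fewer than k distinct candidates
            break
        centers.append(rng.choices(candidates, weights=scores)[0])
    return centers


def fuzzy_c_means(points, centers, m=2.0, eps=1e-6, max_iter=100):
    """Standard FCM iterations from the given initial centers.
    Returns (centers, memberships from the last iteration, iterations)."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for it in range(1, max_iter + 1):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        u = []
        for p in points:
            d = [math.sqrt(sq_dist(p, c)) for c in centers]
            if 0.0 in d:
                # Point coincides with a center: give it crisp membership there.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d]
            u.append(row)
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m (reduce-like).
        new_centers = []
        for j, old in enumerate(centers):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total == 0:  # empty cluster: keep the previous center
                new_centers.append(old)
                continue
            new_centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / total for t in range(dim)))
        shift = max(math.sqrt(sq_dist(a, b)) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < eps:
            break
    return centers, u, it


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(100)]
    data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(100)]
    init = kmeans_parallel_init(data, k=2, seed=1)
    centers, u, iters = fuzzy_c_means(data, init)
    print([tuple(round(x, 2) for x in c) for c in sorted(centers)], iters)
```

Seeding from spread-out data points gives FCM a starting point close to the true cluster structure, which is consistent with the abstract's claim of faster convergence. Random seeding can instead place several initial centers in the same dense region. In a distributed setting, only the small weighted candidate set would need to be reduced to k centers on the driver.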