Abstract: Clustering is a major field in data mining and an important method of partitioning or grouping data. It has been applied in commerce, market analysis, biology, web classification, and other areas. Clustering algorithms include partitioning methods, hierarchical clustering, and density-based, grid-based, model-based, and fuzzy clustering. The K-means algorithm, one of the essential clustering algorithms, is based on the partitioning method. This study aims to improve the K-means algorithm and to apply it to customer segmentation, an essential element of an enterprise’s use of customer relationship management (CRM). The first part of the paper describes the object of study, its background, and the goal of the article; it also outlines the research approach and the overall content. The second part introduces the basics of clustering and the methods of clustering analysis, and compares different algorithms to identify their advantages and disadvantages. The third part presents the application of the algorithm to customer segmentation. First, a customer value system is built through the analytic hierarchy process (AHP); customer value is then quantified, and customers are divided into classes using clustering. CRM can then be applied efficiently according to the different customer classes. Currently, there are some systems for evaluating customer value, but none of them can be put into practice efficiently. To solve this problem, the concept of continuous symmetry is introduced. Detecting the continuous symmetry of a given problem is very important: it allows for the detection of an observable state whose components are nonlinear functions of the original unobservable state. Thus, using data mining methods, we built an evaluation system for customer value that is in line with the development of the enterprise, based on the enterprise’s practical situation and a series of practical evaluation indexes for customer value. The evaluation system can be used to quantify customer value, to segment customers, and to build a decision-support system for customer value management. The fourth part presents the core of the work, mainly an analysis of the typical K-means algorithm, and proposes two algorithms to improve it. Improved Algorithm A determines K automatically and can, to some degree, ensure that the global optimum is reached. Improved Algorithm B, which combines sampling with an agglomerative clustering algorithm, is much more efficient than the K-means algorithm. The conclusion presents the main findings of the study and directions for further research.
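
The segmentation pipeline summarized in the third part (weight the evaluation indexes with AHP, quantify customer value, then cluster) can be illustrated with a minimal sketch. It is not the paper's implementation, and it does not reproduce Improved Algorithm A or B. The pairwise comparison matrix, the three indexes, the customer data, the row geometric-mean approximation of the AHP weights, and the quantile-based initialization are all assumptions made for illustration; the clustering step is the standard K-means (Lloyd) iteration applied to a one-dimensional value score.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def customer_value(indicators, weights):
    """Weighted sum of normalized indicator scores (hypothetical value model)."""
    return sum(w * x for w, x in zip(weights, indicators))


def kmeans_1d(values, k, max_iters=100):
    """Standard K-means (Lloyd iteration) on scalar values.

    Initial centers are taken at evenly spaced positions of the sorted
    values. This deterministic initialization is an assumption of the sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]

    def nearest(v):
        return min(range(k), key=lambda j: abs(v - centers[j]))

    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v)].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    labels = [nearest(v) for v in values]
    return centers, labels


# Hypothetical pairwise comparisons of three evaluation indexes
# (for example: purchase amount, purchase frequency, loyalty).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical customers, each with three indicators normalized to [0, 1].
customers = [
    [0.90, 0.80, 0.90],
    [0.20, 0.10, 0.30],
    [0.80, 0.90, 0.70],
    [0.10, 0.30, 0.20],
    [0.85, 0.70, 0.80],
    [0.15, 0.20, 0.10],
]
scores = [customer_value(c, weights) for c in customers]
centers, labels = kmeans_1d(scores, k=2)
print(weights, scores, centers, labels)
```

In this sketch K is fixed by hand. The abstract states that Improved Algorithm A determines K automatically but does not say how, so that step is not attempted here.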
