Abstract: Clustering is a major field in data mining and an important method for partitioning or grouping data. It has been applied to commerce, market analysis, biology, web classification, and other areas. Clustering algorithms include partitioning, hierarchical, density-based, grid-based, model-based, and fuzzy methods. The K-means algorithm, a partitioning method, is one of the essential clustering algorithms. This study aims to improve the K-means algorithm and to apply it to customer segmentation, an essential element in an enterprise's use of customer relationship management (CRM). The first part of the paper describes the object of study, its background, and the goals of the article; it also outlines the research approach and the overall content. The second part introduces the basics of clustering and the methods of cluster analysis, comparing the different algorithms to identify their advantages and disadvantages. The third part presents the application: the study applies clustering to customer segmentation. First, a customer value system is built using the analytic hierarchy process (AHP); customer value is then quantified, and customers are divided into classes by clustering. CRM can then be applied efficiently according to the customer classes. Some systems for evaluating customer value already exist, but none of them can be put into practice efficiently. To address this problem, the concept of continuous symmetry is introduced. Detecting the continuous symmetry of a given problem is very important: it allows for the detection of an observable state whose components are nonlinear functions of the original unobservable state. We thus built an evaluation system for customer value that is in line with the development of the enterprise, using data mining, the practical situation of the enterprise, and a series of practical evaluation indexes for customer value. The evaluation system can be used to quantify customer value, to segment customers, and to build a decision-support system for customer value management. The fourth part presents the core of the study: an analysis of the typical K-means algorithm and two proposed algorithms that improve on it. Improved Algorithm A determines K automatically and can, to some degree, ensure that the global optimum is reached. Improved Algorithm B, which combines sampling with the arrangement agglomeration algorithm, is much more efficient than the K-means algorithm. The conclusion presents the main findings of the study and directions for further research.
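The segmentation pipeline summarized above (quantify customer value, then cluster) can be illustrated with a minimal sketch. It is the standard K-means iteration, not either of the improved algorithms proposed in the paper. The indicator values, the weights (standing in for AHP output), and the function names `value_score` and `kmeans` are all hypothetical and chosen only for illustration.

```python
import random

def value_score(indicators, weights):
    # Weighted sum of indicator scores. The weights are assumed to come
    # from an AHP step that is not shown here.
    return sum(w * x for w, x in zip(weights, indicators))

def kmeans(points, k, iters=100, seed=0):
    # Standard K-means on a list of equal-length tuples.
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct input points as initial centers
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers

# Hypothetical data: three indicator scores per customer.
weights = (0.5, 0.3, 0.2)
customers = [(9, 8, 9), (8, 9, 7), (9, 9, 8), (2, 1, 3), (1, 2, 2), (2, 2, 1)]
scores = [(value_score(c, weights),) for c in customers]
labels, centers = kmeans(scores, k=2)
print(labels)
```

In this sketch K is fixed by the caller and the result depends on the random initial centers, which are the two limitations of plain K-means that the abstract says Improved Algorithm A is meant to address.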
