Abstract: Data is the most valuable asset in any firm, and it grows at a breakneck speed over time. Extracting meaningful information from a large and complex data source is a major research issue, and clustering is one of the methods used for this extraction. The Simple K-Mean and Parallel K-Mean partition clustering algorithms both start from randomly picked centroids. This work investigates the two algorithms using two datasets of 10,000 and 5,000 data items, respectively. The study finds that the results of the Simple K-Mean clustering algorithm change from run to run, and that the number of iterations differs for each execution. In some circumstances the outcomes are always different, and the study identifies the properties that distinguish the Simple K-Mean clustering algorithm from the Parallel K-Mean clustering algorithm. Differentiating these features will improve cluster quality, elapsed time, and the number of iterations. Experiments are designed to show that the parallel algorithms considerably improve on the Simple K-Mean technique. The results of the parallel technique are also consistent, whereas the results of the Simple K-Mean algorithm vary from run to run. Both the 10,000 and the 5,000 data item datasets are divided into ten subdatasets for ten different client systems. Clusters are generated in two iterations, where the time of an iteration is the time it takes for all client systems to complete that iteration (see Chapter 4). In the first execution, Client No. 5 has the longest elapsed time (8 ms), whereas the longest elapsed time in the following iterations is 6 ms, for a total elapsed time of 12 ms for the K-Mean clustering technique. In addition, the parallel algorithms reduce the number of executions and the time it takes to complete a task.
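
The abstract does not spell out how the parallel variant combines the work of its client systems, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a common data-parallel scheme: the data is split into `n_clients` chunks, each client computes per-cluster sums and counts for its chunk, and a coordinator merges them into new centroids. The names `kmeans`, `partial_sums`, `nearest`, and `n_clients` are hypothetical, and the clients are simulated sequentially in one process. Setting `n_clients=1` gives the simple variant.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))

def partial_sums(chunk, centroids):
    """Work of one (hypothetical) client: per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts

def kmeans(points, k, n_clients=1, seed=None, max_iter=100):
    """Return (centroids, iterations). Assumed scheme, not taken from the paper."""
    rng = random.Random(seed)
    # Random starting centroids, as in the basic algorithm.
    centroids = [list(p) for p in rng.sample(points, k)]
    # Split the data into one chunk per client (round-robin).
    chunks = [points[i::n_clients] for i in range(n_clients)]
    dim = len(centroids[0])
    for iteration in range(1, max_iter + 1):
        # Sequential stand-in for the clients working side by side.
        results = [partial_sums(chunk, centroids) for chunk in chunks]
        new_centroids = []
        for j in range(k):
            count = sum(counts[j] for _, counts in results)
            if count == 0:
                # Empty cluster: keep the previous centroid.
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in results) / count for d in range(dim)])
        if new_centroids == centroids:
            return centroids, iteration
        centroids = new_centroids
    return centroids, max_iter
```

Calling `kmeans(points, k)` repeatedly with different `seed` values illustrates why the starting centroids, and therefore the iteration count, can change between runs. On real client systems the elapsed time of one iteration would be the time of the slowest client, which this single-process sketch does not measure.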

Clustering Unstructured Data (Flat Files) - An Implementation in Text Mining Tool

Document Clustering Using Locality Preserving Indexing

Clustering Text Data Streams

A Fuzzy Based Approach to Text Mining and Document Clustering

From Free Text to Clusters of Content in Health Records: An Unsupervised Graph Partitioning Approach

Clustering Algorithm on Block Division of Documents

An Analytical Study on Behavior of Clusters Using K Means, EM and K* Means Algorithm

A Text Clustering Algorithm to Detect Basic Level Categories in Texts

Extracting information from free text through unsupervised graph-based clustering: an application to patient incident records

State of the art document clustering algorithms based on semantic similarity

A Short Text Clustering Approaches in Social Media

An Efficient Clustering Algorithm for Small Text Documents

TextLuas: Tracking and Visualizing Document and Term Clusters in Dynamic Text Data

Constrained Coclustering for Textual Documents

Extract List Data from Semi-Structured Document Using Clustering

Clustering Massive Text Data Streams by Semantic Smoothing Model

Performance Evaluation of Simple K-Mean and Parallel K-Mean Clustering Algorithms: Big Data Business Process Management Concept

Sentimental Analysis on Text data by using Unsupervised Methods

DIAS: A Disassemble-Assemble Framework for Highly Sparse Text Clustering

Hierarchical Clustering Algorithms for Document Datasets

Document Clustering Based on Semantic Smoothing Approach