Abstract: The number of text documents on the internet, in e-mail, and on web pages is growing rapidly, and these documents are stored in electronic database formats, which makes them difficult to organize and browse. To address this problem, document preprocessing, term selection, attribute reduction, and preserving the relationships between important terms using background knowledge (WordNet) become important steps in data mining. This paper proceeds in three stages. First, the documents are preprocessed: stop words are removed, stemming is performed with the Porter stemmer algorithm, the WordNet thesaurus is applied to maintain the relationships between important terms, and global unique words and frequent word sets are generated. Second, the data matrix is formed. Third, terms are extracted from the documents using the term selection approaches tf-idf, tf-df, and tf2, based on their minimum threshold values. In addition, the terms of every document are preprocessed, and the frequency of each term within the document is counted for representation. The purpose of this approach is to reduce the number of attributes and to find the most effective term selection method using WordNet for better clustering accuracy. Experiments are evaluated on the Reuters Transcription Subsets (wheat, trade, money, grain, and ship), Reuters 21578, Classic 30, 20 Newsgroups (atheism), 20 Newsgroups (Hardware), and 20 Newsgroups (Computer Graphics), among others.
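As a rough illustration of the stages described in the abstract, the following Python sketch removes stop words, scores terms, keeps those that reach a minimum threshold, and builds a document-by-term frequency matrix. It is a minimal sketch under stated assumptions, not the paper's implementation: the stop list, the sample documents, the threshold value, and the function names `tokenize`, `select_terms`, and `build_matrix` are all hypothetical, the score `tf * log(N / df)` is one common tf-idf variant (the abstract does not give the exact formula), and Porter stemming and the WordNet step are omitted to keep the example dependency-free.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def select_terms(docs, min_score):
    """Return terms whose highest tf-idf score in any document is >= min_score."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for c in counts for term in c)
    best = {}
    for c in counts:
        for term, tf in c.items():
            # Assumed tf-idf variant: raw term frequency times log(N / df).
            score = tf * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    return sorted(t for t, s in best.items() if s >= min_score)


def build_matrix(docs, terms):
    """Build a document-by-term matrix of raw term frequencies."""
    counts = [Counter(tokenize(d)) for d in docs]
    return [[c[t] for t in terms] for c in counts]


# Hypothetical sample documents, not taken from any of the evaluated datasets.
docs = [
    "Wheat prices rose and wheat exports grew",
    "The ship carried grain to the port",
    "Trade in grain and wheat is growing",
]
terms = select_terms(docs, min_score=1.0)
matrix = build_matrix(docs, terms)
```

With this threshold, terms that occur in only one document are kept, while terms shared across documents (here "wheat" and "grain") fall below it and are dropped, which is the attribute-reduction effect the abstract describes. Each row of `matrix` then represents one document over the reduced term set.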
