Abstract: The volume of text documents on the internet, in e-mail, and on web pages is growing rapidly, and these documents are stored in electronic databases, which makes them difficult to organize and browse. To address this problem, document preprocessing, term selection, attribute reduction, and the preservation of relationships between important terms using background knowledge (WordNet) become important steps in data mining. This paper proceeds in three stages. First, documents are preprocessed: stop words are removed, stemming is performed with the Porter stemmer algorithm, the WordNet thesaurus is applied to maintain relationships between important terms, and global unique words and frequent word sets are generated. Second, a data matrix is formed. Third, terms are extracted from the documents using the term selection approaches tf-idf, tf-df, and tf2, based on their minimum threshold values. Each document's terms are then preprocessed, and the frequency of each term within the document is counted for representation. The purpose of this approach is to reduce the number of attributes and to find the most effective term selection method using WordNet for better clustering accuracy. Experiments are evaluated on the Reuters Transcription Subsets (wheat, trade, money, grain, and ship), Reuters 21578, Classic 30, 20 News group (atheism), 20 News group (Hardware), and 20 News group (Computer Graphics), among others.
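
The stages described in the abstract can be illustrated with a minimal Python sketch. It covers only stop-word removal, the document-term matrix, and tf-idf term selection with a minimum threshold; Porter stemming and the WordNet step are left out to keep the example dependency-free. The stop-word list, the function names, the weighting formula (term count times the natural log of N/df), and the rule of keeping a term when its maximum weight in any document reaches the threshold are all assumptions made for illustration, not details taken from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "are", "for", "on"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the tf-idf weight of vocab[j] in docs[i]."""
    tokenized = [preprocess(d) for d in docs]
    counts = [Counter(tokens) for tokens in tokenized]
    vocab = sorted(set().union(*tokenized))  # global unique words
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # Assumed weighting: raw term count times ln(N / df).
    rows = [[c[t] * math.log(n_docs / df[t]) for t in vocab] for c in counts]
    return vocab, rows


def select_terms(vocab, rows, min_weight):
    """Keep terms whose largest weight in any document reaches min_weight (assumed rule)."""
    return [t for j, t in enumerate(vocab) if max(r[j] for r in rows) >= min_weight]


# Hypothetical toy corpus, loosely echoing the Reuters topics named in the abstract.
docs = [
    "Wheat prices rise in the grain market",
    "Ships carry wheat and grain",
    "Trade talks on money policy",
]
vocab, rows = tf_idf_matrix(docs)
selected = select_terms(vocab, rows, min_weight=1.0)
print(selected)
```

In this toy corpus, "wheat" and "grain" appear in two of the three documents, so their weights fall below the threshold and they are dropped, which shows how a minimum threshold reduces the attribute set before clustering.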
