Abstract: Text document clustering can play a vital role in organizing and handling the ever-increasing number of text documents. Uninformative and redundant features included in large text documents reduce the effectiveness of the clustering algorithm. Feature selection (FS) is a well-known technique for removing these features. Since FS can be formulated as an optimization problem, various meta-heuristic algorithms have been employed to solve it. Teaching-Learning-Based Optimization (TLBO) is a novel meta-heuristic algorithm that benefits from a low number of parameters and fast convergence. A hybrid method can retain the advantages of TLBO while tackling possible entrapment in local optima. By proposing a hybrid of TLBO, Grey Wolf Optimizer (GWO), and Genetic Algorithm (GA) operators, this paper suggests a filter-based FS algorithm (TLBO-GWO). Six benchmark datasets are selected, and TLBO-GWO is compared with three recently proposed FS algorithms that use similar approaches, as well as the original TLBO and GWO. The comparison is conducted based on clustering evaluation measures, convergence behavior, and dimension reduction, and is validated using statistical tests. The results reveal that TLBO-GWO can significantly enhance the effectiveness of the text clustering technique (K-means).
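Because FS is framed here as an optimization problem, a minimal Python sketch may help show how plain TLBO's two phases (teacher and learner) could search over feature subsets. It is not the hybrid TLBO-GWO. Assumptions: each candidate is a continuous vector in [0, 1] thresholded at 0.5 into a feature mask, and the filter criterion `fitness` (mean absolute difference of the selected TF-IDF weights) is a hypothetical stand-in, since the abstract does not state which criterion TLBO-GWO optimizes. All function names are illustrative.

```python
import random

def fitness(mask, tfidf):
    """Hypothetical filter score (an assumption, not the paper's criterion):
    mean absolute difference between each selected term's weight and the
    document's mean selected weight, averaged over documents. Higher is better."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0
    total = 0.0
    for row in tfidf:
        vals = [row[j] for j in selected]
        mean = sum(vals) / len(vals)
        total += sum(abs(v - mean) for v in vals) / len(vals)
    return total / len(tfidf)

def to_mask(position):
    """Threshold a continuous position in [0, 1] into a 0/1 feature mask."""
    return [1 if v > 0.5 else 0 for v in position]

def tlbo_fs(tfidf, pop_size=10, iters=30, seed=0):
    """Plain TLBO over feature masks; returns (best mask, best fitness)."""
    rng = random.Random(seed)
    d = len(tfidf[0])
    clip = lambda v: min(1.0, max(0.0, v))
    score = lambda x: fitness(to_mask(x), tfidf)
    pop = [[rng.random() for _ in range(d)] for _ in range(pop_size)]
    fit = [score(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            # Teacher phase: move learner i toward the best solution, away from the class mean.
            teacher = pop[max(range(pop_size), key=lambda p: fit[p])]
            mean = [sum(x[k] for x in pop) / pop_size for k in range(d)]
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            cand = [clip(pop[i][k] + rng.random() * (teacher[k] - tf * mean[k])) for k in range(d)]
            f = score(cand)
            if f > fit[i]:  # greedy acceptance
                pop[i], fit[i] = cand, f
            # Learner phase: interact with a random peer j.
            j = rng.choice([p for p in range(pop_size) if p != i])
            if fit[i] > fit[j]:
                step = [pop[i][k] - pop[j][k] for k in range(d)]  # move away from the weaker peer
            else:
                step = [pop[j][k] - pop[i][k] for k in range(d)]  # move toward the stronger peer
            cand = [clip(pop[i][k] + rng.random() * step[k]) for k in range(d)]
            f = score(cand)
            if f > fit[i]:
                pop[i], fit[i] = cand, f
    best = max(range(pop_size), key=lambda p: fit[p])
    return to_mask(pop[best]), fit[best]

if __name__ == "__main__":
    toy_tfidf = [[0.0, 0.9, 0.1, 0.5], [0.2, 0.8, 0.0, 0.4], [0.7, 0.1, 0.3, 0.0]]  # toy matrix
    mask, best_score = tlbo_fs(toy_tfidf)
    print("selected feature mask:", mask, "score:", round(best_score, 3))
```

Greedy acceptance keeps each learner's fitness from ever decreasing, which is one reason TLBO converges quickly; it is also why a hybrid might add operators that push the population out of local optima.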

What problem does this paper attempt to address?

The paper addresses inefficiency and redundancy in text feature selection and clustering. Large text collections contain many uninformative or redundant features, which reduce the effectiveness of clustering algorithms. The paper therefore proposes TLBO-GWO, a hybrid of Teaching-Learning-Based Optimization (TLBO) and the Grey Wolf Optimizer (GWO), for text feature selection, and uses K-means clustering to measure how much the selected features improve text clustering.

### Main Issues of the Paper

1. **Text Feature Selection**: Text data contain many uninformative and redundant features, which degrade the performance of clustering algorithms.
2. **Improving Clustering Effectiveness**: Existing feature selection methods can perform poorly on high-dimensional text data, so a more effective feature selection method is needed to improve clustering.

### Solution

1. **Hybrid Algorithm (TLBO-GWO)**:
   - **TLBO**: Teaching-Learning-Based Optimization, which has few parameters and converges quickly.
   - **GWO**: Grey Wolf Optimizer, used in the hybrid to reduce the risk of entrapment in local optima.
   - **Genetic Operators**: Crossover and mutation operators from genetic algorithms (GA) are introduced to strengthen the exploration and exploitation capabilities of the algorithm.
2. **Feature Selection Process**:
   - Preprocess each document: tokenization, stop-word removal, stemming, and term weighting.
   - Use TLBO-GWO to select the most informative features for each document.
   - Merge the selected features of all documents into a global feature subset.
3. **Clustering Process**:
   - Cluster the updated text dataset with K-means.
   - Evaluate the algorithm using clustering evaluation measures, convergence behavior, and dimension reduction.
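To illustrate the hybrid components listed above, the sketch below combines the standard GWO leader-guided position update with GA-style uniform crossover and random-reset mutation. This summary does not say how or in what order TLBO-GWO interleaves these operators with the TLBO phases, so the single-generation `hybrid_step` (its crossover partner, mutation rate `pm`, continuous positions thresholded at 0.5, and greedy replacement) is a hypothetical arrangement, not the authors' method.

```python
import random

def to_mask(position):
    """Threshold a continuous position in [0, 1] into a 0/1 feature mask."""
    return [1 if v > 0.5 else 0 for v in position]

def gwo_step(x, alpha, beta, delta, a, rng):
    """Standard GWO update: average the moves guided by the alpha, beta, and delta wolves."""
    new = []
    for k in range(len(x)):
        moves = []
        for leader in (alpha, beta, delta):
            A = 2 * a * rng.random() - a  # |A| > 1 favors exploration, |A| < 1 exploitation
            C = 2 * rng.random()
            D = abs(C * leader[k] - x[k])
            moves.append(leader[k] - A * D)
        new.append(min(1.0, max(0.0, sum(moves) / 3)))  # clip back into [0, 1]
    return new

def uniform_crossover(p1, p2, rng):
    """GA uniform crossover: each gene is copied from either parent with probability 0.5."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]

def random_reset_mutation(x, pm, rng):
    """GA mutation: each gene is replaced by a fresh random value with probability pm."""
    return [rng.random() if rng.random() < pm else g for g in x]

def hybrid_step(pop, fitness_fn, t, max_iter, rng, pm=0.05):
    """One hypothetical generation: GWO move, crossover with alpha, mutation, greedy replacement.
    Requires at least three individuals so that alpha, beta, and delta exist."""
    score = lambda x: fitness_fn(to_mask(x))
    alpha, beta, delta = sorted(pop, key=score, reverse=True)[:3]
    a = 2 - 2 * t / max_iter  # decreases linearly from 2 to 0 over the run
    out = []
    for x in pop:
        child = gwo_step(x, alpha, beta, delta, a, rng)
        child = uniform_crossover(child, alpha, rng)
        child = random_reset_mutation(child, pm, rng)
        out.append(child if score(child) > score(x) else x)  # keep the better of parent and child
    return out

if __name__ == "__main__":
    rng = random.Random(42)
    target = [1, 0, 1, 1, 0, 0, 1, 0]  # toy objective: recover this mask
    fit = lambda m: sum(1 for b, g in zip(m, target) if b == g)
    pop = [[rng.random() for _ in target] for _ in range(6)]
    for t in range(50):
        pop = hybrid_step(pop, fit, t, 50, rng)
    best = max(pop, key=lambda x: fit(to_mask(x)))
    print(to_mask(best), fit(to_mask(best)))
```

The crossover pulls each wolf toward the current alpha (exploitation), while mutation re-randomizes a few genes (exploration); in a real hybrid, `fitness_fn` would be the filter criterion over TF-IDF weights rather than this toy target-matching score.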
### Experimental Validation

The paper uses six benchmark datasets and compares TLBO-GWO with three recently proposed feature selection algorithms as well as the original TLBO and GWO. The comparison covers clustering evaluation measures, convergence behavior, and dimension reduction, and is validated with statistical tests; the results show that TLBO-GWO significantly improves the effectiveness of K-means text clustering.

### Conclusion

The paper proposes a hybrid algorithm based on TLBO and GWO for text feature selection and clustering. Experimental results show that the algorithm can significantly improve the effectiveness of text clustering, especially on high-dimensional text data.
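The preprocessing and clustering stages described above can be sketched end to end in dependency-free Python: tokenization with stop-word removal, TF-IDF weighting as tf * log(n / df), filtering columns by a feature mask, K-means, and purity as one example evaluation measure. Assumptions: stemming is omitted to avoid external dependencies (a real pipeline would typically add a stemmer such as Porter's), the stop-word list is a tiny illustrative one, the all-ones `mask` is a placeholder for whatever subset a feature selector returns, and purity is just one common clustering measure, not necessarily one the paper reports.

```python
import math
import random
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "from"}  # tiny illustrative list

def preprocess(doc):
    """Lowercase, split on whitespace, keep alphabetic tokens, drop stop words (no stemming)."""
    return [t for t in doc.lower().split() if t.isalpha() and t not in STOP_WORDS]

def tfidf_matrix(docs):
    """Weight each term as tf * log(n / df); a term found in every document gets weight 0."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, rows

def apply_mask(rows, mask):
    """Keep only the columns (terms) whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means with centroids initialized from k randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Fraction of documents that belong to the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        members = [t for lab, t in zip(labels, truth) if lab == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)

if __name__ == "__main__":
    docs = [
        "grey wolves hunt prey in packs",
        "wolves track prey in the forest",
        "students learn from the teacher",
        "the teacher helps students learn",
    ]
    vocab, X = tfidf_matrix(docs)
    mask = [1] * len(vocab)  # placeholder: substitute the mask returned by a feature selector
    labels = kmeans(apply_mask(X, mask), k=2)
    print("labels:", labels, "purity:", purity(labels, [0, 0, 1, 1]))
```

Because K-means depends on its random initialization, results on real corpora are usually averaged over several runs with different seeds before comparing feature subsets.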

An enhanced Teaching-Learning-Based Optimization (TLBO) with Grey Wolf Optimizer (GWO) for text feature selection and clustering

Chaotic diffusion‐limited aggregation enhanced grey wolf optimizer: Insights, analysis, binarization, and feature selection

A comprehensive unsupervised feature selection method of two-stage strategy

Software Module Clustering based on the Fuzzy Adaptive Teaching Learning based Optimization Algorithm

A novel hybrid multi-verse optimizer with K-means for text documents clustering

A Velocity-Guided Grey Wolf Optimization Algorithm With Adaptive Weights and Laplace Operators for Feature Selection in Data Classification

Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering

Kernel fuzzy C-means clustering with teaching learning based optimization algorithm (TLBO-KFCM)

Augmented weighted K-means grey wolf optimizer: An enhanced metaheuristic algorithm for data clustering problems

An enhanced teaching-learning-based optimization algorithm with self-adaptive and learning operators and its search bias towards origin

Improving Teaching–learning-Based-optimization Algorithm by a Distance-Fitness Learning Strategy

Hybrid Tabu-Grey wolf optimizer algorithm for enhancing fresh cold-chain logistics distribution

Unsupervised text feature selection by binary fire hawk optimizer for text clustering

A Nature Inspired Hybrid Partitional Clustering Method Based on Grey Wolf Optimization and JAYA Algorithm

Grey Wolf Optimization Algorithm Based on Follow-Controlled Learning Strategy

A feature selection method based on the Golden Jackal-Grey Wolf Hybrid Optimization Algorithm

Grey Wolf Optimization Algorithm: A Survey

Multi-objective Binary Grey Wolf Optimization for Feature Selection Based on Guided Mutation Strategy

Explorative Binary Gray Wolf Optimizer with Quadratic Interpolation for Feature Selection

Orthogonal Learning Covariance Matrix for Defects of Grey Wolf Optimizer: Insights, Balance, Diversity, and Feature Selection

Optimization of fuzzy c-means (FCM) clustering in cytology image segmentation using the gray wolf algorithm