A Comprehensive Survey of Text Classification Techniques and Their Research Applications: Observational and Experimental Insights

Kamal Taha, Paul D. Yoo, Chan Yeun, Aya Taha
2024-11-25
Abstract: The exponential growth of textual data presents substantial challenges in management and analysis, notably due to high storage and processing costs. Text classification, a vital aspect of text mining, provides robust solutions by enabling efficient categorization and organization of text data. These techniques allow individuals, researchers, and businesses to derive meaningful patterns and insights from large volumes of text. This survey paper introduces a comprehensive taxonomy specifically designed for text classification based on research fields. The taxonomy is structured into hierarchical levels: research field-based category, research field-based sub-category, methodology-based technique, methodology sub-technique, and research field applications. We employ a dual evaluation approach: empirical and experimental. Empirically, we assess text classification techniques across four critical criteria. Experimentally, we compare and rank the methodology sub-techniques within the same methodology technique and within the same overall research field sub-category. This structured taxonomy, coupled with thorough evaluations, provides a detailed and nuanced understanding of text classification algorithms and their applications, empowering researchers to make informed decisions based on precise, field-specific insights.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the difficulty of managing and analyzing text data, whose exponential growth drives up storage and processing costs. Classifying and organizing this data efficiently has become a key problem. Text classification, a core text mining technique, helps researchers, businesses, and individuals extract meaningful patterns and insights from large volumes of text.

Specifically, the paper aims to:

1. **Provide a comprehensive review of text classification techniques**: Evaluate a range of text classification methods in detail through a systematic review of the existing literature.
2. **Propose a taxonomy based on research fields**: The taxonomy is organized into hierarchical levels: research field-based category, research field-based sub-category, methodology-based technique, methodology sub-technique, and research field applications. This structure allows algorithms to be compared more precisely and gives researchers a more detailed picture of where each one fits.
3. **Conduct empirical and experimental evaluations**: Empirically, assess text classification techniques against four key criteria. Experimentally, compare and rank the methodology sub-techniques within the same methodology technique and within the same research field sub-category.
4. **Highlight the advantages and limitations of various algorithms**: Use these evaluations to show how different text classification techniques perform in practical applications, helping researchers choose the method best suited to their specific tasks.

Overall, the paper aims to give researchers a structured framework and in-depth insights for understanding and applying modern text classification algorithms. This supports more efficient text data management and encourages further research and development in related fields.