Abstract: Automatic text categorization has gained significant attention among researchers because of the increasing availability of online text, and many learning approaches have been designed for it. Among them, the Centroid-Based Classifier (CBC) is widely used because of its theoretical simplicity and computational efficiency. However, the classification accuracy of CBC depends heavily on the data distribution: when the distribution is highly skewed, CBC produces a poorly fitted model and classifies poorly. In this paper, a new classification model named the Gravitation Model (GM) is proposed to solve the class-imbalanced classification problem. In the training phase, each class is weighted by a mass factor, learned from the training data, that reflects the data distribution of the corresponding class. In the testing phase, a new document is assigned to the class that exerts the maximum gravitational force on it. Experiments on twelve real datasets, comparing GM with CBC and its variants, show that the proposed gravitation model consistently outperforms both CBC and the Class-Feature-Centroid Classifier (CFC). It also achieves classification accuracy competitive with the DragPushing (DP) method while maintaining more stable performance. The results thus indicate that the proposed gravitation model is less prone to over-fitting and has higher learning ability than the CBC model.
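The two phases described in the abstract can be illustrated with a minimal sketch. The abstract does not state the force formula or how the mass factors are learned, so both are assumptions here: the force is taken as mass divided by squared Euclidean distance between the unit-length document and class centroid, and the masses are adjusted with a simple mistake-driven rule. All function names (`centroids`, `force`, `learn_masses`, `predict`) are hypothetical and chosen for illustration only.

```python
import math
from collections import defaultdict


def normalize(vec):
    """Scale a sparse term-weight dict to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Dot product of two sparse vectors (cosine if both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def centroids(docs, labels):
    """One normalized centroid per class, as in a plain centroid classifier."""
    sums = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, weight in normalize(doc).items():
            sums[label][term] += weight
    return {c: normalize(s) for c, s in sums.items()}


def force(doc, centroid, mass):
    """ASSUMED force: class mass over squared distance to the centroid.

    For unit vectors, squared Euclidean distance equals 2 - 2 * cosine.
    The small constant avoids division by zero.
    """
    d2 = 2.0 - 2.0 * cosine(doc, centroid)
    return mass / (d2 + 1e-9)


def predict(doc, cents, masses):
    """Assign the document to the class exerting the largest force."""
    doc = normalize(doc)
    return max(cents, key=lambda c: force(doc, cents[c], masses[c]))


def learn_masses(docs, labels, cents, epochs=10, step=0.1):
    """ASSUMED learning rule: on each training error, grow the mass of the
    true class and shrink the mass of the wrongly predicted class."""
    masses = {c: 1.0 for c in cents}
    for _ in range(epochs):
        for doc, label in zip(docs, labels):
            guess = predict(doc, cents, masses)
            if guess != label:
                masses[label] *= 1.0 + step
                masses[guess] *= 1.0 - step
    return masses


# Tiny imbalanced toy set: two documents in class "x", one in class "y".
docs = [{"a": 1.0}, {"a": 1.0, "b": 0.2}, {"b": 1.0}]
labels = ["x", "x", "y"]
cents = centroids(docs, labels)
masses = learn_masses(docs, labels, cents)
```

With all masses equal, `predict` reduces to a nearest-centroid rule; the mass factors are what allow the decision boundary to shift toward or away from a class when the training data are skewed.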

Multi-Level Topical Text Categorization with Wikipedia

Improving Semi-Supervised Text Classification by Using Wikipedia Knowledge

Text Categorization Based on Domain Ontology

Learning Topic Hierarchies For Wikipedia Categories

Improving Text Categorization with Semantic Knowledge in Wikipedia

Wiki3C: Exploiting Wikipedia for Context-Aware Concept Categorization

Aggressive Dimensionality Reduction With Reinforcement Local Feature Selection For Text Categorization

A 6-Tuple Framework For The Current Multi-Label Text Categorization

Hierarchical Text Categorization Based on Multiple Feature Selection and Fusion of Multiple Classifiers Approaches.

TWAG: A Topic-Guided Wikipedia Abstract Generator

Interpretative Topic Categorization Via Deep Multiple Instance Learning

Open-categorical Text Classification Based on Multi-Lda Models

A High Performance Two-Class Chinese Text Categorization Method

Exploiting Textual and Visual Features for Image Categorization

Automatic Text Categorization Based on Content Analysis with Cognitive Situation Models

Language Independent Text Categorization.

Multiple-instance Learning for Text Categorization Based on Semantic Representation

Graph-Based Chinese Text Categorization

Content-Oriented Automatic Text Categorization with the Cognitive Situation Models

A New Centroid-Based Classification Model for Text Categorization

LSASGT: An Approach to Text Categorization Based on Latent Semantic Analysis and Spectral Graph Transducer