A New Feature Weighting Method Based on Probability Distribution in Imbalanced Text Classification

Leilei Chu, Hui Gao, Wenbo Chang
DOI: https://doi.org/10.1109/fskd.2010.5569830
2010-01-01
Abstract: Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We propose a new approach that uses a probability distribution to assign feature weights and apply it to a Naive Bayes classifier. The method is evaluated experimentally on the FuDan Chinese Corpus. The experimental results show significant improvement on imbalanced datasets, while performance on balanced datasets is not jeopardized. Our approach suggests a simple and effective way to boost the performance of text classification over skewed datasets.