A Chinese Short Text Classification Method Based on TF-IDF and Gradient Boosting Decision Tree

Yanming Cheng, Zhigang Yu, Je Hu, Mingchuan Yang
DOI: https://doi.org/10.1109/icicml57342.2022.10009824
2022-01-01
Abstract: To address the difficulty of feature extraction and the semantic sparsity of Chinese short texts, this paper uses the TF-IDF algorithm to extract category keywords and takes the set of category keywords as the feature set for short text classification. The weight of each keyword feature is then obtained by computing the maximum similarity between that category keyword and each word in the short text. Each short text is represented as a vector of these weighted keyword features. Finally, a classifier is trained with the GBDT (gradient boosting decision tree) algorithm, and experiments are carried out to verify the effectiveness of the method in improving classification performance.
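The feature-construction steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the texts are assumed to be already word-segmented, each category's pooled texts are treated as one document for TF-IDF, the similarity measure is a stand-in character-level Jaccard overlap (the abstract does not say which similarity is used), and the function names `category_keywords`, `char_jaccard`, and `vectorize` are hypothetical.

```python
import math
from collections import Counter


def category_keywords(docs_by_category, top_k=1):
    """Rank each category's words by TF-IDF, pooling a category's texts into one document."""
    pooled = {
        cat: Counter(tok for doc in docs for tok in doc)
        for cat, docs in docs_by_category.items()
    }
    n_categories = len(pooled)
    # Document frequency: in how many categories each word appears.
    df = Counter(tok for counts in pooled.values() for tok in counts)
    keywords = {}
    for cat, counts in pooled.items():
        total = sum(counts.values())
        scores = {
            tok: (freq / total) * math.log(n_categories / df[tok])
            for tok, freq in counts.items()
        }
        # Highest score first; ties broken by the word itself for determinism.
        keywords[cat] = sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
    return keywords


def char_jaccard(a, b):
    """Stand-in similarity (an assumption): Jaccard overlap of the characters in two words."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def vectorize(tokens, feature_words, sim=char_jaccard):
    """One weight per category keyword: its maximum similarity to any word in the text."""
    return [max((sim(kw, tok) for tok in tokens), default=0.0) for kw in feature_words]


# Toy, pre-segmented training texts (illustrative data only).
train = {
    "sports": [["足球", "比赛", "进球"], ["篮球", "比赛", "球队"]],
    "finance": [["股票", "市场", "上涨"], ["基金", "市场", "投资"]],
}
keywords = category_keywords(train, top_k=1)
feature_words = [w for cat in keywords for w in keywords[cat]]

# A short text that shares no exact word with the keywords still gets a nonzero weight.
vec = vectorize(["球赛", "直播"], feature_words)
print(feature_words, vec)
```

Vectors built this way could then be passed to any gradient-boosted tree classifier, for example scikit-learn's `GradientBoostingClassifier`, for the final training step. The max-similarity weighting is what lets a short text receive a nonzero feature value even when it contains none of the category keywords verbatim, which is how the approach tries to counter sparsity.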