Classification for Text Data from the Power System Based on Improving Naive Bayes

G. Liang, Y. G. Yan, M. Wang, X. L. Lian, M. S. Li, W. H. Tang
DOI: https://doi.org/10.1109/appeec48164.2020.9220634
2020-01-01
Abstract: After years of power system operation, a large amount of text data has accumulated, and analyzing it, the violation data in particular, has become important. In this context, this paper introduces a novel classification method, Improving Naive Bayes based on Improving Term Frequency-Inverse Document Frequency (ITF-IDF), which aims to categorize the text and reduce the cost of manual analysis. The violation data are classified into five categories: personal behavior, instrument, security activities, supervision, and two-ticket data. To increase classification accuracy, the proposed method improves the weighting used in Naive Bayes, namely ITF-IDF. In the experimental studies, the Improving Naive Bayes is evaluated on spam message test data, a binary classification task, and on violation data from the power system, a multi-class classification task, and is compared with classifiers based on conventional Naive Bayes, Logistic Regression, and the Support Vector Machine (SVM). The results demonstrate that the proposed method performs better than the other methods.
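The abstract does not give the ITF-IDF formula, so the following is only a minimal sketch of the general idea it describes: a multinomial Naive Bayes classifier whose raw term counts are replaced by TF-IDF weights. It assumes the standard smoothed IDF, ln((1 + N) / (1 + df)) + 1, rather than the paper's improved weighting, and the class name `TfidfNaiveBayes`, the whitespace tokenizer, and the four toy training messages are all hypothetical illustrations, not data or code from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real power-system text
    # would need proper word segmentation).
    return text.lower().split()


class TfidfNaiveBayes:
    """Multinomial Naive Bayes using TF-IDF weights in place of raw counts.

    Uses standard smoothed TF-IDF, NOT the paper's ITF-IDF, whose formula
    is not stated in the abstract.
    """

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        self.n_docs = len(tokenized)

        # Document frequency: number of documents containing each term.
        df = Counter()
        for toks in tokenized:
            df.update(set(toks))
        self.vocab = set(df)

        # Smoothed inverse document frequency.
        self.idf = {
            t: math.log((1 + self.n_docs) / (1 + c)) + 1.0
            for t, c in df.items()
        }

        # Accumulate TF-IDF mass per (class, term).
        self.class_docs = Counter(labels)
        self.weights = defaultdict(Counter)
        for toks, y in zip(tokenized, labels):
            for t, tf in Counter(toks).items():
                self.weights[y][t] += tf * self.idf[t]
        self.totals = {y: sum(w.values()) for y, w in self.weights.items()}
        return self

    def predict(self, doc):
        # Terms never seen in training are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.vocab)
        best, best_score = None, -math.inf
        for y in self.class_docs:
            # Log prior plus Laplace-smoothed log likelihoods.
            score = math.log(self.class_docs[y] / self.n_docs)
            denom = self.totals[y] + len(self.vocab)
            for t, tf in counts.items():
                score += tf * math.log((self.weights[y][t] + 1.0) / denom)
            if score > best_score:
                best, best_score = y, score
        return best


# Hypothetical toy data for the binary (spam) setting.
train_docs = [
    "win free prize now",
    "free cash offer win",
    "meeting schedule for tomorrow",
    "please review the inspection report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = TfidfNaiveBayes().fit(train_docs, train_labels)
print(model.predict("free prize offer"))
print(model.predict("inspection report for tomorrow"))
```

The same class handles the multi-class setting unchanged if the labels are the five violation categories instead of two. Swapping in a different weighting scheme only requires changing how `self.idf` is computed.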