A Method for Eliminating Class Noise in Text Classification Based on Feature Class Attribute

WANG Qiang, GUAN Yi, WANG Xiao-Long
DOI: https://doi.org/10.16383/j.aas.2007.08.006
2007-01-01
ACTA AUTOMATICA SINICA
Abstract: This paper presents a novel algorithm for eliminating class noise in text classification, based on an analysis of the class attribute of features. The algorithm removes class noise for the classifier by mining the most representative class information carried by text features. Concretely, it uses the class attributes linked to the features of an unseen document to prejudge that document's candidate class labels, and then classifies the document only within the candidate class space. This reduces the number of decisions, cuts running time, and improves accuracy. Experimental results on Chinese and English corpora show that the algorithm performs well: the F measure is 0.76 and 0.93, respectively, and the running efficiency of the classifier is greatly improved. A further experiment indicates that the algorithm is readily extensible: with a feedback learning strategy, the F measure can be further improved to 0.806 and 0.943.
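The abstract does not spell out how feature class attributes are computed, so the following Python sketch is only an illustration of the general idea of pruning the class space before classification, not the authors' method. It assumes that a feature's class attribute is simply the class in which the feature appears most often, kept only when that class accounts for a large enough share of the feature's occurrences. The names `learn_feature_classes`, `candidate_classes`, `classify`, and the threshold `min_purity` are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_feature_classes(docs, min_purity=0.6):
    """Map each feature to its dominant class (assumed class attribute).

    docs: iterable of (tokens, label) pairs.
    min_purity: hypothetical threshold on the share of documents
    containing the feature that belong to its most frequent class.
    """
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for tok in set(tokens):  # count each feature once per document
            counts[tok][label] += 1
    feature_class = {}
    for tok, by_class in counts.items():
        label, hits = by_class.most_common(1)[0]
        if hits / sum(by_class.values()) >= min_purity:
            feature_class[tok] = label
    return feature_class


def candidate_classes(tokens, feature_class, all_classes):
    """Prejudge candidate labels from the class attributes of the features."""
    cands = {feature_class[t] for t in tokens if t in feature_class}
    # Fall back to the full class space when no feature is informative.
    return cands or set(all_classes)


def classify(tokens, feature_class, all_classes, score):
    """Run any scoring function, but only over the candidate classes."""
    cands = candidate_classes(tokens, feature_class, all_classes)
    return max(sorted(cands), key=lambda c: score(tokens, c))


# Toy data, invented purely for illustration.
train = [
    (["goal", "match", "team"], "sports"),
    (["team", "coach", "goal"], "sports"),
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "market"], "finance"),
    (["vote", "party", "market"], "politics"),
]
classes = ["sports", "finance", "politics"]
fc = learn_feature_classes(train)


def vote_score(tokens, label):
    """Stand-in scorer: number of features whose class attribute is `label`."""
    return sum(1 for t in tokens if fc.get(t) == label)


print(candidate_classes(["goal", "bank", "xyz"], fc, classes))
print(classify(["goal", "team", "bank"], fc, classes, vote_score))
```

In this toy setup the scorer is evaluated for two candidate classes instead of three. With a real classifier and many classes, that reduction in decisions is where the time savings described in the abstract would come from.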