Modified DFS-based term weighting scheme for text classification
Long Chen, Liangxiao Jiang, Chaoqun Li
DOI: https://doi.org/10.1016/j.eswa.2020.114438
IF: 8.5
2021-04-01
Expert Systems with Applications
Abstract: With the rapid growth of textual data on the Internet, text classification (TC) has attracted increasing attention. As a widely used text representation method, the vector space model (VSM) represents the content of a document as a vector composed of term frequency (TF) in the term space. Because different terms have different levels of importance in a document, designing an appropriate term weighting scheme is crucial to improve the performance of TC. In this study, we first conducted a comprehensive survey of the existing well-known term weighting schemes and found that they are not fully effective and that researchers are still focused on proposing new term weighting schemes. To further improve the performance of TC, we propose a new term weighting scheme based on the modified distinguishing feature selector (DFS), which we call TF-MDFS (modified DFS-based TF). Experimental results show that TF-MDFS is overall better than existing state-of-the-art term weighting schemes in terms of the classification accuracy of widely used base classifiers.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science
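
The abstract describes weighting each term's frequency by a score derived from the distinguishing feature selector. The sketch below shows the general idea only. It assumes the original DFS score as it is commonly stated, DFS(t) = sum over classes C of P(C|t) / (P(not t|C) + P(t|not C) + 1), and simply multiplies it by the raw term frequency. The modification that defines TF-MDFS is not given in this record, so this is not the authors' scheme; the function names and the toy corpus are hypothetical.

```python
from collections import Counter


def term_frequencies(tokens, vocabulary):
    """VSM representation: one raw term-frequency entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def dfs_score(term, docs, labels):
    """Original (unmodified) DFS score of a term over a labeled corpus.

    Assumed form: sum_C P(C|t) / (P(not t|C) + P(t|not C) + 1),
    with probabilities estimated from document counts.
    """
    has_term = [term in doc for doc in docs]
    n_with_term = sum(has_term)
    if n_with_term == 0:
        return 0.0  # unseen term: no evidence for any class

    score = 0.0
    for cls in sorted(set(labels)):
        in_cls = [label == cls for label in labels]
        n_cls = sum(in_cls)
        n_other = len(docs) - n_cls
        n_term_cls = sum(1 for h, c in zip(has_term, in_cls) if h and c)

        p_cls_given_term = n_term_cls / n_with_term
        p_absent_given_cls = (n_cls - n_term_cls) / n_cls
        # If there are no documents outside the class, treat P(t|not C) as 0.
        p_term_given_other = (
            (n_with_term - n_term_cls) / n_other if n_other else 0.0
        )
        score += p_cls_given_term / (p_absent_given_cls + p_term_given_other + 1)
    return score


def tf_dfs_vector(tokens, vocabulary, docs, labels):
    """Illustrative TF x DFS weighting (not the paper's TF-MDFS)."""
    tf = term_frequencies(tokens, vocabulary)
    return [f * dfs_score(t, docs, labels) for f, t in zip(tf, vocabulary)]


# Hypothetical toy corpus: two classes, two documents each.
docs = [["cheap", "pills"], ["cheap", "offer"],
        ["meeting", "notes"], ["meeting", "offer"]]
labels = ["spam", "spam", "ham", "ham"]
vocabulary = ["cheap", "offer", "pills"]

print(tf_dfs_vector(["cheap", "cheap", "offer"], vocabulary, docs, labels))
# -> [2.0, 0.5, 0.0]
```

In this toy corpus, "cheap" occurs only in spam documents and receives the maximum per-term score of 1.0, while "offer" is spread evenly across both classes and receives 0.5, so the same raw frequency contributes less to the weighted vector.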