IMPROVEMENT OF TF-IDF FEATURE SELECTION ALGORITHM BASED ON INDUSTRY PROPRIETARY DICTIONARY

Qixun Zhang, Hongzhi Liu, Shixiang Liu, Tang Jia, Jian Cao
DOI: https://doi.org/10.3969/j.issn.1000-386x.2017.07.051
2017-01-01
Abstract: An industry proprietary dictionary is a dictionary of industry-specific terms. Applying it to the TF-IDF-based feature selection algorithm can improve the completeness of the text feature space. The key goal of improved TF-IDF algorithms is to extract low-frequency keywords. Existing improvements based on statistical features increase the computational complexity of the original algorithm and reduce its efficiency. To solve this problem, the proposed method adds lexical mapping to the original TF-IDF feature selection algorithm to extract low-frequency keywords and construct a complete feature space. Experimental results show that, in the subsequent secondary clustering verification experiments, features extracted by the TF-IDF algorithm based on the industry proprietary dictionary effectively improve the recall and precision of clustering compared with features extracted by the feature selection algorithm without the dictionary.
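The abstract does not spell out the mapping step, so the following is only a minimal sketch of one way the idea could work, not the authors' implementation. It assumes that "lexical mapping" means keeping every token that appears in the industry dictionary, regardless of its TF-IDF score, on top of the usual top-k selection. The function names `tfidf_scores` and `select_features`, the `top_k` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def select_features(docs, dictionary, top_k):
    """Union of each document's top_k TF-IDF terms and its dictionary terms.

    Assumption: dictionary terms are kept by simple membership lookup,
    which costs one set lookup per distinct term and needs no extra statistics.
    """
    features = set()
    for doc_scores in tfidf_scores(docs):
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(doc_scores, key=lambda t: (-doc_scores[t], t))
        features.update(ranked[:top_k])
        # Dictionary mapping: rescue low-scoring industry terms.
        features.update(t for t in doc_scores if t in dictionary)
    return features


# Toy corpus (hypothetical): "gasket" and "flange" are rare industry terms.
docs = [
    ["pump", "pump", "pump", "pump", "valve", "gasket"],
    ["pump", "pump", "pump", "report"],
    ["valve", "valve", "valve", "valve", "report", "flange"],
]
industry_terms = {"gasket", "flange"}

baseline = select_features(docs, set(), top_k=1)
with_dictionary = select_features(docs, industry_terms, top_k=1)
```

In this toy corpus the plain top-1 selection keeps only the frequent terms, while the dictionary lookup also keeps the low-frequency terms `gasket` and `flange`, which is the kind of feature-space completion the abstract describes.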