Discovering New Sensitive Words Based on Sensitive Information Categorization

Panyu Liu, Yangyang Li, Zhiping Cai, Shuhui Chen
DOI: https://doi.org/10.1007/978-3-030-24274-9_30
2019-01-01
Abstract: Sensitive word detection has become increasingly important as Internet technologies have flourished. At the same time, some Internet users spread sensitive content that contains unhealthy information. Improving the accuracy of sensitive information classification and finding new sensitive words have therefore become urgent demands in network information security. On the one hand, sensitive information classification results are inaccurate; on the other hand, existing research methods cannot find new sensitive information, that is, they do not identify new sensitive information automatically. We mainly improved an existing high-performing machine learning classification algorithm, and experimental results show that this method significantly improves classification accuracy. In addition, by studying word similarity algorithms based on HowNet and CiLin, we can continually expand the database of sensitive words (i.e., discover new sensitive words). Through these methods, we achieved better accuracy and realized a new sensitive word discovery technique, which is analyzed and presented in this paper.
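The abstract names CiLin- and HowNet-based word similarity as the mechanism for growing the sensitive word database, but it gives no formula. The following is a minimal sketch of the CiLin side only, under stated assumptions. It scores two words by the first hierarchy level at which their Tongyici Cilin (extended) codes diverge, and it adds a candidate word to the lexicon when its best similarity to any seed sensitive word reaches a threshold. The level weights, the 0.9 threshold, the toy codes, the placeholder English words, and all function names (`cilin_similarity`, `word_similarity`, `expand_sensitive_words`) are hypothetical. They are not taken from the paper, and HowNet-based similarity is not shown.

```python
# Hypothetical sketch: growing a sensitive-word lexicon by CiLin-style code similarity.
# The codes, level weights and threshold are illustrative assumptions, not the paper's values.

# An extended-CiLin code has 8 characters: major class (1), medium class (1),
# minor class (2), word group (1), atomic group (2), and a flag ('=', '#', '@').
# Each tuple gives the slice of the code that encodes one hierarchy level.
LEVEL_SLICES = [(0, 1), (1, 2), (2, 4), (4, 5), (5, 7)]

# Assumed similarity when two codes first differ at level 0..4 (deeper match = more similar).
LEVEL_SIMILARITY = [0.1, 0.65, 0.8, 0.9, 0.96]


def cilin_similarity(code_a: str, code_b: str) -> float:
    """Score two codes by the first hierarchy level at which they diverge."""
    for level, (start, end) in enumerate(LEVEL_SLICES):
        if code_a[start:end] != code_b[start:end]:
            return LEVEL_SIMILARITY[level]
    # Same atomic group (assumed rule): '=' groups are synonyms, anything else is merely related.
    return 1.0 if code_a[7] == code_b[7] == "=" else 0.5


def word_similarity(word_a: str, word_b: str, lexicon: dict[str, list[str]]) -> float:
    """A word may carry several senses (codes); keep the best-matching pair."""
    codes_a = lexicon.get(word_a, [])
    codes_b = lexicon.get(word_b, [])
    if not codes_a or not codes_b:
        return 0.0  # words missing from the dictionary get no similarity
    return max(cilin_similarity(a, b) for a in codes_a for b in codes_b)


def expand_sensitive_words(seeds, candidates, lexicon, threshold=0.9):
    """Return candidates whose best similarity to any seed word reaches the threshold."""
    discovered = set()
    for word in candidates:
        if word in seeds:
            continue  # already a known sensitive word
        best = max((word_similarity(word, seed, lexicon) for seed in seeds), default=0.0)
        if best >= threshold:
            discovered.add(word)
    return discovered


# Toy dictionary with placeholder words and invented codes.
TOY_LEXICON = {
    "gamble": ["Hj34A01="],
    "bet": ["Hj34A01="],      # same atomic group as "gamble"
    "wager": ["Hj34A02="],    # differs only at the atomic-group level
    "casino": ["Hj34B01="],   # differs at the word-group level
    "weather": ["Bf01A01="],  # differs at the top (major-class) level
}

if __name__ == "__main__":
    found = expand_sensitive_words({"gamble"}, ["bet", "wager", "casino", "weather"], TOY_LEXICON)
    print(sorted(found))  # ['bet', 'casino', 'wager']
```

Raising the threshold makes the expansion more conservative. For example, at 0.95 only "bet" and "wager" would be added. In a real pipeline, the newly discovered words would feed back into the seed set, which is how a lexicon could be expanded continually.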
What problem does this paper attempt to address?