A STRING SIMILARITY CALCULATION FOR RECOGNISING KEYWORDS OF COINED PROFANITIES

Shaoqing Li, Chengrong Wu, Jianping Zeng, Yiping Zhong
DOI: https://doi.org/10.3969/j.issn.1000-386x.2015.03.036
2015-01-01
Abstract: With the development of Internet technology, a variety of network applications for textual communication have emerged, such as chat rooms and BBS. To keep the network environment healthy, many applications filter profanities posted by users. To avoid being filtered, some malicious users disguise these profanities in the content they post. How to recognise such disguised profanities is an important issue. In this paper we present an algorithm that recognises disguised profanities by computing the string similarity of aberrant sensitive words. The algorithm has the following features: (1) the similarity score it assigns to a disguised profanity is very close to the score a human would assign; (2) very low time complexity; (3) a very high identification rate for disguised profanities. The algorithm decides whether to filter a suspected sensitive word according to the calculated similarity value. Experimental data show that the algorithm outperforms the state-of-the-art string similarity metric on newly coined profanities.
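The abstract does not spell out the similarity measure itself, so the following is only a minimal sketch of the general idea of similarity-based filtering, not the authors' algorithm. It swaps in a standard baseline (normalised Levenshtein edit distance) and an arbitrary threshold of 0.7; the function names, the threshold, and the placeholder word list are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insert, delete, substitute), row by row."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; a baseline, NOT the paper's metric."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def should_filter(candidate: str, blocklist: list[str],
                  threshold: float = 0.7) -> bool:
    """Flag a candidate if it is close enough to any blocklisted word.

    The threshold is an assumed value chosen for illustration only.
    """
    candidate = candidate.lower()
    return any(similarity(candidate, word.lower()) >= threshold
               for word in blocklist)


if __name__ == "__main__":
    blocklist = ["darn"]  # mild placeholder standing in for a real word list
    print(should_filter("d4rn", blocklist))
    print(should_filter("hello", blocklist))
```

A plain edit-distance baseline like this treats every character substitution equally, which is one reason a purpose-built metric could score disguised words more like a human reader would.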