Chinese underground market jargon analysis based on unsupervised learning

Zhao Kangzhi, Zhang Yong, Xing Chunxiao, Li Weifeng, Chen Hsinchun
DOI: https://doi.org/10.1109/ISI.2016.7745450
2016-01-01
Abstract: With the rapid growth of its online population, China has become the world's largest online market. This growth has also given rise to a Chinese underground market that has facilitated many cybercrimes in China. Consequently, there is a need for research scrutinizing Chinese underground markets. One major challenge facing cybersecurity researchers is understanding unfamiliar cybercriminal jargon. To this end, we analyze jargon in the Chinese underground market. In particular, we apply two recent unsupervised machine learning methods: word embedding and Latent Dirichlet Allocation. We evaluate our work on a research testbed encompassing 29 exclusive underground market QQ groups with 23,000 members. Specifically, we test the ability of the proposed approach to learn words that are semantically similar to known cybersecurity-related jargon. Results suggest that state-of-the-art unsupervised learning approaches can help researchers better understand cybercriminal language, providing promising insights for future research on Chinese underground markets.
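The evaluation idea in the abstract, finding words semantically close to a known jargon term, can be illustrated with a nearest-neighbor lookup over word vectors. The following is a minimal sketch and not the authors' implementation: it assumes embeddings have already been trained, and the tokens (`jargon_a`, `jargon_b`, and so on), the 3-dimensional vectors, and the function names are all hypothetical placeholders invented for illustration.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration. A real
# embedding model would learn many more dimensions from the chat corpus.
TOY_VECTORS = {
    "jargon_a": [0.9, 0.1, 0.0],  # hypothetical known jargon term (the seed)
    "jargon_b": [0.8, 0.2, 0.1],  # hypothetical term used in similar contexts
    "weather": [0.0, 0.1, 0.9],   # hypothetical unrelated everyday word
    "lunch": [0.1, 0.0, 0.8],     # hypothetical unrelated everyday word
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(seed, vectors, top_n=2):
    """Return the top_n (word, score) pairs closest to the seed word."""
    scores = [
        (word, cosine(vectors[seed], vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    # Highest cosine similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    for word, score in most_similar("jargon_a", TOY_VECTORS):
        print(f"{word}\t{score:.3f}")
```

With these toy vectors, `jargon_b` ranks first for the seed `jargon_a` because the two vectors point in nearly the same direction. In practice the seed would be a known cybersecurity-related term and the candidates would come from the vocabulary of the collected chat logs.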