Learning Semantic Coherence for Machine Generated Spam Text Detection

Mengjiao Bao, Jianxin Li, Jian Zhang, Hao Peng, Xudong Liu
DOI: https://doi.org/10.1109/ijcnn.2019.8852340
2019-01-01
Abstract: Using machines to generate text has attracted considerable attention recently. However, low-quality machine-generated text seriously degrades the user experience because of its poor readability. Traditional methods for detecting machine-generated text depend heavily on hand-crafted features, while most deep learning methods for general text classification tend to model the semantic representation of topics and thus overlook semantic coherence, which is also useful for detecting machine-generated text. In this paper, we propose an end-to-end neural architecture that learns the semantic coherence of text sequences. We conduct experiments on both Chinese and English datasets comprising more than two million articles, both manually written and machine generated. Results show that our method is effective and achieves state-of-the-art performance.