Plagiarism Judgment Based on Language Model and Feature Classification

LI Hui, LIU Ying
DOI: https://doi.org/10.3969/j.issn.1000-3428.2013.05.051
2013-01-01
Abstract: Copyright protection attracts considerable attention in the information age. To address disputes arising from textual similarity between novels, this paper proposes a method based on a language model and feature classification. The method uses statistics on coincidences and on part-of-speech (POS) proportions to analyze grammatical collocations and coincidences between texts. Principal Component Analysis (PCA) and Random Forest (RF) are added to the experiments to extract features for automatic classification. The machine learning results show that the method can effectively identify whether plagiarism exists between novels.
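The abstract does not spell out how the coincidence and POS statistics are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each text is already tokenized and POS-tagged as `(word, tag)` pairs, treats "coincidence" as n-gram overlap, and uses a small hypothetical tag set. The function names (`coincidence_ratio`, `pos_proportions`, `feature_vector`) are illustrative. The resulting vector is the kind of input that could then be reduced with PCA and classified with a Random Forest, which is not shown here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def coincidence_ratio(tokens_a, tokens_b, n=2):
    """Fraction of n-grams in text A that also occur in text B.

    Assumption: 'coincidence' is modeled as simple n-gram overlap.
    """
    grams_a = ngrams(tokens_a, n)
    if not grams_a:
        return 0.0
    grams_b = set(ngrams(tokens_b, n))
    shared = sum(1 for g in grams_a if g in grams_b)
    return shared / len(grams_a)


def pos_proportions(tags, tagset):
    """Proportion of each tag in `tagset` among all tags of a text."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[t] / total if total else 0.0 for t in tagset]


def feature_vector(tagged_a, tagged_b, tagset=("N", "V", "ADJ", "ADV")):
    """Build a feature vector for a pair of POS-tagged texts.

    Features (all hypothetical choices for illustration):
      1. word-bigram coincidence ratio
      2. POS-trigram coincidence ratio (a proxy for grammatical collocation)
      3+. absolute difference in the proportion of each tag in `tagset`
    """
    words_a = [w for w, _ in tagged_a]
    words_b = [w for w, _ in tagged_b]
    tags_a = [t for _, t in tagged_a]
    tags_b = [t for _, t in tagged_b]

    features = [
        coincidence_ratio(words_a, words_b, n=2),
        coincidence_ratio(tags_a, tags_b, n=3),
    ]
    props_a = pos_proportions(tags_a, tagset)
    props_b = pos_proportions(tags_b, tagset)
    features.extend(abs(x - y) for x, y in zip(props_a, props_b))
    return features


if __name__ == "__main__":
    # Toy, hand-tagged sentences; real input would come from a POS tagger.
    text_a = [("the", "DET"), ("old", "ADJ"), ("man", "N"),
              ("walked", "V"), ("slowly", "ADV")]
    text_b = [("the", "DET"), ("old", "ADJ"), ("man", "N"),
              ("ran", "V"), ("home", "N")]
    print(feature_vector(text_a, text_b))
```

One vector of this kind would be computed per pair of novels (or per pair of passages), and a labeled collection of such vectors would form the training set for the classifier.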