Detecting the Theft of Natural Language Text Using Birthmark

Jianlong Yang, Jianmin Wang, Deyi Li
DOI: https://doi.org/10.1109/IIH-MSP.2006.85
2006-01-01
Abstract: To detect the theft of natural language text effectively, we present a novel scheme for deriving a birthmark from the text. Since a birthmark is a unique and native characteristic of every text, a text that has the same birthmark as another can readily be suspected of being a copy. Ideally, a birthmark should satisfy two properties: (a) credibility - independent texts must be distinguished by completely different birthmarks, and (b) resilience - the birthmark should tolerate meaning-preserving attacks. To evaluate the effectiveness of the proposed birthmark, we conduct two experiments. The first shows that the birthmark successfully distinguishes non-copied files. The second shows that the birthmark has quite good tolerance against meaning-preserving attacks.