Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities

Thamar Solorio, Ragib Hasan, Mainul Mizan
DOI: https://doi.org/10.48550/arXiv.1310.6772
2013-10-25
Abstract: This paper describes the corpus of sockpuppet cases we gathered from Wikipedia. A sockpuppet is an online user account created with a fake identity for the purpose of covering abusive behavior and/or subverting the editing regulation process. We used a semi-automated method for crawling and curating a dataset of real sockpuppet investigation cases. To the best of our knowledge, this is the first corpus available on real-world deceptive writing. We describe the process for crawling the data and some preliminary results that can be used as a baseline for benchmarking research. The dataset will be released under a Creative Commons license from our project website: http://docsig.cis.uab.edu.
Computation and Language, Cryptography and Security, Computers and Society