Automatic detection of reuses and citations in literary texts
Jean-Gabriel Ganascia,Peirre Glaudes,Andrea Del Lungo,J.-G. Ganascia,P. Glaudes,A. Del Lungo
DOI: https://doi.org/10.1093/llc/fqu020
2014-06-24
Literary and Linguistic Computing
Abstract:For more than 40 years now, modern theories of literature insist on the role of paraphrases, rewritings, citations, reciprocal borrowings, and mutual contributions of many kinds. The notions of ‘intertextuality’, ‘transtextuality’, and ‘hypertextuality/hypotextuality’ were introduced in the seventies and eighties to approach these phenomena. Through the Phœbus project, computer scientists from the computer science laboratory of the University Pierre and Marie Curie collaborate with the literary teams of Paris-Sorbonne University to develop efficient tools for literary studies that take advantage of modern computer science techniques to detect borrowings of huge masses of texts and to help put them in context. In this context, we have developed a piece of software that automatically detects and explores networks of textual reuses in classical literature. This article describes the principles on which our program is based, the significant results that have already been obtained and the prospective for the near future. It is divided into four parts. The first part recalls the distinction between various types of borrowings like plagiarism, pastiches, citations, etc. The second enumerates the criteria that are retained to characterize reuses and citations on which we are focusing here. The third part describes the implementation and shows its efficiency by comparison with manual detection. Finally, we show some of the results that have already been obtained with the Phœbus program.