AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text

Ximing Lu, Melanie Sclar, Skyler Hallinan, Niloofar Mireshghallah, Jiacheng Liu, Seungju Han, Allyson Ettinger, Liwei Jiang, Khyathi Chandu, Nouha Dziri, Yejin Choi
2024-10-06
Abstract: Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity. We present CREATIVITY INDEX as the first step to quantify the linguistic creativity of a text by reconstructing it from existing text snippets on the web. CREATIVITY INDEX is motivated by the hypothesis that the seemingly remarkable creativity of LLMs may be attributable in large part to the creativity of human-written texts on the web. To compute CREATIVITY INDEX efficiently, we introduce DJ SEARCH, a novel dynamic programming algorithm that can search verbatim and near-verbatim matches of text snippets from a given document against the web. Experiments reveal that the CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs, and that alignment reduces the CREATIVITY INDEX of LLMs by an average of 30.1%. In addition, we find that distinguished authors like Hemingway exhibit measurably higher CREATIVITY INDEX compared to other human writers. Finally, we demonstrate that CREATIVITY INDEX can be used as a surprisingly effective criterion for zero-shot machine text detection, surpassing the strongest existing zero-shot system, DetectGPT, by a significant margin of 30.2%, and even outperforming the strongest supervised system, GhostBuster, in five out of six domains.
Computation and Language
What problem does this paper attempt to address?
The paper asks how to evaluate and quantify the linguistic creativity of large language models (LLMs), and in particular whether that creativity matches or even surpasses that of humans. To answer this, the researchers propose a new metric, **CREATIVITY INDEX**, which measures the creativity of a text by checking how much of it can be reconstructed from existing text on the web. The main problems addressed and the proposed solutions are summarized below.

### Research Background and Motivation

1. **The Challenge of Creativity**:
   - Creativity has long been considered one of the aspects of human intelligence that is hardest for artificial intelligence (AI) to imitate.
   - With the emergence of large language models such as ChatGPT, people have begun to ask whether AI can match or surpass human creativity.
2. **Limitations of Existing Methods**:
   - Previous studies have attempted to quantify creativity in writing through specific scoring criteria and human evaluation, but these methods are difficult to apply at the scale needed to evaluate the large volume of text generated by LLMs.

### Proposed Method

3. **Introduction of CREATIVITY INDEX**:
   - **Definition**: CREATIVITY INDEX is a new statistical measure that quantifies the linguistic creativity of a text by the degree to which the text can be reconstructed from existing web text. The more of a text that can be pieced together from existing snippets, the lower its score.
   - **Hypothesis**: The apparent "creativity" of LLMs may be attributable in large part to the creativity of the human-written web text they were trained on, rather than to true originality.
4. **DJ SEARCH Algorithm**:
   - **Purpose**: To compute CREATIVITY INDEX efficiently, the researchers introduce DJ SEARCH, a new dynamic programming algorithm that searches the web for verbatim and near-verbatim matches of text snippets from a given document.
   - **Implementation**: The algorithm combines strict verbatim matching (using Infini-gram) with semantic similarity matching based on word embeddings (using Word Mover's Distance), allowing it to find both exact and near-verbatim matches efficiently.

### Experimental Results

5. **Experimental Design and Findings**:
   - **Comparison Objects**: The researchers compared texts written by professional human authors, distinguished literary writers (such as Hemingway), and multiple LLMs.
   - **Results**:
     - The CREATIVITY INDEX of professional human authors is on average 66.2% higher than that of LLMs.
     - Alignment through reinforcement learning from human feedback (RLHF) reduces the CREATIVITY INDEX of LLMs by an average of 30.1%.
     - Distinguished authors such as Hemingway show a measurably higher CREATIVITY INDEX than other human writers.
6. **Application Prospects**:
   - CREATIVITY INDEX also works as an effective criterion for zero-shot machine text detection: it surpasses the strongest existing zero-shot system, DetectGPT, by 30.2%, and outperforms the strongest supervised system, GhostBuster, in five out of six domains.

### Conclusion

By proposing CREATIVITY INDEX and the DJ SEARCH algorithm, the paper provides a new perspective and tool for evaluating the creativity of LLMs. This helps clarify the capabilities and limitations of LLMs and provides a foundation for future research and applications.
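As a rough illustration of the reconstruction idea described above, the following Python sketch scores a text by the fraction of its words that are not covered by any verbatim n-gram found in a reference corpus. It is a toy under stated assumptions, not the paper's DJ SEARCH: a small in-memory list of strings stands in for the web, only exact matches are considered (no Infini-gram index, no Word Mover's Distance), and the function names `tokenize`, `build_ngram_index`, `covered_mask`, and `uniqueness` are hypothetical names chosen for this example.

```python
from typing import Iterable, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization, stripping surrounding punctuation.

    Assumption: a deliberately crude tokenizer, used only for illustration.
    """
    tokens = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return [t for t in tokens if t]


def build_ngram_index(corpus: Iterable[str], n: int) -> Set[Tuple[str, ...]]:
    """Collect every n-gram that occurs in the reference corpus."""
    index: Set[Tuple[str, ...]] = set()
    for doc in corpus:
        toks = tokenize(doc)
        for i in range(len(toks) - n + 1):
            index.add(tuple(toks[i:i + n]))
    return index


def covered_mask(text: str, index: Set[Tuple[str, ...]], n: int) -> List[bool]:
    """Mark each word of `text` that lies inside some n-gram present in the index.

    Any exact match longer than n words contains matching n-grams, so checking
    length-n windows also covers all longer verbatim matches.
    """
    toks = tokenize(text)
    covered = [False] * len(toks)
    for i in range(len(toks) - n + 1):
        if tuple(toks[i:i + n]) in index:
            for j in range(i, i + n):
                covered[j] = True
    return covered


def uniqueness(text: str, corpus: Iterable[str], n: int) -> float:
    """Fraction of words in `text` NOT covered by a verbatim n-gram from the corpus.

    Assumption: this simple ratio is a stand-in for the paper's metric,
    whose exact definition is not given in this summary.
    """
    mask = covered_mask(text, build_ngram_index(corpus, n), n)
    if not mask:
        return 0.0
    return 1.0 - sum(mask) / len(mask)


if __name__ == "__main__":
    reference = ["The quick brown fox jumps over the lazy dog."]
    candidate = "The quick brown fox danced under a purple moon."
    # 4 of the 9 words ("the quick brown fox") are covered by matching 3-grams.
    print(round(uniqueness(candidate, reference, n=3), 3))  # 0.556
```

In this toy, a text copied wholesale from the reference scores 0.0 and a text sharing no 3-gram with it scores 1.0. A real system would need a web-scale index and a way to credit near-verbatim matches, which is the role the summary attributes to Infini-gram and Word Mover's Distance.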