Abstract: Some argue that the essence of humanity, such as creativity and sentiment, can never be mimicked by machines. This paper casts doubt on this belief by studying a vital question: Can AI compose poetry as well as humans? To answer the question, we propose ProFTAP, a novel evaluation framework inspired by the Turing test to assess AI's poetry writing capability. We apply it to current large language models (LLMs) and find that recent LLMs do indeed possess the ability to write classical Chinese poems nearly indistinguishable from those of humans. We also reveal that various open-source LLMs can outperform GPT-4 on this task.

What problem does this paper attempt to address?

The core question this paper addresses is: **Can artificial intelligence write classical Chinese poetry as well as humans?** The paper approaches the question in three steps:

1. **Questioning traditional views**: Some people believe that essential human characteristics such as creativity and emotion cannot be imitated by machines. The paper questions this view and tests empirically whether AI can write poetry as humans do.
2. **Introducing the ProFTAP framework**: To answer the question, the authors propose a new evaluation framework named ProFTAP (Probabilistic Feigenbaum Test for AI-generated Poetry). Inspired by the Turing test, the framework measures an AI system's poetry-writing ability by how well its poems can be told apart from human-written ones.
3. **Experimental verification**: The authors apply ProFTAP to current large language models (LLMs) and find that recent models can write classical Chinese poems that are nearly indistinguishable from those written by humans. They also find that several open-source LLMs outperform GPT-4 on this task.

### Main contributions

1. **Proposing the ProFTAP framework**: A new framework, inspired by the Turing test, for evaluating AI-generated poetry. It is more objective and rigorous, and easier to implement, than earlier manual evaluation methods.
2. **Applying ProFTAP for evaluation**: Popular LLMs are evaluated with the framework, revealing their abilities in classical poetry generation.
3. **Showing the advantages of open-source LLMs**: Fine-tuned open-source LLMs perform very well on classical poetry generation and can write poems that are nearly indistinguishable from the works of ancient poets.

### Problems solved

The paper mainly addresses how to evaluate the quality of AI-generated poetry objectively and scientifically. Its experiments indicate that some AI models can already imitate human-written classical poems to a considerable extent. This challenges the traditional view that machines cannot imitate human creativity and suggests new directions for applying AI to artistic creation.
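The summary does not spell out how ProFTAP scores a model, so the following is only a minimal sketch of the general Turing-test-style idea of evaluation by discrimination, not the paper's procedure. It assumes judges label each poem as human-written or not, and it compares how often each source's poems are judged human. The function name `judged_human_rates`, the source labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def judged_human_rates(judgments):
    """Return, per source, the fraction of poems that judges labeled human-written.

    `judgments` is an iterable of (source, judged_human) pairs, where `source`
    is a label such as "human" or a model name and `judged_human` is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # source -> [judged_human, total]
    for source, judged_human in judgments:
        counts[source][0] += int(judged_human)
        counts[source][1] += 1
    return {source: hits / total for source, (hits, total) in counts.items()}


# Hypothetical toy data for illustration only; these are not results from the paper.
toy_judgments = [
    ("human", True), ("human", True), ("human", False), ("human", True),
    ("model_a", True), ("model_a", False), ("model_a", True), ("model_a", True),
    ("model_b", False), ("model_b", False), ("model_b", True), ("model_b", False),
]

rates = judged_human_rates(toy_judgments)
for source, rate in rates.items():
    # A gap near zero means judges treat this source's poems like human poems.
    gap = rates["human"] - rate
    print(f"{source}: judged human {rate:.2f}, gap to human poems {gap:+.2f}")
```

Under this reading, a model whose poems are judged human about as often as real human poems would count as hard to distinguish from human authors. A real study would also need enough judgments per source to separate a true gap from sampling noise.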

Can AI Write Classical Chinese Poetry like Humans? An Empirical Study Inspired by Turing Test
