GPTEval: A Survey on Assessments of ChatGPT and GPT-4

Rui Mao, Guanyi Chen, Xulang Zhang, Frank Guerin, Erik Cambria
2023-08-24
Abstract: The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. There have been many studies evaluating the ability of ChatGPT and GPT-4 in different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on its language and reasoning abilities, scientific knowledge, and ethical considerations. Furthermore, an examination of the existing evaluation methods is conducted, offering several recommendations for future research in evaluating large language models.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
This paper addresses the lack of a comprehensive review of the many studies that have evaluated ChatGPT and GPT-4. It surveys prior assessments of the two large language models, focusing on the following aspects:

1. **Language and reasoning abilities**: Performance in natural language processing tasks such as dialogue, text generation, sentiment analysis, and information retrieval, as well as abilities in logical reasoning, common-sense reasoning, and causal reasoning.
2. **Scientific knowledge**: Knowledge levels in formal and natural science fields such as mathematics, computer science, physics, chemistry, and medicine.
3. **Ethical considerations**: Ethical issues in terms of fairness, robustness, reliability, and toxicity.

By synthesizing these findings, the paper aims to clarify the strengths and limitations of ChatGPT and GPT-4 and to offer guidance and recommendations for future research on evaluating large language models.