Abstract: Cognitive psychology delves into understanding perception, attention, memory, language, problem-solving, decision-making, and reasoning. Large language models (LLMs) are emerging as potent tools increasingly capable of performing human-level tasks. The recent development of GPT-4, and its demonstrated success on tasks that are complex for humans, such as exams and complex problems, has increased confidence that LLMs could become perfect instruments of intelligence. Although the GPT-4 report has shown its performance on some cognitive psychology tasks, a comprehensive assessment of GPT-4 on existing, well-established datasets is still required. In this study, we evaluate GPT-4's performance on a set of cognitive psychology datasets such as CommonsenseQA, SuperGLUE, MATH, and HANS. In doing so, we examine how GPT-4 processes and integrates cognitive psychology with contextual information, providing insight into the underlying cognitive processes that enable it to generate its responses. We show that GPT-4 achieves a high level of accuracy on cognitive psychology tasks relative to prior state-of-the-art models. Our results strengthen existing assessments of, and confidence in, GPT-4's cognitive psychology abilities. GPT-4 has significant potential to revolutionize the field of AI by enabling machines to bridge the gap between human and machine reasoning.

What problem does this paper attempt to address?

The paper primarily explores the performance and capabilities of GPT-4, an advanced large language model, on cognitive psychology tasks. The research team aims to gain a deep understanding of how GPT-4 handles and integrates tasks related to cognitive psychology by evaluating its performance on multiple benchmark datasets and revealing the underlying cognitive processes. Specifically, the paper focuses on the following points:

1. **Research Background**: It first introduces the research goals and methods of cognitive psychology, as well as the recent development of large language models such as GPT-4. These models have garnered widespread attention due to their powerful natural language processing capabilities.
2. **Research Purpose**: The main purpose of the paper is to evaluate GPT-4's performance on cognitive psychology tasks to verify whether it can exhibit human-level intelligence. This includes, but is not limited to, commonsense reasoning, solving mathematical problems, and text comprehension.
3. **Dataset Selection**: To comprehensively assess GPT-4's cognitive psychology capabilities, the researchers selected four key datasets for testing:
   - **CommonsenseQA**: Used to test commonsense reasoning ability.
   - **MATH**: Contains a large number of mathematical problems aimed at testing the model's ability to solve math problems.
   - **SuperGLUE**: A difficult benchmark covering various natural language understanding tasks.
   - **HANS**: Used to detect whether the model relies on simple heuristics, such as shallow syntax or lexical overlap, when reasoning.
4. **Experimental Results**: The results show that GPT-4 performed excellently on all the aforementioned datasets, notably achieving an accuracy of 83.2% on CommonsenseQA and 91.2% on SuperGLUE. Additionally, GPT-4 achieved 100% accuracy on the HANS dataset, but the authors noted that this perfect score might be due to the dataset containing only non-entailment samples.
5. **Conclusion**: The paper concludes that GPT-4's outstanding performance on cognitive psychology tasks demonstrates its high level of intelligence, which could have significant implications for the future development of artificial intelligence, particularly in understanding and simulating human cognitive processes. The research also emphasizes the importance of further testing and evaluation of GPT-4 to better understand its capabilities and limitations.

In summary, through a series of carefully designed experiments, the paper showcases GPT-4's powerful capabilities in the field of cognitive psychology and envisions its future applications in psychology and other related fields.
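The caveat about the perfect HANS score can be made concrete with a small sketch. It assumes plain exact-match accuracy as the metric and uses hypothetical toy labels, not the paper's data or evaluation code: on a subset where every gold label is "non-entailment", even a predictor that ignores its input scores 100%, and only a label-balanced set or a per-label breakdown exposes this.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the gold label."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        raise ValueError("empty evaluation set")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def per_label_accuracy(preds: Sequence[str], golds: Sequence[str]) -> Dict[str, float]:
    """Accuracy computed separately for each gold label present in the data."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    totals = Counter(golds)
    correct = Counter(g for p, g in zip(preds, golds) if p == g)
    # Counter returns 0 for labels that were never predicted correctly.
    return {label: correct[label] / n for label, n in totals.items()}


# Hypothetical toy data (not from the paper): a HANS-style subset in which
# every gold label is "non-entailment".
golds = ["non-entailment"] * 6
# A degenerate "model" that always answers "non-entailment".
preds = ["non-entailment" for _ in golds]

print(accuracy(preds, golds))            # 1.0
print(per_label_accuracy(preds, golds))  # {'non-entailment': 1.0}

# Adding entailment examples exposes the same constant predictor.
mixed_golds = golds + ["entailment"] * 6
mixed_preds = ["non-entailment" for _ in mixed_golds]
print(accuracy(mixed_preds, mixed_golds))            # 0.5
print(per_label_accuracy(mixed_preds, mixed_golds))  # {'non-entailment': 1.0, 'entailment': 0.0}
```

Reporting accuracy per gold label is one simple way to check whether a high overall score reflects the model's reasoning or just the label distribution of the evaluation subset.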

Mind meets machine: Unravelling GPT-4's cognitive psychology
