Automating psychological hypothesis generation with AI: when large language models meet causal graph

Song Tong, Kai Mao, Zhen Huang, Yukun Zhao, Kaiping Peng
DOI: https://doi.org/10.31234/osf.io/7ck9m
2024-07-16
Abstract: Leveraging the synergy between causal knowledge graphs and a large language model (LLM), our study introduces a groundbreaking approach for computational hypothesis generation in psychology. We analyzed 43,312 psychology articles using an LLM to extract causal relation pairs. This analysis produced a specialized causal graph for psychology. Applying link prediction algorithms, we generated 130 potential psychological hypotheses focusing on "well-being", then compared them against research ideas conceived by doctoral scholars and those produced solely by the LLM. Interestingly, our combined approach of an LLM and causal graphs mirrored the expert-level insights in terms of novelty, clearly surpassing the LLM-only hypotheses (t(59) = 3.34, p = 0.007 and t(59) = 4.32, p < 0.001, respectively). This alignment was further corroborated using deep semantic analysis. Our results show that combining LLMs with machine learning techniques such as causal knowledge graphs can revolutionize automated discovery in psychology, extracting novel insights from the extensive literature. This work stands at the crossroads of psychology and artificial intelligence, championing a new enriched paradigm for data-driven hypothesis generation in psychological research.
Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
The paper addresses the automation of hypothesis generation in psychological research. Its approach combines large language models (LLMs) with causal graphs to generate psychological hypotheses automatically. The research team carried out the following work:

1. **Literature Analysis**: Collected 43,312 psychology-related articles from public databases and used a large language model to extract causal relation pairs from them, thereby constructing a specialized causal graph for psychology.
2. **Hypothesis Generation**: Applied link prediction algorithms to the causal graph to generate potential psychological hypotheses. The research focused on hypotheses related to "well-being."
3. **Comparative Evaluation**: Compared the generated hypotheses with research ideas conceived by doctoral students and with hypotheses generated solely by a large language model, assessing their novelty and practicality.

The results showed that hypotheses from the combined LLM and causal graph method matched expert-level ideas in novelty and significantly outperformed hypotheses generated solely by a large language model. In-depth semantic analysis further supported the method's advantages in conceptual integration and semantic scope.

The study demonstrates that causal knowledge can be extracted from a large body of literature, and it offers a new tool and methodology for data-driven hypothesis generation in psychology. By combining traditional theory-driven research with an emerging data-centric paradigm, it broadens the understanding of factors studied in psychology, particularly in social psychology.
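
To make the hypothesis generation step more concrete, here is a minimal sketch of link prediction on a small causal graph. It is only an illustration under stated assumptions: the causal pairs are invented toy data rather than relations extracted in the study, the function names are hypothetical, and the Jaccard neighbor-overlap score is a simple stand-in, since this summary does not say which link prediction algorithms the authors used. Edge direction is ignored when scoring, which is a further simplification.

```python
from collections import defaultdict

# Hypothetical (cause, effect) pairs standing in for LLM-extracted relations.
# These are toy data for illustration, not results from the paper.
CAUSAL_PAIRS = [
    ("gratitude", "well-being"),
    ("social support", "well-being"),
    ("gratitude", "social support"),
    ("loneliness", "social support"),
    ("exercise", "sleep quality"),
    ("sleep quality", "well-being"),
    ("exercise", "self-esteem"),
    ("self-esteem", "well-being"),
]


def build_neighbors(pairs):
    """Build an undirected adjacency map from (cause, effect) pairs."""
    neighbors = defaultdict(set)
    for cause, effect in pairs:
        neighbors[cause].add(effect)
        neighbors[effect].add(cause)
    return neighbors


def jaccard(neighbors, a, b):
    """Jaccard coefficient: shared neighbors divided by all neighbors."""
    union = neighbors[a] | neighbors[b]
    if not union:
        return 0.0
    return len(neighbors[a] & neighbors[b]) / len(union)


def rank_candidates(pairs, target, top_k=3):
    """Rank concepts not yet linked to `target` by neighbor overlap."""
    neighbors = build_neighbors(pairs)
    # Candidate links are concepts with no existing edge to the target.
    candidates = [
        node for node in neighbors
        if node != target and target not in neighbors[node]
    ]
    scored = [(node, target, jaccard(neighbors, node, target))
              for node in candidates]
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for source, target, score in rank_candidates(CAUSAL_PAIRS, "well-being"):
        print(f"candidate link: {source} -> {target} (score {score:.2f})")
```

Each highly ranked missing link is a candidate hypothesis (for example, that one concept influences well-being) that would still need to be phrased and evaluated by researchers, as the comparative evaluation step describes.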