RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Qi Zhang, Xuanjing Huang
2023-10-17
Abstract: Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors. However, current research tends to directly apply these human-oriented tools without verifying the faithfulness of their outcomes. In this paper, we introduce a framework, RealBehavior, which is designed to characterize the humanoid behaviors of models faithfully. Beyond simply measuring behaviors, our framework assesses the faithfulness of results based on reproducibility, internal and external consistency, and generalizability. Our findings suggest that a simple application of psychological tools cannot faithfully characterize all human-like behaviors. Moreover, we discuss the impacts of aligning models with human and social values, arguing for the necessity of diversifying alignment objectives to prevent the creation of models with restricted characteristics.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to faithfully characterize the human-like behaviors that emerge in foundation models such as large language models. Current research tends to apply psychological tools directly to these models without verifying that the tools are applicable to them or that the results are faithful. The authors therefore propose a framework called RealBehavior, which evaluates the faithfulness of measured behaviors along four dimensions: reproducibility, internal consistency, external consistency, and generalizability. The paper also discusses the impact of aligning models with human and social values and argues that alignment objectives should be diversified to avoid creating models with restricted characteristics.

Specifically, the paper focuses on the following aspects:

1. **Faithfulness Evaluation**: Proposes a two-stage framework that first measures the model's human-like behavior and then evaluates the faithfulness of the results.
2. **Applicability of Psychological Tests**: Explores whether existing psychological testing tools can be applied effectively to models and how these tools perform on them.
3. **Changes in Behavioral Characteristics**: Analyzes how personality traits differ across versions of language models, in particular how personality trait scores change as the models evolve.
4. **Generalizability of Behavior**: Uses occasion-based behavior testing to evaluate whether the model's behavioral characteristics remain consistent across different interaction scenarios.

Through these methods, the paper aims to provide a systematic and reliable methodology for understanding and evaluating the human-like behaviors of large language models.
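The summary names reproducibility and internal consistency as faithfulness dimensions but does not state how they are computed. As a rough illustration only, the sketch below assumes two standard psychometric statistics: a test-retest Pearson correlation for reproducibility and Cronbach's alpha for internal consistency. These choices, the function names, and all scores are assumptions made for illustration, not the authors' confirmed method or data.

```python
from statistics import mean, variance


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumed here as a test-retest reproducibility measure: xs and ys are
    the scores from two administrations of the same questionnaire.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of item scores.

    rows: one list per administration, each holding the scores of the
    k items that are meant to measure the same trait.
    """
    k = len(rows[0])
    # Sample variance of each item (column) across administrations.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the per-administration total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical trait scores from two runs of the same test (made up).
    run_a = [3.2, 4.1, 2.8, 3.9, 3.5]
    run_b = [3.0, 4.3, 2.9, 3.7, 3.6]
    print("test-retest r:", round(pearson(run_a, run_b), 3))

    # Hypothetical item scores: 4 administrations x 3 items (made up).
    items = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
    print("Cronbach's alpha:", round(cronbach_alpha(items), 3))
```

Higher values of either statistic would indicate more stable or more coherent questionnaire results under these assumptions. External consistency and generalizability depend on comparing questionnaire results with observed behavior, so they are not covered by this sketch.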