Eliciting the Priors of Large Language Models using Iterated In-Context Learning

Jian-Qiao Zhu, Thomas L. Griffiths
2024-06-04
Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, understanding the knowledge they implicitly use when making decisions is critical. One way to capture this knowledge is in the form of Bayesian prior distributions. We develop a prompt-based workflow for eliciting prior distributions from LLMs. Our approach is based on iterated learning, a Markov chain Monte Carlo method in which successive inferences are chained in a way that supports sampling from the prior distribution. We validated our method in settings where iterated learning has previously been used to estimate the priors of human participants -- causal learning, proportion estimation, and predicting everyday quantities. We found that priors elicited from GPT-4 qualitatively align with human priors in these settings. We then used the same method to elicit priors from GPT-4 for a variety of speculative events, such as the timing of the development of superhuman AI.
Computation and Language
What problem does this paper attempt to address?
This paper addresses how to elicit the implicit knowledge that large language models (LLMs) rely on when making decisions, expressed in the form of Bayesian prior distributions. The authors use a prompt-based method built on iterated learning: the model performs a sequence of inference tasks in which each step's input is generated from the previous step's output, so that over many iterations the responses come to reflect the model's own prior rather than the starting data. The same technique has been used in psychological research to estimate the priors of human participants. The main objectives of the paper can be summarized as follows:

1. **Understanding the decision-making process of LLMs**: As LLMs are increasingly deployed in real-world settings, it becomes important to understand what background knowledge they bring to a decision. This matters both for technical understanding and for the reliability and safety of these models in fields such as medicine, finance, and law.
2. **Developing a method for eliciting prior distributions**: The authors propose a prompt-based workflow built on iterated learning. Iterated learning is a Markov chain Monte Carlo (MCMC) method: successive inferences are chained together in a way that supports sampling from the prior distribution.
3. **Validating the method**: The authors validated the method in three settings where iterated learning had previously been used with human participants: causal learning, proportion estimation, and predicting everyday quantities. The priors elicited from GPT-4 qualitatively align with human priors in these settings.
4. **Exploring the priors of LLMs for speculative events**: Beyond tasks with known human priors, the authors used the same method to elicit GPT-4's priors for speculative events, such as the timing of the development of superhuman AI, the time to achieve zero carbon emissions, and the time to establish a colony on Mars. The elicited priors for these events are described as reasonable.

In conclusion, the paper uses iterated learning to reveal the implicit knowledge that large language models rely on when making decisions, with the aim of improving our understanding of, and control over, these models.
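The idea that chaining inferences yields samples from the prior can be illustrated with a small simulation. The sketch below is not the authors' prompt workflow or code: it replaces GPT-4 with a hypothetical Bayesian learner whose prior is known, namely a Beta(2, 5) prior over a coin's bias, so that the result can be checked. The function names, the prior parameters, and the chain length are all assumptions made for illustration. At each step the learner sees data, samples a hypothesis from its posterior, and then generates the data that the next step sees.

```python
import random


def sample_posterior(heads, n, a, b, rng):
    """Sample a coin bias from the posterior of a Beta(a, b) prior
    after observing `heads` successes in `n` flips (conjugate update)."""
    return rng.betavariate(a + heads, b + n - heads)


def generate_data(theta, n, rng):
    """Simulate n coin flips with bias theta and return the number of heads."""
    return sum(rng.random() < theta for _ in range(n))


def iterated_learning(a, b, n=10, steps=20000, burn_in=500, seed=0):
    """Run an iterated-learning chain with a simulated Bayesian learner.

    The learner (a stand-in for the LLM, assumed here to be exactly
    Bayesian) alternates between inferring a hypothesis from data and
    producing new data from that hypothesis.
    """
    rng = random.Random(seed)
    heads = n  # deliberately extreme starting data: all heads
    samples = []
    for t in range(steps):
        theta = sample_posterior(heads, n, a, b, rng)  # inference step
        heads = generate_data(theta, n, rng)           # data for the next step
        if t >= burn_in:
            samples.append(theta)
    return samples


samples = iterated_learning(a=2.0, b=5.0)
mean = sum(samples) / len(samples)
print(f"mean of sampled hypotheses: {mean:.3f}")
print(f"mean of the Beta(2, 5) prior: {2 / 7:.3f}")
```

Although the chain starts from all-heads data, the sampled hypotheses should settle around the prior mean of 2/7, because the prior is the stationary distribution of this chain when the learner is Bayesian. In the paper's setting the inference step is a prompt to the LLM rather than an explicit posterior, and whether the LLM behaves like such a learner is an empirical question the validation experiments address.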