Abstract: As larger, newer Large Language Models continue to improve on strategic Theory of Mind (ToM) tasks, demand for these state-of-the-art models grows commensurately. However, deploying them is costly in both processing power and time. In this paper, we investigate the feasibility of creating smaller, high-performing specialized algorithms by way of fine-tuning. To do this, we first present a large pre-trained model with 20 unique scenarios that combine different social contexts with games of varying social dilemmas, record its answers, and use them for Q&A fine-tuning on a smaller model of the same family. Our focus is on in-context game-theoretic decision-making, the domain in which human interaction occurs and which requires both a theory of mind (or a semblance thereof) and an understanding of social dynamics. The smaller model is therefore trained not only on the larger model's answers but also on the motivations it gives for them, which should contain advice and guidelines for navigating both strategic dilemmas and social cues. We find that the fine-tuned smaller language model consistently narrows the gap in performance between the smaller pre-trained version of the model and its larger relative, and that its improvements extend to areas and contexts beyond those in the training examples, including out-of-sample scenarios with completely different game structures. Averaged across all games, fine-tuning yields a 46% improvement for the smaller model, measured as alignment towards the behavior of the larger model, with 100% representing indistinguishable behavior. When presented with out-of-sample social contexts and games, the fine-tuned model still displays remarkable levels of alignment, with improvements of 18% and 28% respectively.
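
The abstract reports improvement as alignment towards the larger model's behavior, with 100% meaning indistinguishable behavior, but it does not spell out the formula. One plausible reading, offered here only as an assumption, is the fraction of the behavioral gap between the small and large models that fine-tuning closes. The function name `gap_closed`, the use of a single scalar behavior measure (such as a cooperation rate), and the numbers below are all hypothetical and are not taken from the paper.

```python
def gap_closed(large: float, small: float, tuned: float) -> float:
    """Fraction of the small-to-large behavioral gap closed by fine-tuning.

    Assumed metric (not confirmed by the source): 1.0 means the fine-tuned
    model behaves identically to the large model, 0.0 means no change
    relative to the small pre-trained model.
    """
    gap_before = abs(large - small)
    if gap_before == 0:
        raise ValueError("small and large models already behave identically")
    gap_after = abs(large - tuned)
    return 1.0 - gap_after / gap_before


# Hypothetical cooperation rates for one game, for illustration only.
large_rate, small_rate, tuned_rate = 0.75, 0.25, 0.5
print(f"gap closed: {gap_closed(large_rate, small_rate, tuned_rate):.0%}")
```

Under this reading, the reported 46% would mean that fine-tuning closes a little under half of the distance between the two pre-trained models, averaged across games.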

Too Big to Fool: Resisting Deception in Language Models

Large Language Models can Strategically Deceive their Users when Put Under Pressure

Large Language Models as Misleading Assistants in Conversation

Larger and more instructable language models become less reliable

Deception Abilities Emerged in Large Language Models

An Assessment of Model-On-Model Deception

How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts

Large Model Strategic Thinking, Small Model Efficiency: Transferring Theory of Mind in Large Language Models

Large Language Models Can Be Easily Distracted by Irrelevant Context

Do Large Language Models Exhibit Cognitive Dissonance? Studying the Difference Between Revealed Beliefs and Stated Answers

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies

Large Language Models Are Also Good Prototypical Commonsense Reasoners

Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Turning large language models into cognitive models

Why Larger Language Models Do In-context Learning Differently?

Larger Language Models Don't Care How You Think: Why Chain-of-Thought Prompting Fails in Subjective Tasks

Nevermind: Instruction Override and Moderation in Large Language Models

Large Language Models with Controllable Working Memory

Exploiting Large Language Models (LLMs) through Deception Techniques and Persuasion Principles