Self-playing Adversarial Language Game Enhances LLM Reasoning

Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du
2024-05-23
Abstract: We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's utterances. To win the game, both players should have sufficient knowledge about the target word and high-level reasoning ability to infer and express in this information-reserved conversation. Hence, we are curious about whether LLMs' reasoning ability can be further enhanced by self-play in this adversarial language game (SPAG). With this goal, we select several open-source LLMs and let each act as the attacker and play with a copy of itself as the defender on an extensive range of target words. Through reinforcement learning on the game outcomes, we observe that the LLMs' performances uniformly improve on a broad range of reasoning benchmarks. Furthermore, iteratively adopting this self-play process can continuously promote LLMs' reasoning abilities. The code is at
Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the limited reasoning ability of large language models (LLMs), which holds back complex problem-solving and the development of higher-level intelligence. Although existing LLMs such as GPT-4 and Gemini perform well in natural-language understanding, text generation, machine translation, and programming, their reasoning still falls short, especially in correctness and faithfulness. Earlier remedies, such as Chain-of-Thought (CoT) prompting and auxiliary reasoning tools, often require additional prompt design and are sensitive to prompt patterns and LLM checkpoints.
To address these limitations, the paper proposes a self-training method that enhances LLM reasoning by having a model play against itself in an adversarial language game. The chosen game is "Adversarial Taboo", in which two players (an attacker and a defender) converse around a target word known only to the attacker. The attacker aims to induce the defender to say the target word unconsciously, while the defender tries to infer the target word from the conversation history. To win, both sides need sufficient knowledge about the target word and a high level of reasoning ability. The question the authors investigate is whether LLM reasoning can be further enhanced through Self-Play in this Adversarial language Game (SPAG).
The experimental results show that, after multiple rounds of self-play training, LLM performance improves across multiple reasoning benchmarks, especially in logical reasoning, common-sense understanding, and complex problem-solving. In addition, applying the self-play process iteratively continues to improve the reasoning ability of LLMs, which opens new possibilities for developing more advanced LLM capabilities.
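To make the game rules and the notion of "game outcomes" concrete, here is a minimal Python sketch of how one finished Adversarial Taboo dialogue might be judged and turned into a zero-sum reward for each player. It is an illustration under stated assumptions, not the authors' implementation: the `my guess:` prefix for an explicit defender guess, the rule that a single guess ends the game, the tie when nothing decisive happens, the ±1 reward values, and the names `judge_game` and `outcome_rewards` are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical convention: the defender marks an explicit guess with this prefix.
GUESS_PREFIX = "my guess:"


def judge_game(target: str, turns: List[Tuple[str, str]]) -> str:
    """Return 'attacker', 'defender', or 'tie' for one finished dialogue.

    `turns` is a list of (role, utterance) pairs in conversation order.
    """
    target = target.lower()
    target_word = re.compile(rf"\b{re.escape(target)}\b")
    for role, utterance in turns:
        if role != "defender":
            continue
        text = utterance.lower().strip()
        if text.startswith(GUESS_PREFIX):
            guess = text[len(GUESS_PREFIX):].strip(" .!\"'`")
            # Assumption: one guess ends the game, whether right or wrong.
            return "defender" if guess == target else "attacker"
        if target_word.search(text):
            # The defender said the target word without inferring it.
            return "attacker"
    # Assumption: no decisive event before the dialogue ended.
    return "tie"


def outcome_rewards(winner: str) -> Tuple[float, float]:
    """Zero-sum (attacker, defender) rewards; the values are illustrative."""
    table = {"attacker": (1.0, -1.0), "defender": (-1.0, 1.0)}
    return table.get(winner, (0.0, 0.0))


if __name__ == "__main__":
    dialogue = [
        ("attacker", "What do you usually drink in the morning?"),
        ("defender", "Usually coffee with a bit of milk."),
    ]
    winner = judge_game("coffee", dialogue)
    print(winner, outcome_rewards(winner))
```

In a self-play setup, the same model would generate both the attacker and defender utterances, and per-episode rewards like these would feed the reinforcement learning step; how the rewards are actually shaped and optimized is not specified in this summary.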