Enhancing Role-playing Systems through Aggressive Queries: Evaluation and Improvement

Yihong Tang, Jiao Ou, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai
2024-02-16
Abstract: The advent of Large Language Models (LLMs) has propelled dialogue generation into new realms, particularly in the field of role-playing systems (RPSs). While enhanced with ordinary role-relevant training dialogues, existing LLM-based RPSs still struggle to align with roles when handling intricate and trapped queries in boundary scenarios. In this paper, we design the Modular ORchestrated Trap-setting Interaction SystEm (MORTISE) to benchmark and improve the role-playing LLMs' performance. MORTISE can produce highly role-relevant aggressive queries through the collaborative effort of multiple LLM-based modules, and formulate corresponding responses to create an adversarial training dataset via a consistent response generator. We select 190 Chinese and English roles to construct aggressive queries to benchmark existing role-playing LLMs. Through comprehensive evaluation, we find that existing models exhibit a general deficiency in role alignment capabilities. We further select 180 of the roles to collect an adversarial training dataset (named RoleAD) and retain the other 10 roles for testing. Experiments on models improved by RoleAD indicate that our adversarial dataset ameliorates this deficiency, with the improvements demonstrating a degree of generalizability in ordinary scenarios.
Computation and Language
What problem does this paper attempt to address?
This paper addresses the difficulty that existing LLM-based role-playing systems (RPSs) have in staying aligned with a role when they face intricate, trap-laden queries in boundary scenarios. Although these models are trained on ordinary role-relevant dialogues, they often fail to stay in character when a query is complex or deliberately tricky. To evaluate and mitigate this, the authors design MORTISE, a system of multiple LLM-based modules that generates highly role-relevant aggressive queries, and use it to build an adversarial training dataset, RoleAD, from 180 of the 190 selected roles, with the remaining 10 roles held out for testing. Their evaluation finds that existing models are generally deficient in role alignment, and that training on RoleAD ameliorates this deficiency, with the improvement generalizing to some degree to ordinary scenarios.
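The role-level split stated in the abstract (180 roles for RoleAD, 10 roles retained for testing) can be illustrated with a minimal sketch. The abstract does not say how the 10 test roles were chosen, so the seeded random shuffle, the function name `split_roles`, and the placeholder role names below are all assumptions made for illustration, not the authors' procedure.

```python
import random


def split_roles(roles, n_test=10, seed=0):
    """Hold out n_test whole roles for testing; the rest go to training.

    Assumption: a seeded random split. The paper's actual selection
    procedure for the held-out roles is not described in the abstract.
    """
    if n_test > len(roles):
        raise ValueError("n_test cannot exceed the number of roles")
    shuffled = list(roles)  # copy so the caller's list is not mutated
    random.Random(seed).shuffle(shuffled)
    test_roles = shuffled[:n_test]
    train_roles = shuffled[n_test:]
    return train_roles, test_roles


# Placeholder role identifiers (hypothetical, not the paper's roles).
roles = [f"role_{i:03d}" for i in range(190)]
train_roles, test_roles = split_roles(roles)
```

Splitting by role rather than by dialogue means no test role contributes any training data, so test results reflect behavior on roles the model has not been tuned on.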