Speaker Verification in Agent-Generated Conversations

Yizhe Yang, Palakorn Achananuparp, Heyan Huang, Jing Jiang, Ee-Peng Lim
2024-06-06
Abstract: The recent success of large language models (LLMs) has attracted widespread interest in developing role-playing conversational agents personalized to the characteristics and styles of different speakers to enhance their abilities to perform both general and special purpose dialogue tasks. However, the ability to personalize the generated utterances to speakers, whether conducted by a human or an LLM, has not been well studied. To bridge this gap, our study introduces a novel evaluation challenge: speaker verification in agent-generated conversations, which aims to verify whether two sets of utterances originate from the same speaker. To this end, we assemble a large dataset collection encompassing thousands of speakers and their utterances. We also develop and evaluate speaker verification models under experiment setups. We further utilize the speaker verification models to evaluate the personalization abilities of LLM-based role-playing models. Comprehensive experiments suggest that the current role-playing models fail in accurately mimicking speakers, primarily due to their inherent linguistic characteristics.
Computation and Language
What problem does this paper attempt to address?
The main aim of this paper is to address the following issues:
1. **Defining and Evaluating the Personalization Ability of Role-Playing Dialogue Agents**: With the development of large language models (LLMs), role-playing dialogue agents can simulate speakers with different personal attributes and language styles to enhance their ability to perform general and specific dialogue tasks. However, how to evaluate whether the dialogues generated by these agents are truly personalized to the characteristics of the target speakers has not been fully studied.
2. **Introducing the Speaker Verification Task**: To fill this research gap, the authors introduce a new evaluation challenge: speaker verification in agent-generated dialogues. The goal of this task is to verify whether two sets of dialogues come from the same speaker, thereby measuring whether the dialogue agent can accurately mimic the language style and personal characteristics of the target speaker.
3. **Developing a Speaker Verification Dataset and Model**: The authors constructed a large-scale dataset containing thousands of speakers and their dialogues and developed speaker verification models for experimental evaluation. Through these models, the personalization ability of LLM-based role-playing models was further assessed.
4. **Revealing the Limitations of Existing Role-Playing Models**: Comprehensive experimental results indicate that current role-playing models have difficulty accurately mimicking speakers, especially in retaining the unique language style and personal traits of the speakers.
5. **Proposing Evaluation Metrics and Framework**: To better evaluate the performance of role-playing models, the paper proposes two metrics, Simulation Score and Distinction Score, to assess the consistency of agent-generated dialogues with the target speaker's actual dialogues and the distinction between different speakers.
In summary, this study aims to evaluate and improve the personalization level of role-playing dialogue agents by introducing the speaker verification task and developing corresponding models.
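To make the task concrete, the following is a minimal sketch of what "verifying whether two sets of utterances come from the same speaker" can look like as a similarity decision. It is not the authors' model: the character n-gram style profile, the cosine comparison, the `threshold` value, and the function names `style_profile`, `cosine_similarity`, and `same_speaker` are all assumptions chosen for illustration only.

```python
import math
from collections import Counter


def style_profile(utterances, n=3):
    """Count character n-grams over a set of utterances.

    This is a crude, hypothetical proxy for a speaker's style; a real
    verification model would likely use learned representations.
    """
    counts = Counter()
    for text in utterances:
        padded = f" {text.lower()} "
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_speaker(set_a, set_b, threshold=0.5):
    """Return (decision, score) for two utterance sets.

    The threshold is an arbitrary illustrative value, not one taken
    from the paper.
    """
    score = cosine_similarity(style_profile(set_a), style_profile(set_b))
    return score >= threshold, score
```

Under this framing, a role-playing agent could be probed by comparing its generated utterances against the target speaker's real utterances and against other speakers' utterances. How the paper actually defines its Simulation Score and Distinction Score is not specified in this summary, so the sketch stops at the pairwise decision.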