Abstract: The new wave of Large Language Models (LLMs) has offered an efficient tool to curate sizeable conversational datasets. So far, studies have mainly focused on task-oriented or generic open-domain dialogs, and have not fully explored the ability of LLMs to follow complicated prompts. In this work, we focus on personalization, and employ LLMs to curate a dataset which is difficult and costly to crowd-source: PersonalityChat is a synthetic conversational dataset based upon the popular PersonaChat dataset, but conditioned on both personas and (Big-5) personality traits. Evaluating models fine-tuned on this dataset, we show that the personality trait labels can be used for trait-based personalization of generative dialogue models. We also perform a head-to-head comparison between PersonalityChat and PersonaChat, and show that training on the distilled dataset results in more fluent and coherent dialog agents in the small-model regime.

What problem does this paper attempt to address?

The main problems that this paper attempts to solve are:

1. **Personalized Dialogue Generation**: Most research so far has focused on task-oriented or general open-domain dialogues, without fully exploring the ability of large language models (LLMs) to follow complex prompts. This paper focuses on personalized dialogue generation, especially personalization based on personality traits. The authors use a large language model (such as ChatGPT) to generate a new dialogue dataset, **PersonalityChat**, which is conditioned not only on persona facts (as in PersonaChat) but also on the Big-5 personality traits.
2. **Dataset Generation and Comparison**: The authors generate the **PersonalityChat** dataset with a two-step method:
   - **Step 1**: Use ChatGPT to predict the personality trait labels of each persona in PersonaChat.
   - **Step 2**: Given the predicted personality trait labels and the persona, use ChatGPT again to generate dialogues.
3. **Model Training and Evaluation**: The authors investigate two research questions through experiments:
   - **RQ1**: Can personality traits be used to control the dialogue behavior of the model?
   - **RQ2**: For small models, how does a model trained on **PersonalityChat** compare with one trained on **PersonaChat**?

### Main Contributions

1. **Release of the PersonalityChat dataset**: A personalized dialogue dataset conditioned on personas and personality traits.
2. **Impact of personality trait labels**: Demonstrates that personality trait labels can be used to adjust the attitude of generative dialogue models.
3. **Performance improvement for small models**: Among small models, those trained on **PersonalityChat** perform better, especially in terms of fluency and coherence.
4. **Release of the PersonaTraits dataset**: Contains the personality trait inferences that ChatGPT generated for the personas.

### Experimental Results

- **RQ1**: Through automatic and human evaluation, the authors find that the model generates different dialogue styles depending on the personality traits it is conditioned on. For example, dialogues with high openness, extraversion, and agreeableness and low neuroticism are more expressive, positive, and engaging.
- **RQ2**: For both the smaller model (T5-small) and the larger model (T5-base), the model trained on **PersonalityChat** outperforms the model trained on **PersonaChat** on multiple metrics, especially naturalness, coherence, and overall quality.

### Conclusion

By building the **PersonalityChat** dataset, this paper demonstrates how large language models can be used to generate high-quality personalized dialogue data, and verifies that such data is effective for training generative dialogue models. This provides new directions and tools for future research on personalized dialogue systems.
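
The two-step generation method described above (first predict trait labels for a persona, then generate a dialogue conditioned on both the persona and the labels) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the prompt wording, the `high`/`low` label format, the function names `predict_traits` and `generate_dialogue`, and the `fake_llm` stand-in for a real ChatGPT call are all hypothetical.

```python
from typing import Callable, Dict, List

# The five Big-5 trait names; the "high"/"low" label format is an assumption.
BIG5 = ["openness", "conscientiousness", "extraversion",
        "agreeableness", "neuroticism"]


def predict_traits(persona: List[str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Step 1: ask the LLM for one label per Big-5 trait (hypothetical prompt)."""
    prompt = (
        "Persona facts:\n"
        + "\n".join(f"- {fact}" for fact in persona)
        + "\nFor each Big-5 trait, answer on its own line as "
        + "'trait: high' or 'trait: low'.\nTraits: " + ", ".join(BIG5)
    )
    labels: Dict[str, str] = {}
    for line in llm(prompt).splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip().lower(), value.strip().lower()
        # Keep only well-formed lines that name a known trait.
        if sep and name in BIG5 and value in ("high", "low"):
            labels[name] = value
    return labels


def generate_dialogue(persona: List[str], traits: Dict[str, str],
                      llm: Callable[[str], str]) -> str:
    """Step 2: generate a dialogue conditioned on persona facts and trait labels."""
    trait_text = ", ".join(f"{level} {name}" for name, level in traits.items())
    prompt = (
        "Write a short conversation between speakers A and B.\n"
        "Speaker B has this persona:\n"
        + "\n".join(f"- {fact}" for fact in persona)
        + f"\nSpeaker B has these personality traits: {trait_text}."
    )
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real chat-model call, so the sketch runs offline."""
    if "For each Big-5 trait" in prompt:
        return "\n".join(f"{trait}: high" for trait in BIG5)
    return "A: Hi there!\nB: Hello, I just got back from a hike."


persona = ["I love hiking.", "I have two dogs."]
traits = predict_traits(persona, fake_llm)
dialogue = generate_dialogue(persona, traits, fake_llm)
print(traits)
print(dialogue)
```

Passing the model call in as a plain callable keeps the two steps independent of any particular API client; in practice `fake_llm` would be swapped for a function that sends the prompt to a chat model and returns its text reply.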

PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits
