Abstract: The rapid advancement of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation. These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge. Although human-curated KG-based conversational datasets exist, they struggle to keep pace with the rapidly changing user information needs. We present ConvKGYarn, a scalable method for generating up-to-date and configurable conversational KGQA datasets. Qualitative psychometric analyses confirm our method can generate high-quality datasets rivaling a popular conversational KGQA dataset while offering it at scale and covering a wide range of human-interaction configurations. We showcase its utility by testing LLMs on diverse conversations, exploring model behavior on conversational KGQA sets with different configurations grounded in the same KG fact set. Our results highlight the ability of ConvKGYarn to improve KGQA foundations and evaluate parametric knowledge of LLMs, thus offering a robust solution to the constantly evolving landscape of conversational assistants.
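The abstract's central idea, several conversational variants grounded in the same KG fact set, can be illustrated with a toy sketch. Everything in it is a hypothetical assumption rather than ConvKGYarn's pipeline: the `Fact` and `ConvConfig` types, the two configuration attributes (modality and coreference), and the fixed question templates. The paper generates conversations with LLMs, not templates. The sketch only shows that two configurations can produce different surface forms while keeping identical answers.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Fact:
    """A single KG triple (subject, relation, object)."""
    subject: str
    relation: str
    obj: str

@dataclass(frozen=True)
class ConvConfig:
    """Hypothetical interaction-style attributes for one conversation variant."""
    modality: str          # "text" or "voice"
    use_coreference: bool  # after the first turn, refer to the subject as "its"

# Hypothetical question templates keyed by relation; {poss} is a possessive
# such as "France's" or "its". (ConvKGYarn itself uses LLMs, not templates.)
TEMPLATES: Dict[str, str] = {
    "capital": "what is {poss} capital",
    "official language": "what is {poss} official language",
}

def build_conversation(facts: List[Fact], config: ConvConfig) -> List[Dict[str, str]]:
    """Render one multi-turn QA conversation from a fixed fact set."""
    if config.modality not in ("text", "voice"):
        raise ValueError(f"unknown modality: {config.modality}")
    turns = []
    for i, fact in enumerate(facts):
        # Coreference: only the first turn names the entity explicitly.
        poss = "its" if config.use_coreference and i > 0 else f"{fact.subject}'s"
        question = TEMPLATES[fact.relation].format(poss=poss)
        if config.modality == "text":
            # Typed input: sentence case with a question mark.
            question = question[0].upper() + question[1:] + "?"
        else:
            # Voice input: lower-case and unpunctuated, like a speech transcript.
            question = question.lower()
        turns.append({"question": question, "answer": fact.obj})
    return turns

facts = [
    Fact("France", "capital", "Paris"),
    Fact("France", "official language", "French"),
]
text_conv = build_conversation(facts, ConvConfig("text", use_coreference=True))
voice_conv = build_conversation(facts, ConvConfig("voice", use_coreference=False))
for turn in text_conv + voice_conv:
    print(turn["question"], "->", turn["answer"])
```

Both variants share the same answers ("Paris", "French"), so any difference in an LLM's accuracy between them can be attributed to the interaction style rather than to the underlying facts.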

What problem does this paper attempt to address?

This paper addresses the problem that existing conversational Knowledge Graph Question Answering (KGQA) datasets struggle to stay relevant and up to date as user information needs change rapidly. Specifically:

1. **Dynamics and configurability**: Although existing conversational KGQA datasets are rich in content, they rarely keep pace with rapid changes in user information needs. As a result, they may no longer be relevant or effective in real-world, adaptive conversation scenarios.
2. **Diversity and coverage**: Current datasets often lack the diversity and coverage needed to reflect the range of ways users interact with conversational systems, including text and voice interaction, each of which poses its own modeling challenges.
3. **High-quality generation at scale**: Human-annotated conversational KGQA datasets exist, but creating them is time-consuming and labor-intensive, which makes them difficult to scale up.

To address these problems, the paper proposes ConvKGYarn, a method for generating large-scale, configurable conversational KGQA datasets. The conversations it generates are comparable in quality to a popular human-curated conversational KGQA dataset, while substantially expanding the coverage of entities and facts and introducing configurable attributes for user interaction styles. The paper evaluates ConvKGYarn with several methods, including single-model scoring, pairwise comparison, and parametric knowledge evaluation, demonstrating its advantages in generating diverse conversation data.
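Pairwise comparison, one of the evaluation methods mentioned above, is commonly summarized as win/tie/loss rates. The sketch below is a generic tally under stated assumptions, not the paper's protocol: each verdict is `"A"`, `"B"`, or `"tie"` for one pair of conversations (for example, A from ConvKGYarn and B from a human-curated dataset), and who produces the verdicts (human raters or an LLM judge) is left open. The function name `pairwise_rates` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

VALID = {"A", "B", "tie"}

def pairwise_rates(verdicts: Iterable[str]) -> Dict[str, float]:
    """Convert per-pair verdicts into win/tie/loss rates from system A's perspective."""
    counts = Counter(verdicts)
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unexpected verdicts: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to aggregate")
    return {
        "win": counts["A"] / total,
        "tie": counts["tie"] / total,
        "loss": counts["B"] / total,
    }

# Hypothetical verdicts for four conversation pairs.
print(pairwise_rates(["A", "B", "tie", "A"]))  # {'win': 0.5, 'tie': 0.25, 'loss': 0.25}
```

Ties are reported separately rather than split between the two systems, so the three rates always sum to 1.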

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models

Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs

KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

ChatGPT is not Enough: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling

AGENTiGraph: An Interactive Knowledge Graph Platform for LLM-based Chatbots Utilizing Private Data

Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study over Open-ended Question Answering

Faithful Persona-based Conversational Dataset Generation with Large Language Models

Conversational Question Answering with Reformulations over Knowledge Graph

DyKgChat: Benchmarking Dialogue Generation Grounding on Dynamic Knowledge Graphs

Enhancing Text-based Knowledge Graph Completion with Zero-Shot Large Language Models: A Focus on Semantic Enhancement

Supervised Knowledge Makes Large Language Models Better In-context Learners

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Combining Knowledge Graphs and Large Language Models

GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment

KGConv, a Conversational Corpus grounded in Wikidata

Knowledge Graph for NLG in the context of conversational agents

LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities

Knowledge-Grounded Conversational Data Augmentation with Generative Conversational Networks

CogMG: Collaborative Augmentation Between Large Language Model and Knowledge Graph

InCA: Rethinking In-Car Conversational System Assessment Leveraging Large Language Models

KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction