Abstract: Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversational question answering. This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Through a series of experiments on an extensive benchmark dataset, we compare models of varying sizes with different prompting techniques and identify common issue types in the generated output. Our results demonstrate that large language models are capable of generating graph queries from dialogues, with significant improvements achievable through few-shot prompting and fine-tuning techniques, especially for smaller models that exhibit lower zero-shot performance.

What problem does this paper attempt to address?

This paper evaluates how well large language models (LLMs) perform semantic parsing in conversational question answering systems, specifically over knowledge graphs. The research explores the following aspects:

1. **Understanding conversations and generating SPARQL queries**: Evaluate whether LLMs can transform natural-language conversations into structured knowledge graph queries (such as SPARQL queries), thereby enabling knowledge-graph-based conversational question answering.
2. **Comparison of different models and prompting techniques**: Through a series of experiments, compare how LLMs of different sizes perform under different prompting techniques (such as zero-shot and few-shot prompting), and identify common types of output problems.
3. **Optimizing model performance**: Explore how fine-tuning and other strategies can improve LLM performance on semantic parsing tasks, especially for smaller models.

### Research Background

Conversational question answering systems usually rely on semantic parsing to convert natural-language inputs into structured database queries for interactive information retrieval. For fact-seeking conversations over knowledge graphs, dialogue utterances need to be converted into graph queries, a process known as knowledge-based conversational question answering. However, most existing work focuses on isolated natural-language utterances and ignores the broader conversational context. This study therefore pays special attention to sequences of related utterances in conversations, vague queries, and evolving search intentions.

### Main Contributions

- **Benchmarking study**: Four different LLMs were evaluated, and eight common error types in generated graph queries were identified using automatic metrics and human evaluation.
- **Detailed discussion**: The effects of prompting and fine-tuning strategies on model performance in conversational question answering were explored.
- **Reproducibility**: A GitHub repository was established containing all model scripts, datasets, and evaluation outputs, so that the experimental results can be reproduced.

### Experimental Setup

- **Dataset**: The SPICE dataset was selected, which contains 197,000 conversations, each accompanied by an executable SPARQL query.
- **Model selection**: Four LLMs of different scales were compared: GPT-3.5-Turbo, LLaMA, a LoRA fine-tuned version of LLaMA, and Vicuna.
- **Prompting methods**: Zero-shot and few-shot prompting were used to evaluate the models' performance under different conditions.

### Results and Discussion

The experimental results show that LLMs differ significantly in their semantic parsing ability for conversational question answering. The LoRA fine-tuned model performs well on almost all tasks, especially on simple questions. However, for complex questions (such as those requiring logical or quantitative reasoning), the performance of all models declines. In addition, human evaluation reveals eight common error types in the model-generated outputs, providing a basis for subsequent improvements.

In conclusion, this study evaluates the semantic parsing ability of LLMs in conversational question answering and identifies prompting and fine-tuning as ways to improve these models, particularly smaller ones.
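To make the difference between the zero-shot and few-shot prompting setups concrete, the following is a minimal Python sketch of how such prompts might be assembled. It is not the prompt format used in the paper: the instruction text, the `build_prompt` helper, and the example conversation with its Wikidata-style identifiers are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt wording is not given here.
INSTRUCTION = "Translate the final question of the conversation into a SPARQL query."

# A demonstration pairs the conversation turns with a reference SPARQL query.
Example = Tuple[List[str], str]


def build_prompt(conversation: List[str], examples: Sequence[Example] = ()) -> str:
    """Return a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    parts = [INSTRUCTION]
    # Few-shot: prepend each demonstration as a conversation/query pair.
    for turns, query in examples:
        parts.append("Conversation:\n" + "\n".join(turns))
        parts.append("SPARQL: " + query)
    # The target conversation comes last, with the query left for the model to complete.
    parts.append("Conversation:\n" + "\n".join(conversation))
    parts.append("SPARQL:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Illustrative demonstration only; identifiers are not taken from the dataset.
    demo: Example = (
        ["User: Who directed Inception?"],
        "SELECT ?x WHERE { wd:Q25188 wdt:P57 ?x . }",
    )
    dialogue = [
        "User: Who wrote Dune?",
        "System: Frank Herbert",
        "User: Where was he born?",
    ]
    print(build_prompt(dialogue))                   # zero-shot
    print(build_prompt(dialogue, examples=[demo]))  # few-shot
```

Passing the whole dialogue rather than only the last utterance reflects the conversational setting: the final question ("Where was he born?") can only be resolved against the earlier turns.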

Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs

A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation

Enhancing Large Language Models with Knowledge Graphs for Robust Question Answering

Multi-hop Question Answering over Knowledge Graphs using Large Language Models

How Proficient Are Large Language Models in Formal Languages? An In-Depth Insight for Knowledge Base Question Answering

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models

Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering

Research on Intelligent Question-Answering Systems Based on Large Language Models and Knowledge Graphs

Knowledge Graph-augmented Language Models for Complex Question Answering

Bridging Information Gaps in Dialogues With Grounded Exchanges Using Knowledge Graphs

Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs

Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base

Enhanced Story Comprehension for Large Language Models through Dynamic Document-Based Knowledge Graphs

Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models

Evaluating and Enhancing Large Language Models for Conversational Reasoning on Knowledge Graphs

Towards Evaluating Large Language Models for Graph Query Generation

Enhancing Large Language Models with Pseudo- and Multisource- Knowledge Graphs for Open-ended Question Answering

Large Language Models Meet Knowledge Graphs to Answer Factoid Questions