Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education

Rui Yang, Boming Yang, Sixun Ouyang, Tianwei She, Aosong Feng, Yuang Jiang, Freddy Lecue, Jinghui Lu, Irene Li
2024-02-22
Abstract: In the domain of Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated promise in text-generation tasks. However, their educational applications, particularly for domain-specific queries, remain underexplored. This study investigates LLMs' capabilities in educational scenarios, focusing on concept graph recovery and question-answering (QA). We assess LLMs' zero-shot performance in creating domain-specific concept graphs and introduce TutorQA, a new expert-verified NLP-focused benchmark for scientific graph reasoning and QA. TutorQA consists of five tasks with 500 QA pairs. To tackle TutorQA queries, we present CGLLM, a pipeline integrating concept graphs with LLMs for answering diverse questions. Our results indicate that LLMs' zero-shot concept graph recovery is competitive with supervised methods, showing an average 3% F1 score improvement. In TutorQA tasks, LLMs achieve up to 26% F1 score enhancement. Moreover, human evaluation and analysis show that CGLLM generates answers with more fine-grained concepts.
Computation and Language
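The abstract reports concept graph recovery results as F1 scores. A minimal sketch of how such a score can be computed, assuming recovery is treated as predicting directed prerequisite edges `(prerequisite, concept)` and scored against a gold edge set. The function name `edge_f1` and the example concepts are hypothetical, and the paper's exact evaluation protocol (for example, which concept pairs are scored) may differ.

```python
def edge_f1(predicted, gold):
    """F1 score of predicted prerequisite edges against gold edges.

    Each edge is a (prerequisite, concept) tuple. This is a hypothetical
    illustration, not the paper's evaluation script.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # No correct edges: treat F1 as 0 (avoids division by zero)
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and LLM-predicted edges over a few NLP concepts
gold_edges = {
    ("neural networks", "RNN"),
    ("RNN", "LSTM"),
    ("LSTM", "seq2seq"),
}
predicted_edges = {
    ("neural networks", "RNN"),
    ("RNN", "LSTM"),
    ("LSTM", "RNN"),  # wrong direction, so it is a false positive
}
print(round(edge_f1(predicted_edges, gold_edges), 4))  # 0.6667
```

Here two of three predicted edges are correct and two of three gold edges are found, so precision and recall are both 2/3. Because edges are directed, predicting `("LSTM", "RNN")` counts as an error even though the two concepts are related.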
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper explores how large language models (LLMs) can recover concept graphs in natural language processing (NLP) education and how those graphs can support question answering. Specifically:

1. **Concept Graph Recovery**: The study investigates LLMs' ability to build domain-specific concept graphs in a zero-shot setting, comparing prompting strategies such as zero-shot prompting, Chain-of-Thought (CoT), and Retrieval-Augmented Generation (RAG).
2. **TutorQA Benchmark**: The paper introduces TutorQA, a new expert-verified benchmark for scientific graph reasoning and question answering in NLP. TutorQA includes 5 tasks with 100 question-answer pairs each, 500 in total.
3. **CGLLM Pipeline**: The paper proposes CGLLM, a pipeline that integrates concept graphs with LLMs to answer the diverse questions in TutorQA.

Experimentally, LLMs' zero-shot concept graph recovery is competitive with supervised methods, with an average F1 score improvement of 3%, and on TutorQA tasks LLMs achieve up to a 26% F1 score improvement. Human evaluation further indicates that CGLLM produces answers with more fine-grained concepts.

In summary, the paper examines how LLMs can be applied in educational settings, in particular how recovered concept graphs can support more complex question answering.
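One way to picture the idea of integrating a concept graph with an LLM is to look up the queried concept's prerequisites in the graph and pass them to the model as context. The sketch below is hypothetical and is not the paper's CGLLM implementation: the graph contents, the `prerequisites` and `build_prompt` helpers, and the prompt wording are all assumptions, and it only assembles the prompt text without calling any LLM.

```python
from collections import deque


def prerequisites(graph, target):
    """Return all transitive prerequisites of `target`, nearest first (BFS).

    `graph` maps each concept to a list of its direct prerequisites.
    """
    seen = {target}  # never report the target itself, even if the graph has a cycle
    order = []
    queue = deque(graph.get(target, []))
    while queue:
        concept = queue.popleft()
        if concept in seen:
            continue
        seen.add(concept)
        order.append(concept)
        queue.extend(graph.get(concept, []))
    return order


def build_prompt(graph, question, target):
    """Assemble a hypothetical LLM prompt grounded in concept-graph context."""
    prereqs = prerequisites(graph, target)
    context = ", ".join(prereqs) if prereqs else "none found"
    return (
        f"Prerequisites of '{target}' in the concept graph: {context}.\n"
        f"Question: {question}\n"
        "Answer using these concepts where relevant."
    )


# Hypothetical concept graph: concept -> direct prerequisites
concept_graph = {
    "seq2seq": ["LSTM", "word embeddings"],
    "LSTM": ["RNN"],
    "RNN": ["neural networks"],
    "word embeddings": ["neural networks"],
}
print(prerequisites(concept_graph, "seq2seq"))
# ['LSTM', 'word embeddings', 'RNN', 'neural networks']
print(build_prompt(concept_graph, "What should I learn before seq2seq?", "seq2seq"))
```

Breadth-first order lists direct prerequisites before more distant ones, which is one simple way to keep the most relevant concepts at the front of the context passed to the model.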