Efficient Learning Content Retrieval with Knowledge Injection

Batuhan Sariturk, Rabia Bayraktar, Merve Elmas Erdem
2024-11-28
Abstract: With the rise of online education platforms, there is a growing abundance of educational content across various domains. It can be difficult to navigate the numerous available resources to find the most suitable training, especially in domains that include many interconnected areas, such as ICT. In this study, we propose a domain-specific chatbot application that requires limited resources, utilizing versions of the Phi language model to help learners with educational content. In the proposed method, the Phi-2 and Phi-3 models were fine-tuned using QLoRA. The data required for fine-tuning was obtained from the Huawei Talent Platform, where courses are available at different levels of expertise in the field of computer science. A RAG system was used to support the model, which was fine-tuned on 500 Q&A pairs. Additionally, a total of 420 Q&A pairs of content were extracted from different formats such as JSON, PPT, and DOC to create a vector database to be used in the RAG system. By using the fine-tuned model and the RAG approach together, chatbots with different competencies were obtained. The questions posed to the resulting chatbots and the answers they gave were saved separately and evaluated using ROUGE, BERTScore, METEOR, and BLEU metrics. The precision value of the Phi-2 model supported by RAG was 0.84 and the F1 score was 0.82. In addition to a total of 13 different evaluation metrics in 4 different categories, the answers of each model were compared with the created content, and the most appropriate method was selected for real-life applications.
Computation and Language
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to help learners efficiently find the training resources that best meet their needs in the context of the increasingly rich content on online education platforms, especially in fields like ICT which contain many interrelated areas. Specifically, the paper proposes a domain-specific chatbot application aimed at using improved language models (such as Phi-2 and Phi-3) to help learners retrieve educational content.

### Main problems:

1. **Information overload**: There are a large number of educational resources on online education platforms, and it is difficult for learners to screen out the most suitable training content from them.
2. **Domain complexity**: Especially in fields like ICT, which involve multiple interrelated sub-fields, making navigation and selection more difficult.
3. **Real-time and accuracy**: Traditional language models may generate inaccurate or fact-inconsistent content (i.e., "hallucinations") when generating answers, especially when dealing with domain-specific questions.

### Solutions:

To address the above challenges, the paper proposes the following solutions:

- **Combining RAG (Retrieval-Augmented Generation) and LLM (Large Language Model)**: By using the RAG system, combine external data sources with language models to improve the accuracy and relevance of generated answers.
- **Parameter-efficient fine-tuning**: Use the QLoRA (Quantized Low-Rank Adaptation) method to fine-tune the Phi-2 and Phi-3 models to reduce computational resource consumption and improve model performance.
- **Multi-source data extraction**: Obtain course content from platforms such as the Huawei Talent Platform and convert it into Q&A pairs for training and evaluating the model.
- **Evaluation metrics**: Evaluate the performance of different methods through metrics such as BLEU, ROUGE, METEOR, and BERTScore to ensure that the generated answers are both accurate and comprehensive.

### Specific steps:

1. **Dataset generation**: Extract 500 Q&A pairs from the course content of the Huawei Talent Platform for fine-tuning the model.
2. **Parameter-efficient fine-tuning**: Use the QLoRA method to fine-tune the Phi-2 and Phi-3 models to adapt to domain-specific tasks.
3. **Application of the RAG system**: Utilize the RAG system to enhance the model's generation ability by retrieving external data sources.
4. **Performance evaluation**: Compare the performance of different methods through multiple evaluation metrics and select the optimal solution for application in practical scenarios.

### Summary:

The main objective of this paper is to develop a chatbot that can efficiently retrieve and provide accurate educational content by combining RAG and fine-tuning techniques, thereby helping learners better select training resources suitable for them.
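
### Illustrative sketch of the retrieval step:

The retrieval part of a RAG pipeline like the one summarized above can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the three Q&A pairs are invented placeholders rather than Huawei Talent Platform data, the bag-of-words `embed` function stands in for whatever neural embedding model and vector database the authors used, and the prompt template in `build_prompt` is hypothetical. It shows only the general pattern of ranking stored Q&A pairs by similarity to a question and prepending the best matches to the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical Q&A pairs standing in for the vector database contents.
# These are NOT taken from the paper or from the Huawei Talent Platform.
QA_PAIRS = [
    ("What is cloud computing?",
     "Cloud computing delivers computing resources over the internet on demand."),
    ("What does a router do?",
     "A router forwards data packets between computer networks."),
    ("What is machine learning?",
     "Machine learning lets systems learn patterns from data."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: a real system uses a neural encoder)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=1):
    """Return the k stored Q&A pairs most similar to the query."""
    q_vec = embed(query)
    ranked = sorted(
        QA_PAIRS,
        key=lambda qa: cosine(q_vec, embed(qa[0] + " " + qa[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, k=1):
    """Prepend the retrieved pairs as context (hypothetical prompt template)."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("What does a router do in a network?"))
```

In a full system, the string returned by `build_prompt` would be passed to the fine-tuned Phi model for generation; that step is omitted here because it depends on model weights and libraries the summary does not specify.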