Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU

Haojia Sun, Yaqi Wang, Shuting Zhang
2024-11-21
Abstract: We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions about Pittsburgh and Carnegie Mellon University (CMU). We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs, achieving an inter-annotator agreement (IAA) score of 0.7625. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy. Experimental results show that the RAG system significantly outperforms a non-RAG baseline, particularly in time-sensitive and complex queries, with an F1 score improvement from 5.45% to 42.21% and recall of 56.18%. This study demonstrates the potential of RAG systems in enhancing answer precision and relevance, while identifying areas for further optimization in document retrieval and model training.
Machine Learning, Computation and Language
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to improve the accuracy and relevance of large-language-model responses in specific domains. Specifically, the researchers designed a Retrieval-Augmented Generation (RAG) system to provide relevant documents to assist large language models in answering specific-domain questions about Pittsburgh and Carnegie Mellon University (CMU). These questions cover multiple aspects such as history, events, culture, and sports.

### Main Objectives:
1. **Improve Answer Accuracy**: By integrating retrieval techniques, enable the model to access the latest or context-relevant documents, thereby generating more accurate answers.
2. **Handle Time-Sensitive and Complex Queries**: Pay special attention to complex questions that require the latest information or multi-step reasoning, and ensure that the model can provide timely and precise answers.

### Solutions:
1. **Data Extraction**: A large number of sub-pages were extracted from public websites (such as Wikipedia, city government websites, event calendars, etc.), and more than 1,800 sub-pages were collected using a greedy crawling strategy.
2. **Data Labeling**: Combine manual labeling and automatically generated question-answer pairs to ensure the diversity and representativeness of the dataset. Use the Mistral model for few-shot learning and fine-tuning to generate 1,302 automatically labeled question-answer pairs.
3. **RAG Framework Design**: Integrate BM25 and FAISS retrievers and add a re-ranking module to improve the accuracy of document retrieval. Use the Mistral 7B model as the backbone model and generate answers through 2-shot learning.
4. **Experiment and Evaluation**: Through a series of experiments, compare the performance of the RAG system under different configurations, including whether to use re-ranking, few-shot learning, and combined retrievers.
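
To make the combined-retriever idea in the Solutions list concrete, here is a minimal sketch of hybrid retrieval. It is not the authors' implementation: the dense side is a plain cosine similarity over term counts standing in for a FAISS index over learned embeddings, the two rankings are merged with reciprocal rank fusion (an assumption, since the summary does not say how BM25 and FAISS results are combined), and the reranker is left out. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would use a richer one."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25 (non-negative IDF)."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def dense_scores(query, docs):
    """Assumed stand-in for a FAISS search: cosine similarity over term counts."""
    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = Counter(tokenize(query))
    return [cosine(q, Counter(tokenize(d))) for d in docs]


def ranking(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def hybrid_retrieve(query, docs, top_k=2, rrf_k=60):
    """Fuse the sparse and dense rankings with reciprocal rank fusion."""
    fused = Counter()
    for scores in (bm25_scores(query, docs), dense_scores(query, docs)):
        for rank, doc_id in enumerate(ranking(scores), start=1):
            fused[doc_id] += 1.0 / (rrf_k + rank)
    return [doc_id for doc_id, _ in fused.most_common(top_k)]


# Toy corpus for illustration only; not taken from the paper's dataset.
docs = [
    "Carnegie Mellon University was founded in 1900 by Andrew Carnegie",
    "The Pittsburgh Steelers play football at Acrisure Stadium",
    "Pittsburgh hosts many cultural festivals every summer",
]
top = hybrid_retrieve("Who founded Carnegie Mellon University", docs)
```

In a fuller pipeline, a reranker would rescore the documents in `top` against the query before they are placed in the prompt of the generator model.
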
The results show that the RAG system in the best configuration performs significantly better than the baseline model on time-sensitive and complex queries.

### Experimental Results:
- **Baseline Model (without RAG)**: EM is 0.00% and the F1 score is 5.45%.
- **RAG System (Best Configuration)**: EM is 20.25%, the F1 score is 42.21%, the precision is 47.29%, and the recall rate is 56.18%.

### Conclusion:
The RAG system performs well in specific-domain question-answering tasks, especially when handling time-sensitive and complex queries. The research also points out directions for future improvement, including improving the accuracy of document retrieval and the generalization ability of the dataset. This provides new ideas and methods for the application of large language models in specific domains.
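
The EM, F1, precision, and recall figures reported above are standard answer-level metrics for question answering. The sketch below shows one common way to compute them for a single prediction, assuming SQuAD-style normalization (lowercasing, stripping punctuation and English articles) and token-overlap scoring. The summary does not state which normalization or averaging the authors used, so treat the function names and normalization rules here as assumptions.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def precision_recall_f1(prediction, reference):
    """Token-overlap precision, recall, and F1 for one prediction."""
    pred = normalize(prediction).split()
    ref = normalize(reference).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative strings only; not drawn from the paper's test set.
em = exact_match("The Steelers", "steelers")
p, r, f1 = precision_recall_f1("Andrew Carnegie in 1900", "Andrew Carnegie")
```

If the scores are averaged per question, as is common, the mean F1 does not have to equal the harmonic mean of the mean precision and mean recall. That would be consistent with a reported F1 (42.21%) that sits below both the precision (47.29%) and the recall (56.18%).
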