RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration

Haoyu Huang, Tong Niu, Rui Yang, Luping Shi
2024-09-24
Abstract: Recently, many studies have focused on applying large language models (LLMs) to educational dialogues. In liberal arts dialogues especially, educators must balance Humanized communication, Teaching expertise, and Safety-ethics (HTS) in addition to the subject knowledge itself. However, because collecting massive amounts of HTS-compliant teaching dialogues from the real world as a training corpus is expensive, the outputs of existing LLMs in teaching dialogues fall short of human standards. To address this, we design a Retrieval-augmented Multi-role Multi-expert Collaboration (RAM2C) framework to automatically generate such dialogue data. Specifically, we first establish HTS-guided knowledge bases covering three domains: teaching skills, psychology, and safety ethics. Then, RAM2C organizes LLMs, which are retrieval-augmented by these knowledge bases, into multi-expert groups with distinct roles to generate an HTS-compliant educational dialogue dataset. We then fine-tune the LLMs on this dataset. Empirical evaluations indicate that RAM2C-empowered LLMs excel in Chinese reading teaching, offering more personalized and ethically safe teaching responses, demonstrating RAM2C's practicality and high quality. We release the experiments at https://github.com/ram2c/ram2c.
Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty of producing high-quality instructional dialogues with large language models (LLMs) in liberal arts education. It identifies three requirements, abbreviated HTS:

1. **Humanized Communication**: The dialogue should feel human and support personalized interaction.
2. **Teaching Expertise**: The dialogue should reflect professional teaching techniques.
3. **Safety and Ethics**: The dialogue content should adhere to ethical standards and avoid inappropriate content.

Because collecting a large amount of real-world instructional dialogue data that meets these requirements is difficult and expensive, existing LLMs fall short of human teachers in actual instructional dialogues. To address this, the paper proposes the Retrieval-augmented Multi-role Multi-expert Collaboration (RAM2C) framework, in which retrieval-augmented LLMs acting as experts in distinct roles collaborate to automatically generate HTS-compliant instructional dialogue data. This data is then used to fine-tune LLMs. Experimental results show that LLMs fine-tuned with RAM2C-generated data give more personalized, ethically safer, and higher-quality teaching responses in Chinese reading instruction.
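
The retrieval-augmented, multi-expert idea described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its repository: the toy knowledge-base entries, the word-overlap retriever (a stand-in for whatever retriever the authors use), and the names `KNOWLEDGE_BASES`, `retrieve`, `ExpertRequest`, and `collaborate`. No LLM is called; the sketch only builds the per-role prompts that would be sent to one.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical toy knowledge bases, one per HTS dimension (not from the paper).
KNOWLEDGE_BASES: Dict[str, List[str]] = {
    "teaching": [
        "Ask guiding questions before giving the answer.",
        "Connect the poem to the student's own experience.",
    ],
    "psychology": [
        "Acknowledge the student's feelings before correcting mistakes.",
        "Praise effort to keep motivation high.",
    ],
    "safety_ethics": [
        "Avoid content that is violent or discriminatory.",
        "Do not give medical or legal advice to students.",
    ],
}


def tokenize(text: str) -> set:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Rank documents by word overlap with the query.

    This is a simple stand-in for an embedding-based retriever.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


@dataclass
class ExpertRequest:
    role: str            # which expert reviews the draft
    evidence: List[str]  # passages retrieved from that expert's knowledge base
    prompt: str          # prompt that would be sent to the expert LLM


def collaborate(student_utterance: str, draft_reply: str) -> List[ExpertRequest]:
    """Build one retrieval-augmented review prompt per expert role."""
    requests = []
    for role, documents in KNOWLEDGE_BASES.items():
        evidence = retrieve(student_utterance, documents)
        prompt = (
            f"You are a {role} expert.\n"
            f"Reference: {' '.join(evidence)}\n"
            f"Student said: {student_utterance}\n"
            f"Draft teacher reply: {draft_reply}\n"
            "Suggest how to improve the draft."
        )
        requests.append(ExpertRequest(role, evidence, prompt))
    return requests


if __name__ == "__main__":
    for request in collaborate(
        "I made mistakes reading the poem and my feelings are hurt",
        "Try again.",
    ):
        print(request.prompt, end="\n\n")
```

In a fuller pipeline, each prompt would go to an LLM, the expert suggestions would be merged into a refined teacher reply, and the resulting dialogues would form the fine-tuning dataset. Those steps are omitted here because the summary does not specify how they are carried out.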