EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education

Yuhao Dan, Zhikai Lei, Yiyang Gu, Yong Li, Jianghao Yin, Jiaju Lin, Linhao Ye, Zhiyan Tie, Yougen Zhou, Yilei Wang, Aimin Zhou, Ze Zhou, Qin Chen, Jie Zhou, Liang He, Xipeng Qiu
2023-08-05
Abstract: EduChat (https://www.educhat.top/) is a large-scale language model (LLM)-based chatbot system in the education domain. Its goal is to support personalized, fair, and compassionate intelligent education, serving teachers, students, and parents. Guided by theories from psychology and education, it further strengthens educational functions such as open question answering, essay assessment, Socratic teaching, and emotional support based on the existing basic LLMs. Particularly, we learn domain-specific knowledge by pre-training on the educational corpus and stimulate various skills with tool use by fine-tuning on designed system prompts and instructions. Currently, EduChat is available online as an open-source project, with its code, data, and model parameters available on platforms (e.g., GitHub https://github.com/icalk-nlp/EduChat, Hugging Face https://huggingface.co/ecnu-icalk). We also prepare a demonstration of its capabilities online (https://vimeo.com/851004454). This initiative aims to promote research and applications of LLMs for intelligent education.
Computation and Language
What problem does this paper attempt to address?
The paper aims to address the challenges of applying large language models (LLMs) in the field of education. Specifically, it proposes EduChat, an LLM-based chatbot system designed for intelligent education. The goal of EduChat is to support teachers, students, and parents in a personalized, fair, and empathetic manner.

The main issues addressed include:

1. **Lack of domain knowledge**: General large language models lack sufficient educational expertise and cannot adapt well to practical application scenarios (such as essay evaluation).
2. **Outdated knowledge**: Knowledge in the field of education is constantly being updated, but existing large language models cannot learn the latest knowledge due to their training mechanisms.
3. **Hallucination problem**: Large language models may generate inaccurate information.

To address these issues, EduChat employs the following methods:

- Pre-training on a large corpus of education-related materials, including psychology books, ancient poetry, etc., to acquire foundational educational knowledge.
- Fine-tuning with carefully designed task-specific instructions to activate education-related functionalities (such as essay evaluation, Socratic teaching, and emotional support).
- Introducing retrieval-augmented techniques that enable the model to automatically determine whether the retrieved information is helpful for answering a question and to generate answers based on relevant knowledge, ensuring the accuracy and timeliness of information.

Through these methods, EduChat can significantly improve its performance in the field of education while maintaining a level of basic capabilities comparable to other large-scale models.
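
The retrieval-augmented step described above (keep retrieved information only when it is judged helpful, then answer from what remains) can be illustrated with a minimal sketch. This is not EduChat's implementation: in the paper the model itself makes the helpfulness judgment, whereas here a simple token-overlap score stands in for that judgment. The function names (`relevance`, `select_helpful`, `build_prompt`), the prompt wording, and the `0.5` threshold are all hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(question, passage):
    """Fraction of question tokens that also occur in the passage.

    Assumption: this overlap score is only a stand-in for the model's own
    learned judgment of whether a retrieved passage is helpful.
    """
    question_tokens = tokenize(question)
    if not question_tokens:
        return 0.0
    return len(question_tokens & tokenize(passage)) / len(question_tokens)


def select_helpful(question, passages, threshold=0.5):
    """Keep only passages whose relevance score reaches the threshold."""
    return [p for p in passages if relevance(question, p) >= threshold]


def build_prompt(question, passages, threshold=0.5):
    """Build a prompt that includes retrieved knowledge only if it is helpful."""
    helpful = select_helpful(question, passages, threshold)
    if not helpful:
        # Nothing useful was retrieved: fall back to the bare question.
        return f"Question: {question}\nAnswer:"
    context = "\n".join(f"- {p}" for p in helpful)
    return (
        "Use the following retrieved knowledge:\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Who wrote the poem Quiet Night Thought?"
    passages = [
        "Quiet Night Thought is a poem by Li Bai.",
        "The capital of France is Paris.",
    ]
    print(build_prompt(question, passages))
```

With the sample inputs, only the passage about the poem passes the filter, so the unrelated passage never reaches the prompt. A production system would replace `relevance` with the model's own assessment or a trained retriever score.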