Dohee Kim, Unggi Lee, Sookbun Lee, Jiyeong Bae, Taekyung Ahn, Jaekwon Park, Gunho Lee, Hyeoncheol Kim
Abstract: This paper introduces ES-KT-24, a novel multimodal Knowledge Tracing (KT) dataset for intelligent tutoring systems in educational game contexts. Although KT is crucial in adaptive learning, existing datasets often lack game-based and multimodal elements. ES-KT-24 addresses these limitations by incorporating educational game-playing videos, synthetically generated question text, and detailed game logs. The dataset covers Mathematics, English, Indonesian, and Malaysian subjects, emphasizing diversity and including non-English content. The synthetic text component, generated using a large language model, encompasses 28 distinct knowledge concepts and 182 questions, featuring 15,032 users and 7,782,928 interactions. Our benchmark experiments demonstrate the dataset's utility for KT research by comparing Deep learning-based KT models with Language Model-based Knowledge Tracing (LKT) approaches. Notably, LKT models showed slightly higher performance than traditional DKT models, highlighting the potential of language model-based approaches in this field. Furthermore, ES-KT-24 has the potential to significantly advance research in multimodal KT models and learning analytics. By integrating game-playing videos and detailed game logs, this dataset offers a unique approach to dissecting student learning patterns through advanced data analysis and machine-learning techniques. It has the potential to unearth new insights into the learning process and inspire further exploration in the field.
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing Knowledge Tracing (KT) datasets for gamified learning environments. Existing KT datasets consist mainly of numerical interaction sequences and lack game-based and multimodal content. This limits researchers' ability to develop more comprehensive knowledge-tracing models for modern digital educational materials. In gamified learning environments in particular, interactions and outcomes go beyond simple right or wrong answers and include indicators such as task completion time and number of attempts.
To address these challenges, the paper introduces ES-KT-24, a new multimodal knowledge-tracing benchmark dataset. The ES-KT-24 dataset has the following features:
1. **Gamified learning videos**: The dataset contains videos of educational games being played. These videos record each player's gameplay and provide rich visual and interaction information.
2. **Synthetic text generation**: A large language model was used to generate synthetic question texts related to the game content, covering 28 different knowledge concepts and 182 questions.
3. **Detailed game logs**: The dataset also contains detailed game logs that record every action, event, correct/incorrect response, and time-duration data of the players.
4. **Multilingual support**: The dataset covers four subjects, namely mathematics, English, Indonesian, and Malay, emphasizing diversity and the inclusion of non-English content.
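To make the features listed above concrete, the following is a minimal sketch of how one logged interaction might be represented and turned into per-user sequences for a KT model. The summary does not give the dataset's actual schema, so every class, field, and function name here (`Interaction`, `duration_s`, `build_sequences`, and so on) is a hypothetical assumption, and the sample records are invented toy data, not entries from ES-KT-24.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One logged game response. All field names are hypothetical."""
    user_id: str
    question_id: str
    concept_id: str
    question_text: str  # stands in for the synthetic, LLM-generated question text
    correct: bool
    duration_s: float   # time spent on the task, in seconds
    attempts: int
    timestamp: int      # ordering key, e.g. Unix time


def build_sequences(interactions):
    """Group interactions by user and sort each group chronologically."""
    by_user = defaultdict(list)
    for item in interactions:
        by_user[item.user_id].append(item)
    return {
        user: sorted(seq, key=lambda item: item.timestamp)
        for user, seq in by_user.items()
    }


def to_kt_pairs(sequence):
    """Reduce a sequence to (concept, correctness) pairs, the numerical
    form that sequence-only KT datasets typically provide."""
    return [(item.concept_id, int(item.correct)) for item in sequence]


# Invented toy records, deliberately out of chronological order.
logs = [
    Interaction("u1", "q2", "addition", "What is 3 + 4?", True, 5.2, 1, 20),
    Interaction("u1", "q1", "counting", "How many apples are shown?", False, 9.8, 2, 10),
    Interaction("u2", "q1", "counting", "How many apples are shown?", True, 4.1, 1, 15),
]

sequences = build_sequences(logs)
u1_pairs = to_kt_pairs(sequences["u1"])
print(u1_pairs)
```

Keeping the question text, duration, and attempt count on each record, rather than only the correctness flag, is what would let a language-model-based or multimodal model use the extra signals that the paper argues sequence-only datasets omit.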
With this multimodal data, ES-KT-24 aims to provide a new benchmark for knowledge-tracing research and to support richer learning analytics and machine-learning techniques, giving a deeper understanding of students' learning patterns and processes. The paper also compares deep-learning-based KT models with language-model-based KT (LKT) methods. The LKT models performed slightly better, which suggests the potential of language models in this field.