Abstract: The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets containing long-sequence data have begun to emerge. Recent deep learning based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with such datasets. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose an LSTM BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT, which uses a BERT-based architecture with a Rasch model-based embedding block to handle information about different difficulty levels and an LSTM block to process the sequential characteristics of students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC. Additionally, an ablation study is conducted to analyse the impact of each component on LBKT's overall performance. Moreover, we used t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster, more interpretable, and has a lower memory cost than traditional deep learning based Knowledge Tracing methods.

What problem does this paper attempt to address?

The problem this paper attempts to solve is the inefficiency, low accuracy, and low interpretability of existing deep-learning-based Knowledge Tracing (KT) models when dealing with long-sequence data in large-scale datasets. With the development of Intelligent Tutoring Systems (ITS), large-scale datasets containing long-sequence data have emerged, and existing KT models perform poorly on them. Specifically:

1. **Inefficiency**: Traditional KT models are slow when processing long-sequence data.
2. **Low accuracy**: Existing models have low prediction accuracy on long-sequence data.
3. **Low interpretability**: The black-box nature of traditional deep-learning methods makes the model's decision-making process difficult to interpret.

To address these problems, the authors propose LBKT (LSTM BERT-based Knowledge Tracing model), a new knowledge-tracing model that combines LSTM and BERT and is specifically designed to handle long-sequence data. LBKT addresses the shortcomings of existing models in the following ways:

- **Combining the advantages of BERT and LSTM**: BERT can capture the relationships in complex data, and LSTM is good at handling sequential data. Combining the two can improve the model's performance on large-scale datasets.
- **Introducing Rasch model embedding**: Using Rasch model embedding to handle the different difficulty information in student behavior data improves the accuracy and interpretability of the model.

### Model Architecture

The LBKT model consists of three main parts:

1. **Rasch model embedding**:
   - Uses the embedding vectors generated by the Rasch model to represent the difficulty of questions and concepts.
   - The formula is as follows:
     \[ E = E_{\text{Rasch}} + E_{\text{BERT Token}} + E_{\text{Position}} \]
     where
     \[ E_{\text{Rasch}} = E_d + E_d \times E_q \]
2. **BERT-based architecture**:
   - It contains 12 Transformer blocks, each with a multi-head attention mechanism, a feed-forward network (FFN), and sub-layer connections.
   - The formula for the attention mechanism is:
     \[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]
   - The formula for the feed-forward network is:
     \[ \text{FFN}(x) = \text{GELU}(xW_1 + b_1)W_2 + b_2 \]
3. **LSTM block**:
   - Uses a neural network linear transformation in place of the traditional attention projection to improve performance on long-sequence data.

### Experimental Results

The experimental results show that LBKT achieves the best performance on multiple benchmark datasets and outperforms the other baseline models on both the ACC and AUC metrics. On long-sequence data in particular, LBKT is 4.29 times faster than BEKT on the EdNet dataset and 4.77 times faster on the Junyi Academy dataset, and it also uses less memory.

### Conclusion

By combining the advantages of BERT, Rasch model embedding, and LSTM, the LBKT model significantly improves the efficiency, accuracy, and interpretability of the knowledge-tracing task on large-scale long-sequence datasets.
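The formulas listed under Model Architecture can be sketched in plain Python to make the data flow concrete. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the `×` in the Rasch formula is assumed to be an element-wise product, vectors are plain lists of floats, and the feed-forward network uses the row-vector convention (`x` multiplied on the left of each weight matrix).

```python
import math


def rasch_embedding(e_d, e_q):
    """E_Rasch = E_d + E_d * E_q (element-wise product is an assumption)."""
    return [d + d * q for d, q in zip(e_d, e_q)]


def input_embedding(e_rasch, e_token, e_position):
    """E = E_Rasch + E_BERT Token + E_Position, summed element-wise."""
    return [r + t + p for r, t, p in zip(e_rasch, e_token, e_position)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for a single head."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def ffn(x, w1, b1, w2, b2):
    """FFN(x) = GELU(x W1 + b1) W2 + b2, applied to each row of x."""
    hidden = [[gelu(h + b) for h, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]


if __name__ == "__main__":
    # Toy two-dimensional embeddings for a single interaction (made-up values).
    e = input_embedding(rasch_embedding([1.0, 2.0], [0.5, 0.5]), [0.1, 0.1], [0.0, 0.2])
    # Self-attention over a one-token sequence, followed by an identity-weight FFN.
    context = attention([e], [e], [e])
    print(ffn(context, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
              [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

A real model would use a tensor library, multiple attention heads, and learned parameters. The sketch only shows how the summed embedding feeds the attention and feed-forward formulas.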

Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

A Deeper Knowledge Tracking Model Integrating Cognitive Theory and Learning Behavior

BKT-LSTM: Efficient Student Modeling for knowledge tracing and student performance prediction

Integrating learning factors and Bayesian network for interpretable knowledge tracing

Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task

Interpretable Knowledge Tracing: Simple and Efficient Student Modeling with Causal Relations

A Question-centric Multi-experts Contrastive Learning Framework for Improving the Accuracy and Interpretability of Deep Sequential Knowledge Tracing Models

Enhancing Deep Knowledge Tracing with Auxiliary Tasks

DKT-STDRL: Spatial and Temporal Representation Learning Enhanced Deep Knowledge Tracing for Learning Performance Prediction

A Temporal Convolutional Knowledge Tracing Model Integrating Forgetting Factors and Item Response Theory

Programming Knowledge Tracing: A Comprehensive Dataset and A New Model

Augmenting Interpretable Knowledge Tracing by Ability Attribute and Attention Mechanism

Learning Behavior-oriented Knowledge Tracing

Towards Interpretable Deep Learning Models for Knowledge Tracing

Improving Interpretability of Deep Sequential Knowledge Tracing Models with Question-centric Cognitive Representations

Interpreting Deep Learning Models for Knowledge Tracing

Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs

Incorporating Rich Features into Deep Knowledge Tracing

A probabilistic generative model for tracking multi-knowledge concept mastery probability

LANA: Towards Personalized Deep Knowledge Tracing Through Distinguishable Interactive Sequences

A study of progressive data flow knowledge tracing based on reconstructed attention mechanism