Integrating LSTM and BERT for Long-Sequence Data Analysis in Intelligent Tutoring Systems

Zhaoxing Li, Jujie Yang, Jindi Wang, Lei Shi, Sebastian Stein
2024-04-25
Abstract: The field of Knowledge Tracing aims to understand how students learn and master knowledge over time by analyzing their historical behaviour data. To achieve this goal, many researchers have proposed Knowledge Tracing models that use data from Intelligent Tutoring Systems to predict students' subsequent actions. However, with the development of Intelligent Tutoring Systems, large-scale datasets containing long-sequence data began to emerge. Recent deep learning based Knowledge Tracing models face obstacles such as low efficiency, low accuracy, and low interpretability when dealing with large-scale datasets containing long-sequence data. To address these issues and promote the sustainable development of Intelligent Tutoring Systems, we propose an LSTM BERT-based Knowledge Tracing model for long-sequence data processing, namely LBKT, which uses a BERT-based architecture with a Rasch model-based embeddings block to deal with information about different difficulty levels and an LSTM block to process the sequential characteristics of students' actions. LBKT achieves the best performance on most benchmark datasets on the metrics of ACC and AUC. Additionally, an ablation study is conducted to analyse the impact of each component on LBKT's overall performance. Moreover, we used t-SNE as the visualisation tool to demonstrate the model's embedding strategy. The results indicate that LBKT is faster, more interpretable, and has a lower memory cost than the traditional deep learning based Knowledge Tracing methods.
Computers and Society, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
This paper addresses the low efficiency, low accuracy, and low interpretability of existing deep-learning-based Knowledge Tracing (KT) models when they process long-sequence data in large-scale datasets. As Intelligent Tutoring Systems (ITS) have developed, large-scale datasets containing long-sequence data have emerged, and existing KT models perform poorly on them. Specifically:

1. **Inefficiency**: Traditional KT models are slow when processing long-sequence data.
2. **Low accuracy**: Existing models have low prediction accuracy on long-sequence data.
3. **Low interpretability**: Because traditional deep-learning methods are black boxes, their decision-making process is difficult to interpret.

To address these problems, the authors propose LBKT (LSTM BERT-based Knowledge Tracing), a new knowledge-tracing model that combines LSTM and BERT and is designed specifically for long-sequence data. LBKT addresses the shortcomings of existing models in the following ways:

- **Combining the advantages of BERT and LSTM**: BERT can capture relationships in complex data, and LSTM is good at handling sequential data. Combining the two can improve the model's performance on large-scale datasets.
- **Introducing Rasch model embedding**: Rasch model embeddings encode the difficulty information in student behavior data, which improves the accuracy and interpretability of the model.

### Model Architecture

The LBKT model consists of three main parts:

1. **Rasch model embedding**:
   - Embedding vectors generated by the Rasch model represent the difficulty of questions and concepts.
   - The formula is as follows:
     \[ E = E_{\text{Rasch}} + E_{\text{BERT Token}} + E_{\text{Position}} \]
     where
     \[ E_{\text{Rasch}} = E_d + E_d \times E_q \]
2. **BERT-based architecture**:
   - It contains 12 Transformer blocks, each with a multi-head attention mechanism, a feed-forward network (FFN), and sub-layer connections.
   - The formula for the attention mechanism is:
     \[ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V \]
   - The formula for the feed-forward network is:
     \[ \text{FFN}(x) = \text{GELU}(xW_1 + b_1)W_2 + b_2 \]
3. **LSTM block**:
   - A neural-network linear transformation replaces the traditional attention projection to improve performance on long-sequence data.

### Experimental Results

The experimental results show that LBKT achieves the best performance on multiple benchmark datasets and outperforms the other baseline models on both the ACC and AUC metrics. On long-sequence data in particular, LBKT is 4.29 times faster than BEKT on the EdNet dataset and 4.77 times faster on the Junyi Academy dataset, and it also uses less memory.

### Conclusion

By combining the advantages of BERT, Rasch model embedding, and LSTM, LBKT significantly improves the efficiency, accuracy, and interpretability of knowledge tracing on large-scale long-sequence datasets.
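
The formulas in the Model Architecture section can be made concrete with a small sketch. The pure-Python code below is illustrative only and is not the authors' implementation. It assumes that the product in the Rasch embedding is element-wise, that the FFN uses the row-vector convention (`x W1 + b1`), and that GELU is the exact erf-based form. All function names and toy values are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

def gelu(x):
    """Exact GELU based on the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def ffn(x, w1, b1, w2, b2):
    """Position-wise feed-forward network: GELU(x W1 + b1) W2 + b2."""
    hidden = [[gelu(h + b) for h, b in zip(row, b1)] for row in matmul(x, w1)]
    return [[o + b for o, b in zip(row, b2)] for row in matmul(hidden, w2)]

def rasch_embedding(e_d, e_q):
    """E_Rasch = E_d + E_d * E_q, with the product read element-wise (assumption)."""
    return [d + d * q for d, q in zip(e_d, e_q)]

def input_embedding(e_rasch, e_token, e_pos):
    """E = E_Rasch + E_BERT_Token + E_Position for one sequence position."""
    return [r + t + p for r, t, p in zip(e_rasch, e_token, e_pos)]

# Toy two-dimensional vectors (hypothetical values, not from the paper).
e_rasch = rasch_embedding([0.5, -1.0], [2.0, 0.5])  # [1.5, -1.5]
e = input_embedding(e_rasch, [0.1, 0.2], [0.0, 0.1])

# Self-attention over a two-step toy sequence: Q = K = V = x.
x = [e, [0.0, 1.0]]
context, weights = attention(x, x, x)
print(weights)  # each row is a probability distribution over the two steps
```

Each row of `weights` sums to 1, which is what lets an attention layer be read as a weighting over a student's earlier interactions. A real model would use learned matrices, multiple heads, and a tensor library rather than Python lists.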