E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada
2024-05-24
Abstract: Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for downstream tasks such as grade prediction and modeling of student behavior. Previous research evaluated models that mainly used statistical-based features derived from EventStream logs, such as the number of operation types or access frequencies. While these features are useful for providing certain insights, they lack temporal information that captures fine-grained differences in learning behaviors among different students. This study proposes E2Vec, a novel feature representation method based on word embeddings. The proposed method regards operation logs and their time intervals for each student as a string sequence of characters and generates a student vector of learning activity features that incorporates time information. We applied fastText to generate an embedding vector for each of 305 students in a dataset from two years of computer science courses. Then, we investigated the effectiveness of E2Vec in an at-risk detection task, demonstrating potential for generalizability and performance.
Computers and Society, Artificial Intelligence, Computation and Language, Machine Learning
What problem does this paper attempt to address?
The paper addresses a gap in how features are extracted from the EventStream data of e-book systems: existing approaches do not make full use of the temporal information in students' operation sequences. Features used in past research, such as the number of operation types or access frequencies, are useful for some analyses, but they lack the time information needed to capture subtle differences in students' learning behaviors.

To close this gap, the paper proposes E2Vec, a feature representation method based on word embeddings that takes into account both the order of operations and the time intervals between them. The authors aim to provide a finer-grained representation of learning activities that improves downstream educational tasks such as detecting at-risk students.

The specific goals and contributions of the paper are:

1. **Introducing time information**: Unlike previous research that relies mainly on statistical features of operations, E2Vec treats each student's operation logs and their time intervals as a character sequence and generates a student vector of learning activity features that incorporates time information.
2. **Improving the effectiveness of feature representation**: In experiments on an at-risk detection task, the features generated by E2Vec show potential for both performance and generalizability.
3. **Applying a word-embedding model to learning logs**: The authors train the embedding model with fastText, generate an embedding vector for each of the 305 students in the dataset, and evaluate these vectors in the downstream task.

In summary, the paper aims to improve the feature representation of students' learning behaviors by introducing time information, thereby making educational data analysis more effective.
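The idea of treating operation logs and their time intervals as a character sequence can be illustrated with a minimal Python sketch. This is not the authors' implementation: the operation names, the symbol table `OP_SYMBOLS`, and the interval thresholds in `INTERVAL_SYMBOLS` are all hypothetical choices made for illustration, since the summary does not give the paper's actual encoding.

```python
from datetime import datetime

# Hypothetical mapping from e-book operations to single characters.
# The paper's real symbol table is not given in this summary.
OP_SYMBOLS = {"OPEN": "O", "NEXT": "N", "PREV": "P", "ADD_MARKER": "M", "CLOSE": "C"}

# Hypothetical interval buckets: (exclusive upper bound in seconds, symbol).
# Very short gaps add no symbol, so rapid operations stay adjacent.
INTERVAL_SYMBOLS = [(10, ""), (60, "s"), (300, "m")]

# Assumption: gaps beyond the last bucket become a space, which splits
# the sequence into separate "words" for a word-embedding model.
LONG_GAP_SYMBOL = " "


def interval_symbol(gap_seconds):
    """Return the symbol for the time gap between two consecutive events."""
    for upper_bound, symbol in INTERVAL_SYMBOLS:
        if gap_seconds < upper_bound:
            return symbol
    return LONG_GAP_SYMBOL


def encode_events(events):
    """Encode time-sorted (ISO timestamp, operation) pairs as one string."""
    parts = []
    previous_time = None
    for timestamp, operation in events:
        current_time = datetime.fromisoformat(timestamp)
        if previous_time is not None:
            gap = (current_time - previous_time).total_seconds()
            parts.append(interval_symbol(gap))
        # Unknown operations are kept as "?" rather than silently dropped.
        parts.append(OP_SYMBOLS.get(operation, "?"))
        previous_time = current_time
    return "".join(parts)


# Made-up event log for a single student.
events = [
    ("2024-05-24T10:00:00", "OPEN"),
    ("2024-05-24T10:00:05", "NEXT"),        # 5 s gap: no interval symbol
    ("2024-05-24T10:00:35", "NEXT"),        # 30 s gap: "s"
    ("2024-05-24T10:03:35", "ADD_MARKER"),  # 180 s gap: "m"
    ("2024-05-24T10:20:00", "CLOSE"),       # 985 s gap: word boundary
]
encoded = encode_events(events)
print(repr(encoded))  # 'ONsNmM C'
```

Strings produced this way, one per student, could then serve as the training text for a subword-aware embedding model such as fastText, with each student's vector derived from the embeddings of their encoded sequence. How the paper actually tokenizes and aggregates these sequences is not specified in this summary.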