Abstract: With the advancement and utility of Artificial Intelligence (AI), personalising education to a global population could be a cornerstone of new educational systems in the future. This work presents the PEEKC dataset and the TrueLearn Python library, which contains a dataset and a series of online learner state models that are essential to facilitate research on learner engagement modelling. The TrueLearn family of models was designed following the "open learner" concept, using humanly-intuitive user representations. This family of scalable, online models also helps end-users visualise the learner models, which may in the future facilitate user interaction with their models/recommenders. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The experiments show the utility of both the dataset and the library, with predictive performance significantly exceeding comparative baseline models. The dataset contains a large number of AI-related educational videos, which are of interest for building and validating AI-specific educational recommenders.

What problem does this paper attempt to address?

The problems that this paper attempts to solve are:

1. **Lack of publicly available educational video engagement datasets**: Most publicly available datasets focus on question-answering and testing scenarios and lack detailed data on educational video-viewing behavior. This restricts researchers' ability to develop and validate personalized educational recommendation systems.
2. **Deficiencies of existing models in large-scale, continuous-learning environments**: Traditional Knowledge Tracing (KT) and Item Response Theory (IRT) models mainly target limited learning materials and testing scenarios, and cannot effectively support personalized learning needs in large-scale, continuous-learning environments.
3. **Improving the prediction performance of educational video recommendation systems**: Existing educational recommendation systems fail to fully utilize the implicit signals of user-video interactions (such as clicks and viewing duration), making it difficult to achieve efficient and personalized learning support.

For this purpose, the paper makes two main contributions:

- **Creation and release of the PEEKC dataset**: This is a publicly available dataset covering more than 20,000 informal learners watching AI-related educational videos. Each video segment is annotated with relevant Knowledge Components (KCs). The data are collected in a real-world environment and can better reflect learners' natural learning behaviors.
- **Development of the TrueLearn library**: This is an open-source Python library that contains the latest Bayesian online learning models and visualization tools for modeling learners' interests, knowledge, and novelty. The design of this library follows the "open learner" concept, uses an intuitive user representation, and provides multiple visualization methods to help users understand and manage their own learning states.
Through these two contributions, the paper aims to promote the research and development of personalized educational recommendation systems, especially in large-scale, continuous-learning environments, by using implicit interaction signals to enhance learners' engagement and learning effectiveness.

### Formula Summary

The main formulas involved in the paper are as follows:

1. **Knowledge component coverage of resources**:
\[ SR(c, c')=\log\left(\frac{\max(|L_c|, |L_{c'}|)}{|L_c\cap L_{c'}|}\right)-\log\left(\frac{\min(|L_c|, |L_{c'}|)}{|W|}\right) \]
where \( L_c \) represents the set of concepts linked to the Wikipedia concept \( c \), and \( W \) represents the set of all Wikipedia topics.

2. **Cosine similarity calculation**:
\[ \cos(\text{str}, c)=\frac{\text{TFIDF}(\text{str})\cdot\text{TFIDF}(c)}{\|\text{TFIDF}(\text{str})\|\times\|\text{TFIDF}(c)\|} \]
where \( \text{TFIDF}(s) \) returns the TF-IDF vector of the string \( s \), and \( \|\cdot\| \) represents the norm of the vector.

3. **Normalized viewing time**:
\[ e_t^{\ell, r_i}=\frac{W(\ell, r_i)}{D(r_i)} \]
where \( W(\cdot) \) is a function that returns the viewing time of the learner \( \ell \) for the resource \( r_i \), and \( D(\cdot) \) is a function that returns the duration of the lecture segment \( r_i \).

These formulas are used to process and analyze educational video data to model learners' engagement and learning states.
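As a rough illustration, the three formulas can be sketched in plain Python. This is a minimal sketch and not code from the TrueLearn library or the PEEKC pipeline: every function name is hypothetical, the TF-IDF weighting (term frequency times a smoothed `log(N/df) + 1`) and the whitespace tokenisation are assumptions, capping the normalized viewing time at 1 is an assumption, and the SR score is transcribed exactly as written in this summary.

```python
import math
from collections import Counter


def normalised_watch_time(watch_seconds, duration_seconds):
    """e = W(l, r_i) / D(r_i). Capping at 1.0 is an assumption of this sketch."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return min(watch_seconds / duration_seconds, 1.0)


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) for a small corpus.

    Assumed weighting: (count / doc length) * (log(N / df) + 1).
    """
    tokenised = [text.lower().split() for text in texts]
    n_docs = len(tokenised)
    # Document frequency: in how many texts each token appears.
    doc_freq = Counter(tok for toks in tokenised for tok in set(toks))
    vectors = []
    for toks in tokenised:
        counts = Counter(toks)
        vectors.append({
            tok: (cnt / len(toks)) * (math.log(n_docs / doc_freq[tok]) + 1.0)
            for tok, cnt in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_u * norm_v)


def sr(links_c, links_c2, n_topics):
    """SR(c, c') as stated above; inputs are sets of linked concepts."""
    overlap = len(links_c & links_c2)
    if overlap == 0:
        raise ValueError("SR is undefined when the link sets do not intersect")
    larger = max(len(links_c), len(links_c2))
    smaller = min(len(links_c), len(links_c2))
    return math.log(larger / overlap) - math.log(smaller / n_topics)


if __name__ == "__main__":
    # A learner who watched 30 s of a 120 s segment.
    print(normalised_watch_time(30, 120))
    # Similarity between a text span and a concept description.
    span_vec, concept_vec = tfidf_vectors(
        ["bayesian online learning", "online learning of bayesian models"]
    )
    print(cosine(span_vec, concept_vec))
    # Two concepts sharing two of their linked concepts, out of 100 topics.
    print(sr({1, 2, 3, 4}, {3, 4}, 100))
```

Representing the TF-IDF vectors as dictionaries keeps the sketch dependency-free; on a realistic corpus a sparse-matrix library would be the more natural choice.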

A Toolbox for Modelling Engagement with Educational Videos
