Bag of states: a non-sequential approach to video-based engagement measurement

Ali Abedi, Chinchu Thomas, Dinesh Babu Jayagopi, Shehroz S. Khan
DOI: https://doi.org/10.1007/s00530-023-01244-1
IF: 3.9
2024-01-29
Multimedia Systems
Abstract: Automated measurement of student engagement equips educators with valuable insights, helping them achieve educational program objectives and tailor their approach to individual students. Engagement measurement requires a detailed analysis of students' behavioral and affective states over precise timescales. A range of current techniques have engineered sequential and spatiotemporal models, including recurrent neural networks, temporal convolutional networks, three-dimensional convolutional neural networks, and transformers, to measure engagement from video data. These models are trained to incorporate the sequential/temporal order of behavioral and affective states into the video analysis and output the student's level of engagement. Drawing upon the definition of engagement in educational psychology, this paper questions the necessity of incorporating the order of behavioral and affective states into engagement measurement. Non-sequential, bag-of-words-based models are developed to analyze behavioral and affective features extracted from videos and output engagement levels. The non-sequential models analyze only the occurrence of behavioral and affective states, not the order in which they occur. Experimental results indicate that the proposed non-sequential approach is superior to state-of-the-art sequential engagement measurement approaches. On the IIITB Online SE dataset, the proposed approach significantly improved engagement level classification accuracy, by 22% and 26% compared with the recurrent neural network and the temporal convolutional network, respectively. On the DAiSEE dataset, it also improved minority class recall and achieved a classification accuracy as high as 0.6658. In another experiment, models displayed consistent performance when trained on shuffled versions of the datasets compared with those trained on the original, unshuffled datasets. In the shuffled versions, behavioral and affective states within video samples were randomly permuted. These observations reinforce the notion that the order in which affective and behavioral states occur does not impact engagement measurement.
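The core idea of the non-sequential approach, representing a video only by how often each state occurs, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: per-frame behavioral/affective feature vectors are assumed to be already extracted, the codebook of "states" is a hand-picked, hypothetical list of centroids (in a bag-of-words pipeline it would typically be learned, for example by k-means clustering), and the names `nearest_state` and `bag_of_states` are illustrative only.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def nearest_state(frame: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((f - c) ** 2 for f, c in zip(frame, codebook[i])),
    )


def bag_of_states(frames: List[Vector], codebook: List[Vector]) -> List[float]:
    """Normalized histogram of state occurrences; the order of frames is discarded."""
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_state(frame, codebook)] += 1
    total = len(frames)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Hypothetical codebook: three 2-D "states" (assumed, not taken from the paper).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical per-frame features for one video sample.
frames = [(0.1, 0.0), (0.9, 1.1), (0.95, 0.9), (0.0, 0.8), (1.0, 1.0)]

original = bag_of_states(frames, codebook)
print(original)  # [0.2, 0.6, 0.2]

# Randomly permute the frames, mimicking the shuffled-dataset experiment.
shuffled_frames = frames[:]
random.Random(0).shuffle(shuffled_frames)
print(bag_of_states(shuffled_frames, codebook) == original)  # True
```

Because the histogram only counts state occurrences, randomly permuting the frames leaves the representation, and therefore the output of any classifier that consumes only this histogram, unchanged. A sequential model such as a recurrent network, by contrast, receives a different input sequence after shuffling.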
computer science, information systems, theory & methods