Class-attention video transformer for engagement prediction

Sheng, Victor
DOI: https://doi.org/10.1007/s11042-024-20350-4
IF: 2.577
2024-10-13
Multimedia Tools and Applications
Abstract: In this paper, we propose the Class Attention in Video Transformer (CavT), an end-to-end method designed to process variable-length videos, both long and short, for student engagement prediction. CavT introduces a single vector for class embedding and incorporates the Binary-Order Representatives Sampling (BorS) technique to augment the dataset by adding multiple video sequences. Our method outperforms state-of-the-art methods, with MSE values of 0.0495 on the EmotiW-EP dataset and 0.0377 on the DAiSEE dataset, providing a robust and scalable solution for engagement prediction.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
What problem does this paper attempt to address?