Abstract: Feedback is a pivotal component of education: it fosters engagement and interaction, and it helps instructors refine their teaching methods to capture and maintain student attention. Traditional classroom assessment methods often struggle to gauge how well students comprehend a lecture, because they rely on manually collected comments that are prone to inaccuracy. To address this challenge, we propose a system that uses Facial Emotion Recognition (FER) to capture student feedback. Students convey their emotions and reactions through facial expressions and gestures, and analysis of these emotional responses provides insight into their comprehension levels as well as the overall quality of, and engagement during, lectures. The approach is built on Computer Vision techniques, with a particular focus on an unobtrusive methodology for assessing students' overall engagement. To overcome the limitations of traditional assessment, our approach integrates compound scaling through the proposed Multitask EfficientNetB0 model, which has demonstrated an accuracy of 95.7% in emotion recognition and 96.3% in behavior analysis across diverse datasets (DAiSEE, iSED, iSAFFE). The behavioral classification system categorizes students into "Engaged" and "Disengaged" classes within a multi-class framework, providing nuanced insights into comprehension and student engagement. Assessment metrics, including ROC curves, Precision, Recall, and F1-Score, ensure a thorough evaluation. The inclusion of MTCNN enhances face detection, facilitating robust analysis in dynamic scenarios. The model offers dual-capability analysis of both emotions and behavior and has been tested across varied educational settings, including classrooms, laboratories, and seminar halls; its performance has been validated through real-world deployment within classrooms and seminars.
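
As an illustration of the assessment metrics named above, the following minimal sketch computes Precision, Recall, and F1-Score for the "Engaged" class from paired ground-truth and predicted labels. It is not the authors' evaluation code: the function name `engagement_metrics` and the sample labels are hypothetical and chosen only for illustration.

```python
def engagement_metrics(y_true, y_pred, positive="Engaged"):
    """Return precision, recall and F1 for the `positive` class.

    Hypothetical helper for illustration; labels are plain strings
    such as "Engaged" and "Disengaged".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for five students, not data from the paper.
    truth = ["Engaged", "Engaged", "Disengaged", "Disengaged", "Engaged"]
    preds = ["Engaged", "Disengaged", "Disengaged", "Engaged", "Engaged"]
    print(engagement_metrics(truth, preds))
```

Calling the function once per class and averaging the results would give macro-averaged scores, which is one common way to report these metrics when the classes are imbalanced.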

Engagement Detection in Online Learning Based on Pre-trained Vision Transformer and Temporal Convolutional Network

Improving state-of-the-art in Detecting Student Engagement with Resnet and TCN Hybrid Network

TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals

Exploiting Temporal Coherence for Self-Supervised Visual Tracking by Using Vision Transformer

An Online Education Effect Assessing Method Based on YOLOv8 and Vision Transformer

Engagement Measurement Based on Facial Landmarks and Spatial-Temporal Graph Convolutional Networks

LS-VIT: Vision Transformer for action recognition based on long and short-term temporal difference

Class-attention video transformer for engagement prediction

Detecting disengagement in virtual learning as an anomaly using temporal convolutional network autoencoder

DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation

Training Object Detectors from Scratch: An Empirical Study in the Era of Vision Transformer

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

Class-attention Video Transformer for Engagement Intensity Prediction

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

IFTSDNet: An Interact-Feature Transformer Network With Spatial Detail Enhancement Module for Change Detection

DMCNet: Diversified Model Combination Network for Understanding Engagement from Video Screengrabs

DTT-CGINet: A Dual Temporal Transformer Network with Multi-Scale Contour-Guided Graph Interaction for Change Detection

Multitask EfficientNet affective computing for student engagement detection

The Wits Intelligent Teaching System: Detecting Student Engagement During Lectures Using Convolutional Neural Networks

A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection