STRAN: Student expression recognition based on spatio-temporal residual attention network in classroom teaching videos

Zheng Chen, Meiyu Liang, Zhe Xue, Wanying Yu
DOI: https://doi.org/10.1007/s10489-023-04858-0
IF: 5.3
2023-08-09
Applied Intelligence
Abstract: To assess students' in-class listening state objectively and accurately, we can infer students' emotions from their facial expressions and their cognitive feedback from their behaviors in class, and then integrate the two into a comprehensive assessment of classroom status. The main difficulty in recognizing classroom expressions is extracting expression features accurately and efficiently from both the temporal and spatial dimensions of classroom videos. To address this, we propose a classroom expression recognition model based on a spatio-temporal residual attention network (STRAN), which extracts facial expression features by convolving over both the temporal and spatial dimensions, under limited resources while keeping time consumption low and performance high. Specifically, STRAN first uses a residual network with three-dimensional convolution to mitigate the network degradation that occurs as a convolutional neural network gets deeper, and to accelerate convergence of the whole network for a given number of layers. Second, a spatio-temporal attention mechanism is introduced so that the network focuses on the important video frames and on the key regions within each frame. To make the final classroom evaluation more comprehensive and accurate, we use a deep convolutional neural network to capture students' behaviors while recognizing their classroom expressions. An intelligent classroom state assessment method (Weight_classAssess) that combines students' expressions and behaviors is then proposed to evaluate the classroom state.
Finally, building on the public datasets CK+ and FER2013, we construct two more comprehensive synthetic datasets, CK+_Class and FER2013_Class, that are better suited to classroom teaching scenes, by adding collected video sequences of students in class and images of students' classroom expressions. Compared with existing methods, STRAN achieves facial expression recognition rates of 93.84% and 80.45% on the CK+ and CK+_Class datasets, respectively. The accuracy of intelligent classroom assessment of students based on Weight_classAssess also reaches 78.19%, which demonstrates the effectiveness of the proposed method.
computer science, artificial intelligence
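Illustrative note: the abstract says that attention lets the network focus on important video frames on top of a residual backbone. The sketch below shows one common way to combine the two ideas along the time axis: softmax weights over per-frame scores, applied in the residual form out = (1 + a) * x so that no frame's features are zeroed out. The abstract does not specify STRAN's actual attention formula; the function names, the residual form, and the toy inputs here are assumptions for illustration only, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_residual_attention(frames, frame_scores):
    """Reweight per-frame feature vectors with residual temporal attention.

    frames:       list of T feature vectors (lists of floats), one per frame
    frame_scores: list of T raw importance scores (hypothetical; in a real
                  network these would be produced by a learned layer)

    Returns (weights, reweighted_frames). Each output frame is (1 + a_t) * x_t,
    so the identity path is kept and attention only adds emphasis.
    """
    if len(frames) != len(frame_scores):
        raise ValueError("need exactly one score per frame")
    weights = softmax(frame_scores)
    reweighted = [[x + a * x for x in frame] for frame, a in zip(frames, weights)]
    return weights, reweighted


if __name__ == "__main__":
    # Toy clip: 3 frames with 2 features each; the middle frame scores highest.
    clip = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    weights, out = temporal_residual_attention(clip, [0.1, 2.0, 0.1])
    print(weights)
    print(out)
```

The same pattern extends to spatial attention by scoring locations within a frame instead of whole frames.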