A Two-Stage Spatiotemporal Attention Convolution Network for Continuous Dimensional Emotion Recognition from Facial Video

Min Hu, Qian Chu, Xiaohua Wang, Lei He, Fuji Ren
DOI: https://doi.org/10.1109/lsp.2021.3063609
2021-01-01
IEEE Signal Processing Letters
Abstract: Continuous dimensional emotion recognition from facial video sequences is a crucial and challenging task in Affective Computing and Human-Computer Intelligent Interaction. The key to this task is to extract and discriminate spatiotemporal features effectively and at a finer granularity. In this paper, a Two-Stage Spatiotemporal Attention Temporal Convolution Network (TS-SATCN) is designed for continuous dimensional emotion recognition from facial videos. The first stage generates an initial recognition result, which is then fed into the second stage for correction. In each stage, a spatiotemporal attention branch helps the network learn different levels of attention and adaptively focus on informative spatiotemporal features. The network is trained with a proposed smooth loss function that further improves the quality of the predictions. Extensive experiments on two datasets, RECOLA and AFEW-VA, show that the proposed method achieves significant improvement over state-of-the-art methods.
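The abstract does not give the form of the smooth loss. The sketch below assumes one common way such a loss can be built: a mean squared error term against the annotated per-frame emotion value (e.g. valence or arousal) plus a penalty on frame-to-frame changes in the prediction. The function name `smooth_loss` and the weight `lam` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def smooth_loss(pred: Sequence[float], target: Sequence[float], lam: float = 0.5) -> float:
    """Hypothetical smooth loss: MSE plus a first-difference smoothness penalty.

    Assumption: this is an illustrative smoothness-regularized loss, not the
    exact formula used by TS-SATCN, which the abstract does not state.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    n = len(pred)
    # Fidelity term: mean squared error between predicted and annotated values per frame.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    # Smoothness term: mean squared change between consecutive predicted frames.
    if n > 1:
        smooth = sum((pred[i] - pred[i - 1]) ** 2 for i in range(1, n)) / (n - 1)
    else:
        smooth = 0.0
    # lam (hypothetical) trades off accuracy against temporal smoothness.
    return mse + lam * smooth


# Two predictions with the same MSE against a flat target: the jittery one is penalized more.
jitter = smooth_loss([0.1, -0.1, 0.1, -0.1], [0.0] * 4)
steady = smooth_loss([0.1] * 4, [0.0] * 4)
print(round(jitter, 4), round(steady, 4))  # 0.03 0.01
```

The design choice illustrated here is that both traces are equally far from the target per frame, so only the smoothness term separates them. This is the kind of prediction-quality effect the abstract attributes to its smooth loss.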