A Deep Attention Transformer Network for Pain Estimation with Facial Expression Video

Haochen Xu, Manhua Liu
DOI: https://doi.org/10.1007/978-3-030-86608-2_13
2021-01-01
Abstract: Since pain often causes deformations in the facial structure, analysis of facial expressions has received considerable attention for automatic pain estimation in recent years. This study proposes a deep attention transformer network for pain estimation called Pain Estimate Transformer (PET), which consists of two subnetworks: an image encoding subnetwork and a video transformer subnetwork. In the image encoding subnetwork, ResNet is combined with a bottleneck attention block to learn the features of facial images. In the transformer subnetwork, a transformer encoder is used to capture the temporal relationship among frames. The spatial-temporal features are combined with a Multi-Layer Perceptron (MLP) for pain intensity regression. Experimental results on the UNBC-McMaster Shoulder Pain dataset show that the proposed PET achieves compelling performance for pain intensity estimation.
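The data flow described in the abstract (per-frame features, attention across frames, MLP regression) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the image encoding subnetwork (ResNet with a bottleneck attention block) is replaced by random placeholder features, the transformer encoder is reduced to one single-head self-attention layer with a residual connection, and all names, dimensions, and weights (`estimate_pain`, `T`, `D`, `H`, the `params` dictionary) are hypothetical and untrained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_self_attention(frames, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention across frames.

    frames: array of shape (T, D), one feature vector per video frame.
    Returns the attended features (T, D) and the attention weights (T, T).
    """
    q, k, v = frames @ w_q, frames @ w_k, frames @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights


def mlp_regress(x, w1, b1, w2, b2):
    """Two-layer MLP with ReLU that maps a clip vector to one scalar."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return float(hidden @ w2 + b2)


def estimate_pain(frame_features, params):
    """Hypothetical pipeline: temporal attention, mean pooling, regression."""
    attended, weights = temporal_self_attention(
        frame_features, params["w_q"], params["w_k"], params["w_v"]
    )
    fused = frame_features + attended  # residual connection (assumption)
    clip_vector = fused.mean(axis=0)   # pool over the time axis (assumption)
    score = mlp_regress(
        clip_vector, params["w1"], params["b1"], params["w2"], params["b2"]
    )
    return score, weights


# Assumed sizes: 16 frames, 32-dim frame features, 64 hidden units.
T, D, H = 16, 32, 64
rng = np.random.default_rng(0)

# Placeholder for the image encoder output (random, not real face features).
frame_features = rng.standard_normal((T, D))

# Random, untrained weights; a real model would learn these.
params = {
    "w_q": 0.1 * rng.standard_normal((D, D)),
    "w_k": 0.1 * rng.standard_normal((D, D)),
    "w_v": 0.1 * rng.standard_normal((D, D)),
    "w1": 0.1 * rng.standard_normal((D, H)),
    "b1": np.zeros(H),
    "w2": 0.1 * rng.standard_normal(H),
    "b2": 0.0,
}

score, attn = estimate_pain(frame_features, params)
print("attention matrix shape:", attn.shape)
print("predicted intensity (untrained, meaningless value):", score)
```

Row `i` of `attn` shows how strongly frame `i` attends to every other frame in the clip, which is the mechanism a transformer encoder uses to relate frames over time. The predicted value carries no meaning here because the weights are random.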