A Deep Multiscale Spatiotemporal Network for Assessing Depression from Facial Dynamics

Wheidima Carneiro de Melo, Eric Granger, Abdenour Hadid
DOI: https://doi.org/10.1109/taffc.2020.3021755
IF: 13.99
2020-01-01
IEEE Transactions on Affective Computing
Abstract: Recently, deep learning models have been successfully employed in many video-based affective computing applications (e.g., detecting pain, stress, and Alzheimer's disease). One key application is automatic depression recognition, that is, the recognition of facial expressions associated with depressive behaviour. State-of-the-art deep learning algorithms for depression recognition typically explore spatial and temporal information separately: they use 2D convolutional neural networks (CNNs) to analyze appearance information, and then either map facial feature variations or average the depression level over video frames. This approach is limited in its ability to represent dynamic information that can help to accurately discriminate between depression levels. In contrast, models based on 3D CNNs can directly encode spatio-temporal relationships, although these models rely on temporal information with a fixed range and a single receptive field. This limits their ability to capture variations of facial expression over diverse temporal ranges and to exploit diverse facial areas. In this article, a novel 3D CNN architecture, the Multiscale Spatiotemporal Network (MSN), is introduced to effectively represent facial information related to depressive behaviours from videos. The basic structure of the model is composed of parallel convolutional layers with different temporal depths and receptive field sizes, which allows the MSN to explore a wide range of spatio-temporal variations in facial expressions. Experimental results on two benchmark datasets show that the MSN architecture is effective, outperforming state-of-the-art methods in automatic depression recognition.
computer science, cybernetics, artificial intelligence
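
The parallel-branch structure described in the abstract can be illustrated with a small shape calculation. The sketch below is not the authors' implementation: the kernel sizes, channel counts, clip size, the use of "same" padding, and merging the branches by channel concatenation are all assumptions made for illustration, since the abstract specifies none of them. It only shows why branches with different temporal depths and spatial receptive fields can still be combined into one feature volume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One hypothetical parallel 3D convolution branch."""
    temporal_depth: int  # kernel extent along the frame axis
    spatial_size: int    # kernel extent along height and width
    out_channels: int    # number of filters in this branch


def same_padding(kernel: int) -> int:
    """Padding that preserves length for an odd kernel at stride 1."""
    if kernel % 2 == 0:
        raise ValueError("this sketch only handles odd kernel sizes")
    return (kernel - 1) // 2


def conv_output_length(length: int, kernel: int, padding: int, stride: int = 1) -> int:
    """Standard convolution output-length formula along one axis."""
    return (length + 2 * padding - kernel) // stride + 1


def branch_output_shape(input_shape, branch):
    """Output shape (channels, frames, height, width) of one branch."""
    _, frames, height, width = input_shape
    pad_t = same_padding(branch.temporal_depth)
    pad_s = same_padding(branch.spatial_size)
    return (
        branch.out_channels,
        conv_output_length(frames, branch.temporal_depth, pad_t),
        conv_output_length(height, branch.spatial_size, pad_s),
        conv_output_length(width, branch.spatial_size, pad_s),
    )


def multiscale_block_shape(input_shape, branches):
    """Shape after concatenating all branch outputs along the channel axis."""
    shapes = [branch_output_shape(input_shape, b) for b in branches]
    grids = {shape[1:] for shape in shapes}
    if len(grids) != 1:
        raise ValueError("branch outputs must share frames, height and width")
    total_channels = sum(shape[0] for shape in shapes)
    return (total_channels,) + shapes[0][1:]


# Assumed input: an RGB clip of 16 frames at 112x112 pixels.
clip_shape = (3, 16, 112, 112)

# Assumed branches: short, medium and long temporal depth, each paired
# with a correspondingly sized spatial receptive field.
branches = [
    Branch(temporal_depth=1, spatial_size=1, out_channels=32),
    Branch(temporal_depth=3, spatial_size=3, out_channels=32),
    Branch(temporal_depth=5, spatial_size=5, out_channels=32),
]

print(multiscale_block_shape(clip_shape, branches))  # (96, 16, 112, 112)
```

Because every branch preserves the frame count and spatial resolution, the three outputs line up position by position, so each location in the merged volume carries features computed over short, medium, and long temporal windows at once.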