Evaluation of the Spatio-Temporal Features and GAN for Micro-expression Recognition System.

Sze-Teng Liong, Y. S. Gan, Danna Zheng, Shu-Meng Li, Hao-Xuan Xu, Han-Zhe Zhang, Ran-Ke Lyu, Kun-Hong Liu
DOI: https://doi.org/10.1007/s11265-020-01523-4
2020-01-01
Journal of Signal Processing Systems
Abstract: Owing to advances in artificial intelligence, numerous works have addressed human facial expression recognition. Meanwhile, the detection and classification of micro-expressions have attracted attention from various research communities in recent years. In this paper, we first review the processes of a conventional optical-flow-based recognition system. Concisely, it comprises four basic steps: facial landmark annotation (to detect the face and locate the landmark coordinates), optical-flow-guided image computation (to describe the dynamic changes on the face), feature extraction (to summarize the encoded features) and emotion class categorization (to build a classification model based on the given training data). Secondly, a few approaches are implemented to improve the feature extraction part, such as exploiting GAN to generate more image samples. In particular, several variations of optical flow are computed in order to generate optimal images that lead to high recognition accuracies. Next, GAN, a combination of a Generator and a Discriminator, is utilized to generate new "fake" images to increase the sample size. Thirdly, a modified state-of-the-art convolutional neural network is proposed. In brief, multiple optical-flow-derived components are adopted in the OFF-ApexNet structure to better represent subtle facial motion changes. The experimental results show that the additional optical flow information does not complement the feature extraction stage and thus leads to poorer recognition performance. On the other hand, applying GAN to the input data improves the performance on the SMIC dataset: the accuracy is 61.80% with AC-GAN images, 62.20% with SAGAN images and 60.98% without GAN images.
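The "optical flow derived components" mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which components the authors compute, so the choice of magnitude and orientation here is an assumption, as are the function name `flow_components` and the toy values; the sketch only shows how such components follow from the horizontal and vertical flow fields.

```python
import math

def flow_components(u, v):
    """Return (magnitude, orientation) grids for a dense flow field.

    u, v: equally sized 2D lists holding the horizontal and vertical
    displacement of each pixel. Orientation is in radians, as returned
    by math.atan2. (Hypothetical helper, not the authors' code.)
    """
    if len(u) != len(v) or any(len(ru) != len(rv) for ru, rv in zip(u, v)):
        raise ValueError("u and v must have the same shape")
    # Magnitude: Euclidean length of each (u, v) displacement vector.
    magnitude = [[math.hypot(a, b) for a, b in zip(ru, rv)]
                 for ru, rv in zip(u, v)]
    # Orientation: angle of each displacement vector from the x-axis.
    orientation = [[math.atan2(b, a) for a, b in zip(ru, rv)]
                   for ru, rv in zip(u, v)]
    return magnitude, orientation

# Toy 2x2 flow field (made-up values, not taken from any dataset).
u = [[3.0, 0.0], [1.0, -1.0]]
v = [[4.0, 2.0], [1.0, 0.0]]
mag, ang = flow_components(u, v)
```

In a real pipeline, `u` and `v` would come from a dense optical flow estimator applied to two frames of a micro-expression clip, and the resulting component maps would be stacked as input channels for the network.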