Cross-VAE: Towards Disentangling Expression from Identity for Human Faces

Haozhe Wu, Jia, Lingxi Xie, Guojun Qi, Yuanchun Shi, Qi Tian
DOI: https://doi.org/10.1109/icassp40776.2020.9053608
2020-01-01
Abstract: Facial expression and identity are two independent yet intertwined components for representing a face. For facial expression recognition, identity can contaminate the training procedure by providing entangled but irrelevant information. In this paper, we propose to learn clearly disentangled and discriminative features that are invariant to identity for expression recognition. However, such disentanglement normally requires annotations of both expression and identity on one large dataset, which is often unavailable. Our solution is to extend the conditional VAE to a crossed version named Cross-VAE, which is able to use partially labeled data to disentangle expression from identity. We emphasize the following novel characteristics of our Cross-VAE: (1) It is based on an independence assumption that the distributions of the two latent representations are orthogonal. This ensures that both encoded representations are disentangled and expressive. (2) It utilizes a symmetric training procedure in which the output of each encoder is fed as the condition of the other. Thus, two partially labeled sets can be used jointly. Extensive experiments show that our proposed method is capable of encoding expressive and disentangled features for facial expression. Compared with the baseline methods, our model improves accuracy by 3.56% on average.
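The two characteristics named in the abstract, crossed conditioning and an orthogonality constraint between the two latent codes, can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy linear encoders and decoders, the function names (`init_params`, `cross_loss`, `orthogonality_penalty`), the choice to feed each code as a condition into the other branch's decoder, and the squared-cosine form of the penalty. Sampling and the KL terms of a real VAE are omitted to keep the data flow visible.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(x_dim, z_dim, seed=0):
    """Hypothetical toy parameters: two encoders and two decoders."""
    rng = random.Random(seed)
    return {
        "enc_exp": random_matrix(z_dim, x_dim, rng),      # expression encoder
        "enc_id": random_matrix(z_dim, x_dim, rng),       # identity encoder
        "dec_exp": random_matrix(x_dim, 2 * z_dim, rng),  # decoder of the expression branch
        "dec_id": random_matrix(x_dim, 2 * z_dim, rng),   # decoder of the identity branch
    }


def orthogonality_penalty(z_a, z_b):
    """Squared cosine similarity: zero when the two codes are orthogonal.

    Assumed form of the constraint; the abstract does not specify the loss.
    """
    dot = sum(a * b for a, b in zip(z_a, z_b))
    norm_a = math.sqrt(sum(a * a for a in z_a))
    norm_b = math.sqrt(sum(b * b for b in z_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return (dot / (norm_a * norm_b)) ** 2


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def cross_loss(x, params, ortho_weight=1.0):
    """Reconstruction loss of both crossed branches plus the orthogonality term."""
    z_exp = linear(params["enc_exp"], x)
    z_id = linear(params["enc_id"], x)
    # Crossing: each branch decodes from its own code concatenated with
    # the other encoder's output, which acts as the condition.
    recon_exp = linear(params["dec_exp"], z_exp + z_id)
    recon_id = linear(params["dec_id"], z_id + z_exp)
    recon = mse(recon_exp, x) + mse(recon_id, x)
    return recon + ortho_weight * orthogonality_penalty(z_exp, z_id)
```

Under these assumptions, calling `cross_loss([0.5, -1.0, 0.25, 2.0], init_params(x_dim=4, z_dim=2))` returns a single non-negative scalar. In a trainable version, the gradient of that scalar would update both branches at once, which is what would let two partially labeled sets contribute to the same objective.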