Abstract: Recent successes of deep learning-based recognition rely on maintaining the content related to the main-task label. However, how to explicitly dispel noisy signals in a controllable manner for better generalization remains an open issue. For instance, various factors such as identity-specific attributes, pose, illumination, and expression affect the appearance of face images. Disentangling the identity-specific factors is potentially beneficial for facial expression recognition (FER). This chapter systematically summarizes the detrimental factors as task-relevant/irrelevant semantic variations and unspecified latent variation. These problems are cast as either a deep metric learning problem or an adversarial minimax game in the latent space. For the former, a generalized adaptive (N+M)-tuplet clusters loss function, together with an identity-aware hard-negative mining and online positive mining scheme, can be used for identity-invariant FER. Better FER performance can be achieved by jointly optimizing the deep metric loss and the softmax loss in a unified framework with two fully connected branch layers. For the latter, it is possible to equip an end-to-end conditional adversarial network with the ability to decompose an input sample into three complementary parts. The discriminative representation inherits the desired invariance property guided by prior knowledge of the task, and is marginally independent of the task-relevant/irrelevant semantic and latent variations. The framework achieves top performance on a series of tasks, including lighting-, makeup-, and disguise-tolerant face recognition and facial attribute recognition. Overall, the chapter summarizes popular and practical disentanglement solutions for achieving more discriminative visual recognition.
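The joint optimization of a metric loss and a softmax loss mentioned in the abstract can be illustrated with a minimal sketch. This is not the chapter's (N+M)-tuplet clusters loss: as a simplified stand-in it uses a triplet-style hinge loss averaged over all positive/negative pairs. The function names, the weight `lam`, and the `margin` value are assumptions made for illustration only.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_hinge_loss(anchor, positives, negatives, margin=1.0):
    """Simplified stand-in for a tuplet-style metric loss (assumption).

    Averages max(0, d(anchor, p) - d(anchor, n) + margin) over every
    positive/negative pair, so positives are pulled closer to the anchor
    than negatives by at least `margin`.
    """
    total = 0.0
    for p in positives:
        for n in negatives:
            total += max(0.0, euclidean(anchor, p) - euclidean(anchor, n) + margin)
    return total / (len(positives) * len(negatives))


def joint_loss(logits, label, anchor, positives, negatives, lam=0.5, margin=1.0):
    """Weighted sum of the classification loss and the metric loss.

    `lam` is a hypothetical balancing weight, not a value from the chapter.
    """
    ce = softmax_cross_entropy(logits, label)
    metric = triplet_hinge_loss(anchor, positives, negatives, margin)
    return ce + lam * metric
```

In this sketch, the negatives would be samples of the same identity with a different expression (the hard-negative idea), and the positives would share the anchor's expression. In a real two-branch network both terms would be backpropagated through shared convolutional features.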

Toward Identity-Invariant Facial Expression Recognition: Disentangled Representation via Mutual Information Perspective

DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition

Disentangling Identity and Pose for Facial Expression Recognition

Automatic 4D Facial Expression Recognition via Collaborative Cross-domain Dynamic Image Network

Disentanglement for Discriminative Visual Recognition

Identity-Enhanced Network for Facial Expression Recognition

Efficient Facial Expression Recognition with Representation Reinforcement Network and Transfer Self-Training for Human–Machine Interaction

Facial Expression Recognition Using Disentangled Adversarial Learning

Facial Expression Recognition by Expression-Specific Representation Swapping

Designing One Unified Framework for High-Fidelity Face Reenactment and Swapping

Disentangled Representation for Age-Invariant Face Recognition: A Mutual Information Minimization Perspective

Learning informative and discriminative semantic features for robust facial expression recognition

3D-FERNet: A Facial Expression Recognition Network utilizing 3D information

Collaborative expression representation using peak expression and intra class variation face images for practical subject-independent emotion recognition in videos

THIN: THrowable Information Networks and Application for Facial Expression Recognition In The Wild

Facial Expression Recognition with Controlled Privacy Preservation and Feature Compensation

Deep Representation of Facial Geometric and Photometric Attributes for Automatic 3D Facial Expression Recognition

Enhanced Dual-Level Representations for Facial Expression Recognition

Dual-channel feature disentanglement for identity-invariant facial expression recognition

Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition

Hypergraph-Guided Disentangled Spectrum Transformer Networks for Near-Infrared Facial Expression Recognition