Abstract: Face recognition systems are widely used in security-related areas of daily life. However, they are vulnerable to face spoofing attacks: an attacker can fool a face recognition system into making false decisions by presenting spoof face information (such as printed photos or replayed videos) instead of a live face. Face Anti-Spoofing (FAS) is therefore critical to the secure operation of a face recognition system. Deep learning-based approaches show the best performance among existing FAS approaches. Their basic idea is to learn statistical representations capable of distinguishing spoof faces from live ones, and then to use the learned representations for live/spoof classification, so the quality of the learned representations largely determines FAS performance. However, most existing approaches learn from representation-entangled spaces, in which representations that are critical to live/spoof classification are entangled with irrelevant ones, which degrades the performance of a FAS system. To address this issue, we introduce a Twin Autoencoder Disentanglement (TAD) framework. TAD uses adversarial learning and a reconstruction strategy to disentangle critical and irrelevant representations into two mutually independent representation spaces. In addition, to further suppress irrelevant representations that may remain in the critical representation space, we design a multi-branch supervision architecture (MSA) and embed it into TAD. MSA does so by imposing depth supervision and pattern supervision on the critical representation space, i.e., by learning a spatial representation (face depth information) and a texture representation (face spoof pattern information). Experimental results on four typical public datasets, OULU-NPU, SiW, Replay-Attack, and CASIA-MFSD, demonstrate that our proposed TAD approach successfully disentangles critical and irrelevant representations, and that the two disentangled representations are more interpretable than those learned by state-of-the-art FAS methods. The code is available at https://github.com/TAD-FAS/TAD.
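To make the structure described in the abstract more concrete, the following is a minimal sketch of how the loss terms of a twin-encoder disentanglement setup might be combined. It is not the authors' implementation: the abstract does not specify layer types, loss functions, or loss weights, so everything here is an assumption made for illustration, including the bias-free linear layers, the confusion-style adversarial loss, the unweighted sum, and all names (`tad_style_losses`, `enc_critical`, `enc_irrelevant`, `depth_head`, `pattern_head`).

```python
import math
import random


def dense(weights, x):
    """Bias-free linear layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce(prob, target):
    prob = min(max(prob, 1e-7), 1.0 - 1e-7)  # clamp for numerical safety
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(input_dim, code_dim, map_dim, rng):
    """Hypothetical parameter layout; the real architecture is not given in the abstract."""
    return {
        "enc_critical": init_matrix(code_dim, input_dim, rng),
        "enc_irrelevant": init_matrix(code_dim, input_dim, rng),
        "decoder": init_matrix(input_dim, 2 * code_dim, rng),
        "classifier": init_matrix(1, code_dim, rng),
        "adversary": init_matrix(1, code_dim, rng),
        "depth_head": init_matrix(map_dim, code_dim, rng),
        "pattern_head": init_matrix(map_dim, code_dim, rng),
    }


def tad_style_losses(x, label, depth_target, pattern_target, params):
    # Twin encoders map the same input to two separate codes.
    z_crit = dense(params["enc_critical"], x)
    z_irr = dense(params["enc_irrelevant"], x)
    # Reconstruction uses both codes, so together they must keep the input's information.
    x_hat = dense(params["decoder"], z_crit + z_irr)
    # Live/spoof prediction from the critical code only.
    p_crit = sigmoid(dense(params["classifier"], z_crit)[0])
    # An adversary tries to predict live/spoof from the irrelevant code.
    p_irr = sigmoid(dense(params["adversary"], z_irr)[0])
    losses = {
        "reconstruction": mse(x_hat, x),
        "classification": bce(p_crit, label),
        # Assumed confusion loss: push the adversary's output toward 0.5.
        "confusion": bce(p_irr, 0.5),
        # Multi-branch supervision on the critical code (depth and spoof pattern).
        "depth": mse(dense(params["depth_head"], z_crit), depth_target),
        "pattern": mse(dense(params["pattern_head"], z_crit), pattern_target),
    }
    # Unweighted sum for the encoders/decoder/heads; real weights are unknown.
    losses["total"] = sum(losses.values())
    # Separate objective used to update the adversary itself.
    losses["adversary"] = bce(p_irr, label)
    return losses


rng = random.Random(0)
params = init_params(input_dim=8, code_dim=3, map_dim=4, rng=rng)
x = [rng.random() for _ in range(8)]            # stand-in for a flattened face image
depth_target = [rng.random() for _ in range(4)]  # stand-in for a depth map
pattern_target = [0.0] * 4                       # stand-in for a spoof-pattern map
losses = tad_style_losses(x, 1.0, depth_target, pattern_target, params)
for name, value in losses.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the adversary and the rest of the model would be updated in alternation: the adversary minimizes `losses["adversary"]`, while the encoders, decoder, and heads minimize `losses["total"]`. A practical version would use a deep learning framework with convolutional encoders and image-sized depth and pattern maps.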

CFASL: Composite Factor-Aligned Symmetry Learning for Disentanglement in Variational AutoEncoder

Facial Landmark Disentangled Network with Variational Autoencoder

Disentangling Factors of Variation in Deep Representations Using Adversarial Training

Disentanglement with Factor Quantized Variational Autoencoders

Commutative Lie Group VAE for Disentanglement Learning

Improving disentanglement in variational auto-encoders via feature imbalance-informed dimension weighting

DynamicVAE: Decoupling Reconstruction Error and Disentangled Representation Learning

Guided Variational Autoencoder for Disentanglement Learning

Multimodal hierarchical Variational AutoEncoders with Factor Analysis latent space

Rethinking Controllable Variational Autoencoders

Disentangled VAE Representations for Multi-Aspect and Missing Data

3D Face Modeling via Weakly-supervised Disentanglement Network joint Identity-consistency Prior

Multifactor Sequential Disentanglement via Structured Koopman Autoencoders

Bridging Disentanglement with Independence and Conditional Independence via Mutual Information for Representation Learning

$α$-TCVAE: On the relationship between Disentanglement and Diversity

Improved disentangled speech representations using contrastive learning in factorized hierarchical variational autoencoder

Variantional autoencoder with decremental information bottleneck for disentanglement

Disentangle Irrelevant and Critical Representations for Face Anti-Spoofing

FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition

Semantically Disentangled Variational Autoencoder for Modeling 3D Facial Details