AFFAKT: A Hierarchical Optimal Transport based Method for Affective Facial Knowledge Transfer in Video Deception Detection

Zihan Ji, Xuetao Tian, Ye Liu
2024-12-12
Abstract: The scarcity of high-quality large-scale labeled datasets poses a huge challenge for employing deep learning models in video deception detection. To address this issue, inspired by the psychological theory on the relation between deception and expressions, we propose a novel method called AFFAKT, which enhances classification performance by transferring useful and correlated knowledge from a large facial expression dataset. Two key challenges in knowledge transfer arise: 1) *how much* knowledge of facial expression data should be transferred and 2) *how to* effectively leverage transferred knowledge for the deception classification model during inference. Specifically, the optimal relation mapping between facial expression classes and deception samples is first quantified using the proposed H-OTKT module and then used to transfer knowledge from the facial expression dataset to deception samples. Moreover, a correlation prototype within another proposed module, SRKB, is designed to retain the invariant correlations between facial expression classes and deception classes through momentum updating. During inference, the transferred knowledge is fine-tuned with the correlation prototype using a sample-specific re-weighting strategy. Experimental results on two deception detection datasets demonstrate the superior performance of our proposed method. The interpretability study reveals high associations between deception and negative affect, which is consistent with psychological theory.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the following problem: in video deception detection, the lack of high-quality, large-scale labeled datasets limits the performance of deep learning models. Specifically, current deception detection datasets (such as Real Life Trial (RLT) and DOLOS) contain only a small number of labeled samples, which restricts the training of deep neural networks and hinders further performance improvement.

To solve this problem, the authors propose a new method named AFFAKT (Affective Facial Knowledge Transfer). AFFAKT improves the classification performance of video deception detection models by transferring useful and relevant knowledge from large-scale facial expression datasets. The method aims to answer two key questions:

1. How much knowledge from the facial expression data should be transferred?
2. How can the transferred knowledge be used effectively during inference to improve the deception classification model?

### Solution Overview

AFFAKT consists of the following four modules:

1. **Encoder Layer**: A pre-trained encoder extracts feature representations of the source domain (facial expression datasets) and the target domain (deception detection datasets).
2. **Hierarchical Optimal Transport Knowledge Transfer (H-OTKT) Module**: Hierarchical optimal transport (H-OT) automatically quantifies the latent correlation between facial expression classes and deception samples, and determines how much knowledge to transfer from each class to each sample. The specific formula is as follows:

   \[ OT_{\text{high}}(P, Q)=\min_{T \in \Pi(P, Q)}\langle T, M\rangle_F-\epsilon H(T) \]

   where \( T \in \mathbb{R}^{n\times L_s} \) and \( M \in \mathbb{R}^{n\times L_s} \) are the transport plan and the cost matrix respectively, \( \langle \cdot, \cdot \rangle_F \) is the Frobenius inner product, \( H(T) \) is the entropy of \( T \) weighted by the regularization strength \( \epsilon \), and \( \Pi(P, Q) \) is the set of transport plans whose marginal distributions are \( P \) and \( Q \).
3. **Classification Layer**: A multi-layer perceptron (MLP) with a softmax function produces the final classification prediction. The whole network is optimized with a cross-entropy loss \( L_{ce} \) and a Sinkhorn-divergence-based loss \( L_{ot} \) that reduces the discrepancy between the source and target feature spaces.
4. **Sample-specific Re-weighting Knowledge Bank (SRKB) Module**: Correlation prototypes \( B \) are built through a momentum update mechanism to retain the invariant relationship between target classes and source classes, and a sample-specific re-weighting strategy is applied at test time to improve detection performance.

### Experimental Results

The experimental results show that AFFAKT outperforms existing methods on two video deception detection datasets (RLT and DOLOS) in terms of F1-score, accuracy (ACC), and AUC. In addition, the interpretability study shows a high correlation between deceptive behavior and negative emotions, which is consistent with psychological theory.

In conclusion, AFFAKT improves video deception detection performance by transferring relevant knowledge from facial expression data, which mitigates the performance limits that insufficient labeled data place on deep learning models.
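
The entropy-regularized transport objective quoted in the H-OTKT description is commonly solved with Sinkhorn iterations. The following is a minimal, standard-library sketch of that generic solver, not the authors' implementation: the function name `sinkhorn_plan`, the toy cost matrix, the uniform marginals, and the value of `eps` are all illustrative assumptions. Rows stand for deception samples and columns for facial expression classes, so each row of the resulting plan can be read as how much mass a sample draws from each class.

```python
import math

def sinkhorn_plan(cost, p, q, eps=0.1, n_iters=200):
    """Approximate the entropic-OT plan T (n x L_s) for cost matrix `cost`,
    row marginal p (length n) and column marginal q (length L_s).
    Generic Sinkhorn scaling; hypothetical helper, not from the paper."""
    n, m = len(cost), len(cost[0])
    # Gibbs kernel K = exp(-M / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of T match p
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of T match q
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # T = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy example (assumed values): 2 samples, 3 expression classes.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.0, 1.0]]
p = [0.5, 0.5]            # uniform mass over samples
q = [1 / 3, 1 / 3, 1 / 3]  # uniform mass over classes
plan = sinkhorn_plan(cost, p, q, eps=0.1)
```

In this toy setup each sample sends most of its mass to the class with the lowest cost, while the marginal constraints force the remaining class to be shared between both samples. A smaller `eps` pushes the plan closer to the unregularized optimum, but very small values can underflow in the exponential, so a log-domain variant is usually preferred in practice.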