Abstract: In recent years, numerous “face-swapping” videos have emerged on social networks, and lip forgery of speakers is one representative type. While such videos make life more entertaining for the public, they pose a significant threat to personal privacy and property security in cyberspace. Currently, most lip forgery detection methods perform well under non-destructive (uncompressed) conditions. However, compression is widely used in practice, especially on social media platforms, in face recognition, and in other scenarios. While reducing pixel and temporal redundancy, compression degrades video quality and destroys pixel-to-pixel and frame-to-frame coherence in the spatial domain, which degrades detection performance and can even cause real videos to be misjudged. When the spatial domain cannot provide sufficiently effective features, frequency-domain information naturally becomes a priority research object because it can resist compression interference. To address this problem, the advantages of frequency information in image structure and gradient feedback were analyzed, and a lip forgery detection method based on spatial-frequency domain combination was proposed, which effectively utilizes the respective characteristics of information in the spatial and frequency domains. For lip features in the spatial domain, an adaptive extraction network and a lightweight attention module were designed. For features in the frequency domain, separate extraction and fusion modules for the different components were designed. Subsequently, a weighted fusion of the spatial-domain lip features and the frequency-domain features was conducted to preserve more texture information. In addition, fine-grained constraints were designed for training to enlarge the inter-class distance between real and fake lip features while reducing the intra-class distance. Experimental results show that, benefiting from the frequency information, the proposed method improves detection accuracy under compression and exhibits a certain degree of transferability. Furthermore, an ablation study on the core modules verifies the effectiveness of the frequency components for resisting compression and of the dual loss function constraint in training.
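
The three ideas named in the abstract (separating frequency components, weighted fusion of spatial and frequency features, and an intra-class/inter-class distance constraint) can be illustrated with a minimal NumPy sketch. It is not the method of the paper: the abstract does not specify the transform, the fusion weights, or the form of the loss, so the FFT radial mask, the scalar weight `alpha`, the center-based distance term, and all function names below are assumptions made only for illustration.

```python
import numpy as np


def split_frequency(img, radius):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: a radial mask on the centered FFT spectrum stands in for
    the component separation, whose actual form the abstract does not give.
    """
    h, w = img.shape
    spectrum = np.fft.fftshift(np.fft.fft2(img))  # zero frequency at (h//2, w//2)
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


def weighted_fusion(spatial_feat, freq_feat, alpha=0.5):
    """Convex combination of spatial and frequency features (alpha is hypothetical)."""
    return alpha * spatial_feat + (1.0 - alpha) * freq_feat


def distance_constraint(features, labels, margin=1.0):
    """Illustrative fine-grained constraint (hypothetical form).

    Pulls each sample toward its class center (intra-class term) and
    penalizes real/fake centers that are closer than `margin` (inter-class term).
    Labels: 0 = real, 1 = fake; both classes must be present.
    """
    real_center = features[labels == 0].mean(axis=0)
    fake_center = features[labels == 1].mean(axis=0)
    centers = np.where(labels[:, None] == 0, real_center, fake_center)
    intra = np.mean(np.sum((features - centers) ** 2, axis=1))
    inter = np.linalg.norm(real_center - fake_center)
    return intra + max(0.0, margin - inter)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((16, 16))          # stand-in for a cropped lip region
    low, high = split_frequency(frame, radius=3)
    fused = weighted_fusion(frame, high, alpha=0.7)
    print(fused.shape)
```

In this sketch the two frequency components sum back to the input image, so the split loses no information; how the components would be weighted or learned in a real detector is left open.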

Lip motion recognition of speaker based on SIFT

LVID: A Multimodal Biometrics Authentication System on Smartphones.

LipPass: Lip Reading-based User Authentication on Smartphones Leveraging Acoustic Signals.

Lip Reading-Based User Authentication Through Acoustic Sensing on Smartphones.

Speaker Recognition Technology Based on Lip Movement

Speaker Recognition Based on Lip-reading: an Overview

Audio-Visual System for Robust Speaker Recognition.

3D Convolutional Neural Networks Based Speaker Identification and Authentication.

Lip-reading algorithm in face recognition systems for high security

Visual Speaker Authentication By A Cnn-Based Scheme With Discriminative Segment Analysis

Lip Contour Extraction Based on Support Vector Machine Add Support

Robust Speaking Face Identification For Video Analysis

Lip Forgery Detection Via Spatial-Frequency Domain Combination

Authentication Analysis of Features Based on Lip-reading Recognition

An Approach to Robust and Fast Locating of Lip Motion

Face and Lip-reading Authentication System Based on Android Smart Phones

Lip-Movement Features Extraction and Recognition Based on Chroma Analysis

A Lip-Reading Recognition Approach Based on Long Short-Term Memory

Homeomorphic Manifold Analysis: Learning Motion Features of Image Sequence for Lipreading

Video Analysis Using Spatiotemporal Descriptor and Kernel Extreme Learning Machine for Lip Reading

An Inner Contour Based Lip Moving Feature Extraction Method for Chinese Speech