Abstract: Over the past few years, face recognition has attracted considerable interest and has become a popular area of research in computer vision and pattern recognition. The problem attracts researchers from different disciplines such as image processing, pattern recognition, neural networks, computer vision, and computer graphics (Zhao, Chellappa, Rosenfeld & Phillips, 2003). Face recognition is a typical computer vision problem. The goal of computer vision is to understand images of scenes: to locate and identify objects and to determine their structures, spatial arrangements, and relationships with other objects (Shah, 2002). The main task of face recognition is to locate the people in a scene and establish their identities. Face recognition is also a challenging pattern recognition problem. The number of training samples for each face class is usually so small that it is hard to learn the distribution of each class. In addition, the within-class difference may sometimes be larger than the between-class difference because of variations in illumination, pose, expression, age, and so on. The availability of feasible technologies has opened up many potential applications for face recognition, such as face ID, access control, security, surveillance, smart cards, law enforcement, face databases, multimedia management, and human-computer interaction (Li & Jain, 2005). Traditional still image-based face recognition has achieved great success in constrained environments. However, once conditions such as illumination, pose, expression, or age change too much, its performance declines dramatically. The recent FRVT2002 (Face Recognition Vendor Test 2002) (Phillips, Grother, Micheals, Blackburn, Tabassi & Bone, 2003) shows that recognition performance on face images captured outdoors or on different days is still unsatisfactory.
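The two difficulties described above (few training samples per person, and within-class variation that can exceed between-class variation) can be illustrated with a toy numerical sketch. It assumes hypothetical 2-D feature vectors and a plain nearest-neighbour matcher; the numbers, the helper names, and the matcher are invented for illustration and are not a method from this article.

```python
import math
from itertools import combinations

# Toy 2-D "face features", invented purely for illustration (not real data).
# Each identity has one image under frontal light and one under side light;
# in this toy space the lighting change moves a face mostly along the x-axis.
samples = {
    "A": {"frontal": (1.0, 1.0), "side": (5.0, 1.2)},
    "B": {"frontal": (1.3, 1.1), "side": (5.2, 1.0)},
}

def mean_within_class_distance(samples):
    """Average Euclidean distance between images of the SAME person."""
    dists = [math.dist(p, q)
             for views in samples.values()
             for p, q in combinations(views.values(), 2)]
    return sum(dists) / len(dists)

def mean_between_class_distance(samples):
    """Average Euclidean distance between images of DIFFERENT people."""
    dists = [math.dist(p, q)
             for va, vb in combinations(samples.values(), 2)
             for p in va.values() for q in vb.values()]
    return sum(dists) / len(dists)

def nearest_neighbour(query, gallery):
    """Return the identity whose single enrolled image is closest to the query."""
    return min(gallery, key=lambda name: math.dist(query, gallery[name]))

within = mean_within_class_distance(samples)
between = mean_between_class_distance(samples)

# Only one enrolled (frontal) image per person: the "small sample" setting.
gallery = {name: views["frontal"] for name, views in samples.items()}
predicted = nearest_neighbour(samples["A"]["side"], gallery)

print(f"within-class {within:.2f}, between-class {between:.2f}")
print("side-lit image of A identified as:", predicted)
```

In this toy configuration the lighting change dominates identity, so the average within-class distance exceeds the average between-class distance, and the side-lit image of A lies closer to B's enrolled image than to A's, causing the nearest-neighbour rule to return B.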
Current still image-based face recognition algorithms still fall far short of the capability of the human perceptual system (Zhao, Chellappa, Rosenfeld & Phillips, 2003). On the other hand, psychology and physiology studies have shown that motion can help people recognize faces better (Knight & Johnston, 1997; O’Toole, Roark & Abdi, 2002). Torres (2004) pointed out that traditional still image-based face recognition confronts serious challenges and difficulties, and that there are two potential ways to address them: video-based face recognition and multi-modal identification. During the past several years, many research efforts have concentrated on video-based face recognition. Compared with still image-based face recognition, however, true video-based face recognition algorithms that use both spatial and temporal information appeared only a few years ago (Zhao, Chellappa, Rosenfeld & Phillips, 2003). This article gives an overview of most existing methods in the field of video-based face recognition and analyses their respective pros and cons. First, a general statement of the face recognition problem is given. Then, most existing methods for video-based face recognition are briefly reviewed. Finally, future trends and conclusions are presented.
