Abstract: Computer vision applications identify individuals in several ways, including face recognition, a visual feature that helps in tracking or searching for a person. However, tracking systems that rely solely on facial information are limited when faces are occluded, blurred, or oriented away from the camera; under these conditions, face-recognition-based tracking becomes inaccurate. This research addresses the issue by fusing facial visual features with body visual features. When the system cannot find the target face, a hybrid CNN+LSTM method performs multi-feature body visual recognition, narrowing the search space and speeding up the search. The results indicate that the CNN+LSTM combination yields higher accuracy, recall, precision, and F1 scores (89.20%, 87.36%, 91.02%, and 88.43%, respectively) than a single CNN (88.84%, 74.00%, 67.00%, and 69.00%, respectively). However, combining these two visual features is computationally expensive, so a tracking system is added to reduce the computational load and predict the target's location. Furthermore, this research uses the Q-Learning algorithm to make tracking decisions automatically in dynamic environments. The system considers face and body visual features, object location, and environmental conditions when choosing an action, aiming to improve tracking efficiency and accuracy. The experiments show that the system can adjust its actions in response to environmental changes with better outcomes. It achieves an accuracy of 91.5% at an average of 50 fps on five different videos, and an accuracy of 84% with an average error of 11.15 pixels on a video benchmark dataset. The proposed method speeds up the search process and optimizes tracking decisions, saving time and computational resources.
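The Q-Learning decision step mentioned in the abstract can be illustrated with a minimal tabular sketch. The abstract does not specify the state encoding, action set, reward, or hyperparameters, so everything named here (`ACTIONS`, the state labels `face_hidden` and `face_visible`, the reward of 1.0, `alpha`, `gamma`) is a hypothetical stand-in chosen for illustration, not the authors' design. Only the update rule itself is the standard Q-learning rule.

```python
import random
from collections import defaultdict

# Hypothetical action set; the paper's actual actions are not given in the abstract.
ACTIONS = ("track_face", "track_body", "full_search")


def make_q_table():
    """Q-table mapping state -> {action: value}, with all values starting at zero."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(q[state], key=q[state].get)


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Usage with assumed state labels and an assumed reward of 1.0 for re-acquiring the face.
q = make_q_table()
rng = random.Random(0)
q_update(q, "face_hidden", "track_body", reward=1.0, next_state="face_visible")
action = choose_action(q, "face_hidden", epsilon=0.0, rng=rng)  # greedy choice
```

In this toy setting, a single rewarded transition is enough for the greedy policy to prefer body tracking when the face is hidden. A real system would presumably derive states from detector outputs and target location rather than from hand-written labels.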

Multi-target video-based face recognition and gesture recognition based on enhanced detection and multi-trajectory incremental learning

Video Based Head Detection and Tracking Surveillance System

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Unconstrained Face Detection and Recognition Based on RGB-D Camera for the Visually Impaired

Multi Gesture Recognition: A Tracking Learning Detection Approach

Multi-features Guided Robust Visual Tracking

Joint Structured Sparsity Regularized Multiview Dimension Reduction for Video-Based Facial Expression Recognition

Real-time detection tracking and recognition algorithm based on multi-target faces

2D Motion Detection Bounded Hand 3D Trajectory Tracking and Gesture Recognition under Complex Background

Multi-target tracking based on appearance features and similarity fusion

An Automatic System for Unconstrained Video-Based Face Recognition

Long-term face tracking in the wild using deep learning

A Real-Time Multi-Task Learning System for Joint Detection of Face, Facial Landmark and Head Pose

Multi-Clue Based Facial Feature Detection and Tracking in Video

Dynamic gesture tracking and recognition algorithm based on deep learning

Robust Video-Based Face Recognition Via M-estimator and Image Set Collaborative Representation

A Video Target Re-Recognition Method Based on Adaptive Attention Enhancement and Multi-Scale Feature Fusion

Simultaneous Facial Feature Tracking and Facial Expression Recognition

Unsupervised Multi-Target Trajectory Detection, Learning And Analysis In Complicated Environments

Real-Time Human Tracking Using Multi-Features Visual With CNN-LSTM and Q-Learning

A Multi-Task Model for Simultaneous Face Identification and Facial Expression Recognition