A Performance Analysis of Face and Speech Recognition in the Video and Audio Stream Using Machine Learning Classification Techniques

Roshni Wadhwani, Akhil Pandey, Vishal Shrivastav
DOI: https://doi.org/10.1007/978-981-16-3915-9_14
2021-01-01
Abstract: Object detection and tracking is usually the first step in applications such as video surveillance. The main purpose of a static-camera face recognition and tracking system is to estimate speed and distance parameters. We propose a general vision-based method for detecting and tracking motion that uses an image difference algorithm. The system then recognizes the person's voice to get feedback from the corresponding person. The process first detects people on stage and then performs the voice signal processing. We propose a new person recognition technology that fuses face and voice; compared with single-biometric recognition, this technology can greatly improve recognition speed. The security system is developed using the Viola–Jones face detection algorithm. The proposed method uses the Local Binary Pattern (LBP) as a feature extraction technique to compute local features. For speech recognition, our project uses Mel-Frequency Cepstral Coefficient (MFCC) extraction. The extracted features are used as input to the multi-SVM classifier to determine gender and identify individuals, and the results are displayed. The new system can be used in various areas, such as identity verification and other potential commercial applications.
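As a rough illustration of two steps named in the abstract, the sketch below shows image differencing for motion detection and a basic 3x3 LBP histogram as a local feature vector. It is a minimal sketch using only NumPy, not the authors' implementation: the function names (`motion_mask`, `lbp_image`, `lbp_histogram`), the difference threshold of 25, and the 8-neighbour, 256-bin LBP variant are all assumptions, since the abstract does not give these details. The Viola–Jones, MFCC, and SVM stages are omitted.

```python
import numpy as np

def motion_mask(prev_frame, frame, threshold=25):
    """Image differencing: mark pixels whose intensity changed by more than
    `threshold` between two grayscale frames (threshold is an assumed value)."""
    diff = np.abs(frame.astype(np.int32) - prev_frame.astype(np.int32))
    return diff > threshold

def lbp_image(gray):
    """Basic 3x3 LBP codes for the interior pixels of a 2-D grayscale array."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Eight neighbours, clockwise from top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        # Bit is set when the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    codes = lbp_image(gray)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

In a pipeline like the one described, a histogram of this kind, computed over a detected face region, would be one possible input to the classifier alongside the MFCC features.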