Abstract: The goal of information security is to prevent unauthorized access to data. Conventional ways of confirming user identity, such as passwords, user names, and keys, are limited because they can be stolen, lost, copied, or cracked. Multimodal biometric identification systems are attracting attention because they are more secure and achieve higher recognition efficiency than unimodal systems. Single-modal biometric recognition systems perform poorly in real-world public security operations because the captured biometric data are often of poor quality. Current multimodal fusion methods, in turn, suffer from drawbacks such as low generalization and fusion at only a single level. This study presents a multimodal biometric fusion model that uses deep neural networks to improve accuracy and generalization by integrating pixel-level, feature-level, and score-level fusion. At the pixel level, we apply spatial, channel, and intensity fusion strategies. At the feature level, modality-specific branches and jointly optimized representation layers learn dependencies between modalities through backpropagation. At the score level, fusion techniques such as Rank-1 and modality evaluation are used to combine matching scores. To validate the model, we construct a virtual homogeneous multimodal dataset from simulated operational data. In the experiments, multimodal feature fusion improves accuracy by 2.2 percentage points over single-modal algorithms, and the score fusion method surpasses single-modal algorithms by 3.5 percentage points, reaching a retrieval accuracy of 99.6%.
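As an illustration of what score-level fusion involves, the following is a minimal sketch and not the authors' method. It assumes each modality produces a probe-by-gallery matrix of similarity scores, normalizes each row with min-max scaling, combines the modalities with a weighted sum, and then measures Rank-1 retrieval accuracy. The function names, the weighted-sum rule, and the fixed weights are all assumptions made for illustration; the abstract does not specify how the Rank-1 and modality evaluation techniques are computed.

```python
import numpy as np

def min_max_normalize(scores):
    """Scale each probe's row of scores to [0, 1] so modalities are comparable."""
    scores = np.asarray(scores, dtype=float)
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    # Guard against constant rows, which would otherwise divide by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (scores - lo) / span

def fuse_scores(score_matrices, weights):
    """Weighted-sum fusion of per-modality score matrices (assumed rule)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # weights are renormalized to sum to 1
    normalized = [min_max_normalize(s) for s in score_matrices]
    return sum(w * s for w, s in zip(weights, normalized))

def rank1_accuracy(scores, true_ids):
    """Fraction of probes whose top-scoring gallery entry is the true identity."""
    predicted = np.argmax(np.asarray(scores), axis=1)
    return float(np.mean(predicted == np.asarray(true_ids)))

# Hypothetical toy scores: 3 probes (rows) against 3 gallery identities (columns).
face = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.5], [0.1, 0.2, 0.8]]
voice = [[0.4, 0.5, 0.3], [0.1, 0.9, 0.2], [0.2, 0.3, 0.6]]
true_ids = [0, 1, 2]

fused = fuse_scores([face, voice], weights=[0.5, 0.5])
print(rank1_accuracy(face, true_ids), rank1_accuracy(fused, true_ids))
```

In this toy data each modality misidentifies a different probe, so the fused scores recover both errors. In practice the weights would be derived from some measure of per-modality reliability rather than fixed by hand.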

Bimodal speaker identification using dynamic bayesian network

Dynamic bayesian networks for audio-visual speaker recognition

Multi-level Fusion of Audio and Visual Features for Speaker Identification

Dynamic Bayesian network approach to speaker identification

Automatic Speaker Recognition Using Dynamic Bayesian Network.

Weight Estimation for Audio-Visual Multi-level Fusion in Bimodal Speaker Identification

A feature-level fusion based improved multimodal biometric recognition system using ear and profile face

Combining Voiceprint and Face Biometrics for Speaker Identification Using SDWS.

Speech Expression Multimodal Emotion Recognition Based on Deep Belief Network

Face-voice based multimodal biometric authentication system via FaceNet and GMM

A Finger Bimodal Fusion Algorithm Based on Improved Densenet

A dynamic face and fingerprint fusion system for identity authentication

A Fusion Approach to Spoken Language Identification Based on Combining Multiple Phone Recognizers and Speech Attribute Detectors

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

Artificial intelligence-Enabled deep learning model for multimodal biometric fusion

Multimodal person authentication using speech, face and visual speech

Audio-Visual Fusion Based on Interactive Attention for Person Verification

Multimodal biometrics of fingerprint and signature recognition using multi-level feature fusion and deep learning techniques

Bidirectional Attention For Text-Dependent Speaker Verification

Research on Voiceprint Recognition Technology Based on Deep Neural Network

A Novel Dual-Modal Emotion Recognition Algorithm with Fusing Hybrid Features of Audio Signal and Speech Context