Abstract: In this paper, we propose to employ a bank of modality-dedicated Convolutional Neural Networks (CNNs), fuse, train, and optimize them together for person classification tasks. A modality-dedicated CNN is used for each modality to extract modality-specific features. We demonstrate that, rather than spatial fusion at the convolutional layers, the fusion can be performed on the outputs of the fully-connected layers of the modality-specific CNNs without any loss of performance and with a significant reduction in the number of parameters. We show that, using multiple CNNs with multimodal fusion at the feature level, we significantly outperform systems that use unimodal representations. We study weighted feature, bilinear, and compact bilinear feature-level fusion algorithms for multimodal biometric person identification. Finally, we propose a generalized compact bilinear fusion algorithm to deploy both the weighted feature fusion and compact bilinear schemes. We provide the results for the proposed algorithms on three challenging databases: CMU Multi-PIE, BioCop, and BIOMDATA.

What problem does this paper attempt to address?

The problem this paper addresses is feature fusion in multimodal biometrics, with the goal of improving accuracy on person classification tasks. Specifically, the authors propose a bank of modality-specific convolutional neural networks (CNNs) that are trained and optimized jointly, with feature-level fusion performed at the fully-connected layers instead of spatial fusion at the convolutional layers. Fusing at the fully-connected layers significantly reduces the number of parameters without any loss of performance, and the fused multimodal system significantly outperforms systems built on a single modality.

### Detailed Explanation

1. **Challenges in Multimodal Biometrics**
   - Biometric systems use human physical characteristics (such as face, iris, fingerprint, and voice) for identity recognition.
   - By combining multiple biometric traits, multimodal biometric systems handle problems such as noisy data, non-universality, and class variation more robustly.
   - The challenge lies in how to effectively fuse feature information from different modalities to improve recognition performance.

2. **Limitations of Existing Methods**
   - Existing fusion methods operate at the signal, feature, score, rank, or decision level.
   - Feature-level fusion usually gives better results than fusion at the other levels because it retains more of the original information.
   - Common feature-level fusion methods such as feature concatenation become less efficient as the dimension of the feature space increases.
   - Bilinear multiplication can capture high-order dependencies between modalities, but it has a high output dimension and high computational complexity.

3. **Solutions Proposed in the Paper**
   - **Modality-specific CNNs**: Design a dedicated CNN for each modality to extract features specific to that modality.
   - **Fully-connected Layer Fusion**: Perform feature-level fusion at the fully-connected layers instead of spatial fusion at the convolutional layers. This can significantly reduce the number of parameters without sacrificing performance.
   - **Generalized Compact Bilinear Fusion Algorithm**: Propose a generalized compact bilinear fusion algorithm that combines the advantages of weighted feature fusion and the compact bilinear scheme.
   - **End-to-End Joint Optimization**: Through end-to-end training, jointly optimize the entire network, including the modality-specific networks, the joint representation layer, and the classification layer.

4. **Experimental Verification**
   - Experiments were carried out on three challenging databases (CMU Multi-PIE, BioCop, and BIOMDATA) to verify the effectiveness of the proposed method.
   - The results show that multimodal fusion with the proposed method significantly outperforms unimodal representations, and that the feature-level fusion schemes perform well.

### Formula Display

- **Bilinear Fusion**
  \[ Y = X_1^T X_2 \]
  where \(X_1\) and \(X_2\) are the feature vectors of two modalities.

- **Generalized Compact Bilinear Fusion**
  \[ y = \text{FFT}^{-1}\left(\text{FFT}(\Psi(x_1, h_1, s_1)) \odot \text{FFT}(\Psi(x_2, h_2, s_2))\right) \]
  where:
  - \(\Psi(x, h, s)\) is the count sketch function,
  - \(h\) is a random hash function,
  - \(s\) is a random sign function.

In conclusion, this paper addresses the feature fusion problem in multimodal biometrics through a jointly trained bank of modality-specific CNNs and a generalized compact bilinear fusion algorithm, and improves the recognition performance of the system.
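To make the compact bilinear formula above concrete, here is a minimal NumPy sketch of the two-modality case: each feature vector is projected with a count sketch \(\Psi(x, h, s)\), and the two sketches are combined by an element-wise product in the FFT domain. This is an illustration only, not the authors' implementation. The function names, the feature sizes (64 and 32), and the output dimension (128) are assumptions chosen for the example, and the weighted feature fusion component of the generalized scheme is not covered because the summary does not specify it.

```python
import numpy as np

def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random hash h: index -> bucket and random signs s in {-1, +1}."""
    h = rng.integers(0, out_dim, size=in_dim)
    s = rng.choice(np.array([-1.0, 1.0]), size=in_dim)
    return h, s

def count_sketch(x, h, s, out_dim):
    """Psi(x, h, s): accumulate s[i] * x[i] into bucket h[i]."""
    sketch = np.zeros(out_dim)
    np.add.at(sketch, h, s * x)  # unbuffered add, so hash collisions sum up
    return sketch

def compact_bilinear(x1, x2, params1, params2, out_dim):
    """y = IFFT(FFT(Psi(x1)) * FFT(Psi(x2))), a circular convolution of sketches."""
    f1 = np.fft.fft(count_sketch(x1, *params1, out_dim))
    f2 = np.fft.fft(count_sketch(x2, *params2, out_dim))
    return np.real(np.fft.ifft(f1 * f2))

def sketch_of_outer_product(x1, x2, params1, params2, out_dim):
    """Reference: count sketch applied directly to the full outer product."""
    (h1, s1), (h2, s2) = params1, params2
    h = (h1[:, None] + h2[None, :]) % out_dim  # combined hash
    s = s1[:, None] * s2[None, :]              # combined sign
    sketch = np.zeros(out_dim)
    np.add.at(sketch, h.ravel(), (s * np.outer(x1, x2)).ravel())
    return sketch

# Hypothetical feature vectors standing in for two modality-specific CNN outputs.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(64)
x2 = rng.standard_normal(32)
d = 128  # fused dimension; the full bilinear output would have 64 * 32 = 2048 entries

p1 = make_sketch_params(x1.size, d, rng)
p2 = make_sketch_params(x2.size, d, rng)
y = compact_bilinear(x1, x2, p1, p2, d)
```

The helper `sketch_of_outer_product` is included only as a check: the FFT-based result matches a count sketch of the explicit outer product, which is why the compact form can stand in for bilinear fusion without ever materializing the full outer product.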

Generalized Bilinear Deep Convolutional Neural Networks for Multimodal Biometric Identification
