Generalized Bilinear Deep Convolutional Neural Networks for Multimodal Biometric Identification

Sobhan Soleymani, Amirsina Torfi, Jeremy Dawson, Nasser M. Nasrabadi
DOI: https://doi.org/10.48550/arXiv.1807.01298
2018-07-04
Abstract: In this paper, we propose to employ a bank of modality-dedicated Convolutional Neural Networks (CNNs), fuse, train, and optimize them together for person classification tasks. A modality-dedicated CNN is used for each modality to extract modality-specific features. We demonstrate that, rather than spatial fusion at the convolutional layers, the fusion can be performed on the outputs of the fully-connected layers of the modality-specific CNNs without any loss of performance and with significant reduction in the number of parameters. We show that, using multiple CNNs with multimodal fusion at the feature-level, we significantly outperform systems that use unimodal representation. We study weighted feature, bilinear, and compact bilinear feature-level fusion algorithms for multimodal biometric person identification. Finally, we propose a generalized compact bilinear fusion algorithm to deploy both the weighted feature fusion and compact bilinear schemes. We provide the results for the proposed algorithms on three challenging databases: CMU Multi-PIE, BioCop, and BIOMDATA.
Machine Learning
What problem does this paper attempt to address?
The problem this paper addresses is feature fusion in multimodal biometrics, with the goal of improving accuracy on person classification tasks. Specifically, the authors propose a bank of modality-specific convolutional neural networks (CNNs) that are trained and optimized jointly, with feature-level fusion performed on the outputs of the fully-connected layers rather than spatial fusion at the convolutional layers. Fusing at the fully-connected layers reduces the number of parameters without loss of performance, and the resulting multimodal system significantly outperforms unimodal representations.

### Detailed Explanation

1. **Challenges in Multimodal Biometrics**
   - Biometric systems use human physical characteristics (such as face, iris, fingerprint, and voice) for identity recognition.
   - By combining multiple biometric traits, multimodal systems handle problems such as noisy data, non-universality, and class variation more robustly.
   - The challenge lies in how to fuse feature information from different modalities effectively enough to improve recognition performance.

2. **Limitations of Existing Methods**
   - Existing fusion methods operate at the signal, feature, score, rank, or decision level.
   - Feature-level fusion usually gives better results than fusion at the other levels because it retains more of the original information.
   - Common feature-level fusion methods such as feature concatenation become less efficient as the dimension of the feature space grows.
   - Bilinear multiplication can capture high-order dependencies between modalities, but it has a high output dimension and high computational complexity.

3. **Solutions Proposed in the Paper**
   - **Modality-specific CNNs**: Design a dedicated CNN for each modality to extract features specific to that modality.
   - **Fully-connected Layer Fusion**: Perform feature-level fusion at the fully-connected layers instead of spatial fusion at the convolutional layers. This significantly reduces the number of parameters without sacrificing performance.
   - **Generalized Compact Bilinear Fusion Algorithm**: Propose a generalized compact bilinear fusion algorithm that combines the advantages of weighted feature fusion and the compact bilinear scheme.
   - **End-to-End Joint Optimization**: Train the entire network end to end, jointly optimizing the modality-specific networks, the joint representation layer, and the classification layer.

4. **Experimental Verification**
   - Experiments were carried out on three challenging databases (CMU Multi-PIE, BioCop, and BIOMDATA) to verify the effectiveness of the proposed method.
   - The results show that multimodal feature-level fusion with the proposed method significantly outperforms unimodal representations.

### Formula Display

- **Bilinear Fusion**

  \[ Y = X_1^T X_2 \]

  where \(X_1\) and \(X_2\) are the feature vectors of the two modalities, treated as row vectors so that \(Y\) is their outer product.

- **Generalized Compact Bilinear Fusion**

  \[ y=\text{FFT}^{-1}(\text{FFT}(\Psi(x_1, h_1, s_1))\odot\text{FFT}(\Psi(x_2, h_2, s_2))) \]

  where:
  - \(\Psi(x, h, s)\) is the count sketch function,
  - \(h\) is a random hash function,
  - \(s\) is a random sign function,
  - \(\odot\) denotes element-wise multiplication.

In conclusion, this paper addresses the feature fusion problem in multimodal biometrics through a jointly trained network architecture and a generalized compact bilinear fusion algorithm, and improves the recognition performance of the system.
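
The two formulas above can be checked numerically with a small NumPy sketch. It is an illustration only and is not taken from the paper: the function names (`count_sketch`, `compact_bilinear`), the vector sizes, and the random seed are all assumptions, and it covers only the plain two-modality compact bilinear form, not the weighted-fusion part of the generalized algorithm. It shows that the FFT expression equals a count sketch of the full outer product, which is why the large bilinear matrix never has to be formed.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Count sketch Psi(x, h, s): out[h[i]] += s[i] * x[i], output length d."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding hash buckets accumulate
    return out

def compact_bilinear(x1, x2, h1, s1, h2, s2, d):
    """FFT^-1(FFT(Psi(x1)) * FFT(Psi(x2))): circular convolution of the two sketches."""
    f1 = np.fft.fft(count_sketch(x1, h1, s1, d))
    f2 = np.fft.fft(count_sketch(x2, h2, s2, d))
    return np.real(np.fft.ifft(f1 * f2))

# Hypothetical sizes: two modality feature vectors and a sketch dimension.
rng = np.random.default_rng(0)
n1, n2, d = 8, 6, 16
x1, x2 = rng.standard_normal(n1), rng.standard_normal(n2)
h1, h2 = rng.integers(0, d, n1), rng.integers(0, d, n2)          # random hash functions
s1, s2 = rng.choice([-1.0, 1.0], n1), rng.choice([-1.0, 1.0], n2)  # random sign functions

# Compact bilinear feature: d values instead of n1 * n2.
y = compact_bilinear(x1, x2, h1, s1, h2, s2, d)

# Reference: form the full bilinear (outer product) feature, then sketch it
# with the combined hash (h1 + h2) mod d and the combined sign s1 * s2.
outer = np.outer(x1, x2)
h = (h1[:, None] + h2[None, :]) % d
s = s1[:, None] * s2[None, :]
y_ref = count_sketch(outer.ravel(), h.ravel(), s.ravel(), d)

print(outer.size, y.size, np.allclose(y, y_ref))
```

The design point is that the output dimension `d` is chosen independently of the input sizes, whereas the explicit bilinear feature grows as the product of the two feature dimensions.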