MsMemoryGAN: A Multi-scale Memory GAN for Palm-vein Adversarial Purification
Huafeng Qin,Yuming Fu,Huiyan Zhang,Mounim A. El-Yacoubi,Xinbo Gao,Qun Song,Jun Wang
2024-08-20
Abstract: Deep neural networks have recently achieved promising performance in vein recognition and are increasingly applied in practice; however, they are vulnerable to adversarial attacks that add imperceptible perturbations to the input and cause incorrect recognition. To address this issue, we propose a novel defense model named MsMemoryGAN, which aims to filter the perturbations from adversarial samples before recognition. First, we design a multi-scale autoencoder to achieve high-quality reconstruction and two memory modules to learn the detailed patterns of normal samples at different scales. Second, we investigate a learnable metric in the memory module to retrieve the most relevant memory items to reconstruct the input image. Finally, the perceptual loss is combined with the pixel loss to further enhance the quality of the reconstructed image. During the training phase, MsMemoryGAN learns to reconstruct the input using only a small number of prototypical elements of the normal patterns recorded in the memory. At the testing stage, given an adversarial sample, MsMemoryGAN retrieves its most relevant normal patterns from memory for the reconstruction. Perturbations in the adversarial sample are usually not reconstructed well, so the reconstruction purifies the input of adversarial perturbations. We have conducted extensive experiments on two public vein datasets under different adversarial attack methods to evaluate the performance of the proposed approach. The experimental results show that our approach removes a wide variety of adversarial perturbations, allowing vein classifiers to achieve the highest recognition accuracy.
Computer Vision and Pattern Recognition
### What problem does this paper attempt to solve?
This paper addresses the vulnerability of deep neural networks to adversarial attacks in palm-vein recognition. Although deep-learning models perform well in vein recognition and are seeing growing use, an attacker can add small, imperceptible perturbations to the input image and cause the model to misidentify it.
To meet this challenge, the authors propose a new defense model, **MsMemoryGAN** (Multi-scale Memory GAN), which filters perturbations out of adversarial samples before recognition, thereby improving the robustness and security of the vein recognition system.
### Main contributions of MsMemoryGAN
1. **Multi-scale memory-enhanced autoencoder**:
- A multi-scale memory-enhanced autoencoder is proposed to reconstruct the input image with high quality and learn the detailed patterns of normal samples at different scales.
- Two memory modules are designed to capture the multi-scale features of normal samples.
2. **Learnable metric method**:
- A learnable metric is introduced in the memory module to compute the correlation between the latent code of the input image and the items stored in memory.
- This metric retrieves the memory items most relevant to the input more accurately, thereby improving the reconstruction quality.
3. **Combination of perceptual, pixel, and adversarial losses**:
- The perceptual loss and pixel loss are combined to further improve the quality of the reconstructed image.
- The adversarial loss is used to train the generative adversarial network (GAN) so that the generated image looks more realistic.
4. **Experimental verification**:
- Extensive experiments are carried out on two public palm-vein datasets to evaluate the performance of MsMemoryGAN under different adversarial attack methods.
- The experimental results show that the method effectively removes a variety of adversarial perturbations, allowing vein classifiers to achieve the highest recognition accuracy.
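The way the three loss terms in contribution 3 might fit together can be sketched as a weighted sum. This is a minimal NumPy illustration, not the authors' implementation: the mean-squared-error form of the pixel and perceptual terms, the non-saturating form of the adversarial term, and the weights `lambda_perc` and `lambda_adv` are all assumptions, since the summary does not specify them.

```python
import numpy as np

def pixel_loss(x, x_rec):
    """Mean squared error between the input and its reconstruction (assumed form)."""
    return float(np.mean((x - x_rec) ** 2))

def perceptual_loss(feat_x, feat_rec):
    """Mean squared distance between feature maps of the input and the
    reconstruction, taken from some fixed feature extractor (assumed form)."""
    return float(np.mean((feat_x - feat_rec) ** 2))

def generator_adversarial_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss, -log D(G(x)), averaged over the batch.
    d_fake holds discriminator probabilities for reconstructed images."""
    return float(-np.mean(np.log(d_fake + eps)))

def total_generator_loss(x, x_rec, feat_x, feat_rec, d_fake,
                         lambda_perc=1.0, lambda_adv=0.1):
    """Weighted sum of the three terms; both weights are hypothetical."""
    return (pixel_loss(x, x_rec)
            + lambda_perc * perceptual_loss(feat_x, feat_rec)
            + lambda_adv * generator_adversarial_loss(d_fake))
```

In practice the feature maps would come from a pretrained network and the discriminator outputs from the GAN's discriminator; here they are plain arrays so the arithmetic stays visible.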
### Formula representation
- **Feature representation**:
\[
z_t = E_t(x), \quad z_b = E_b(z_t)
\]
where \(E_t\) and \(E_b\) are the top-level and bottom-level encoders respectively, and \(x\) is the input image.
- **Memory module output**:
\[
\hat{z} = wM=\sum_{i = 1}^N w_i m_i
\]
where \(w\) is the soft-address vector, \(M\) is the memory matrix, and \(m_i\) is the \(i\)-th memory item.
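The memory read above can be sketched in a few lines of NumPy. The summary does not say how the address vector \(w\) is computed, so this sketch assumes a softmax over similarity scores between the latent code and each memory item. The default score is cosine similarity; the optional matrix `W` is a hypothetical stand-in for a learnable metric, not the formulation used in the paper.

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def memory_read(z, M, W=None):
    """Read from memory with soft addressing.

    z: latent code, shape (C,)
    M: memory matrix with N items as rows, shape (N, C)
    W: optional (C, C) matrix for a bilinear score m_i^T W z
       (hypothetical placeholder for a learnable metric)
    Returns (z_hat, w) with z_hat = sum_i w_i * m_i.
    """
    if W is None:
        # Assumed default: cosine similarity between z and each memory item.
        norms = np.linalg.norm(M, axis=1) * np.linalg.norm(z) + 1e-12
        scores = (M @ z) / norms
    else:
        scores = M @ (W @ z)
    w = softmax(scores)   # soft-address vector, non-negative, sums to 1
    z_hat = w @ M         # weighted combination of memory items
    return z_hat, w
```

Because \(\hat{z}\) is built only from stored items, content that does not resemble any stored normal pattern has no direct path into the reconstruction, which is the intuition behind using the memory for purification.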
- **Sparsify address weights**:
\[
\hat{w}_i=\text{normalize}\left(\max(w_i-\gamma, 0)\cdot\frac{w_i}{|w_i - \gamma|+\alpha}\right)
\]
where \(\gamma\) is the weight threshold and \(\alpha\) is a very small positive number to prevent the denominator from being zero.
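A small numeric sketch of the shrinkage rule above may help. It assumes that "normalize" means dividing by the sum of the shrunk weights (L1 normalization), which the summary does not state explicitly; the function name `sparsify` is also just illustrative.

```python
import numpy as np

def sparsify(w, gamma, alpha=1e-12):
    """Zero out address weights at or below gamma, keep the rest,
    then renormalize so the surviving weights sum to 1 (assumed L1 norm)."""
    w = np.asarray(w, dtype=float)
    # max(w - gamma, 0) * w / (|w - gamma| + alpha):
    # roughly w_i when w_i > gamma, exactly 0 otherwise.
    shrunk = np.maximum(w - gamma, 0.0) * w / (np.abs(w - gamma) + alpha)
    total = shrunk.sum()
    # If every weight falls below the threshold, return all zeros unchanged.
    return shrunk / total if total > 0 else shrunk

w = np.array([0.6, 0.3, 0.1])
print(sparsify(w, gamma=0.2))  # roughly [0.667, 0.333, 0.0]
```

With \(\gamma = 0.2\), the third weight is dropped and the remaining two are rescaled, so the reconstruction draws on fewer memory items.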
Together, these components allow MsMemoryGAN to remove perturbations from adversarial samples before recognition, thereby improving the security and accuracy of the vein recognition system.