RoWSFormer: A Robust Watermarking Framework with Swin Transformer for Enhanced Geometric Attack Resilience

Weitong Chen, Yuheng Li
2024-09-23
Abstract: In recent years, digital watermarking techniques based on deep learning have been widely studied. To achieve both imperceptibility and robustness of image watermarks, most current methods employ convolutional neural networks to build robust watermarking frameworks. However, despite the success of CNN-based watermarking models, they struggle to achieve robustness against geometric attacks due to the limitations of convolutional neural networks in capturing global and long-range relationships. To address this limitation, we propose a robust watermarking framework based on the Swin Transformer, named RoWSFormer. Specifically, we design the Locally-Channel Enhanced Swin Transformer Block as the core of both the encoder and decoder. This block utilizes the self-attention mechanism to capture global and long-range information, thereby significantly improving adaptation to geometric distortions. Additionally, we construct the Frequency-Enhanced Transformer Block to extract frequency domain information, which further strengthens the robustness of the watermarking framework. Experimental results demonstrate that our RoWSFormer surpasses existing state-of-the-art watermarking methods. For most non-geometric attacks, RoWSFormer improves the PSNR by 3 dB while maintaining the same extraction accuracy. In the case of geometric attacks (such as rotation, scaling, and affine transformations), RoWSFormer achieves over a 6 dB improvement in PSNR, with extraction accuracy exceeding 97%.
Multimedia, Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses two key challenges in digital watermarking: **robustness** and **imperceptibility**. Because existing watermarking frameworks based on convolutional neural networks (CNNs) perform poorly under geometric attacks, the paper proposes a new framework built on the Swin Transformer: RoWSFormer (Robust Watermarking Framework with Swin Transformer). Its main objectives are:

1. **Improve robustness against geometric attacks**: Because convolution operates on local receptive fields, CNN-based watermarking frameworks have difficulty capturing global and long-range relationships, so they perform poorly under geometric attacks such as rotation, scaling, and affine transformation. RoWSFormer introduces the Swin Transformer and uses its self-attention mechanism to capture global and long-range information, which significantly improves adaptability to geometric attacks.
2. **Enhance frequency-domain feature extraction**: To further strengthen robustness, RoWSFormer adds a Frequency-Enhanced Transformer Block (FETB) that extracts frequency-domain information, improving the robustness of the watermark under various attacks.
3. **Maintain imperceptibility**: While improving robustness, RoWSFormer also keeps the watermark imperceptible, so that the watermarked image remains visually as close as possible to the original image.

### Specific methods

1. **Locally-Channel Enhanced Swin Transformer Block (LCESTB)**:
   - **Swin Transformer Block**: Uses windowed self-attention, which computes attention within local windows rather than across the whole image, to reduce computational cost, making it practical for image watermarking.
   - **Locally-Channel Enhanced Block**: Combines convolutional layers with a channel-attention mechanism to extract local and channel-wise features, strengthening the block's ability to capture fine-grained channel information.
2. **Frequency-Enhanced Transformer Block (FETB)**:
   - **Transformer Block**: Stacks several standard Transformer blocks to capture global features.
   - **Frequency-Enhanced Block**: Extracts frequency-domain features with the discrete cosine transform (DCT) and computes frequency-domain attention weights, further enhancing the robustness of the watermark.
3. **Loss functions**:
   - **Image loss**: Mean squared error (MSE) between the watermarked image and the original image.
   - **Decoding loss**: MSE between the extracted watermark and the original watermark.
   - **Constraint loss**: Keeps the pixel values of the watermarked image within the valid range [0, 255].

### Experimental results

RoWSFormer performs well under a variety of non-geometric and geometric attacks. Under geometric attacks (such as rotation, scaling, and affine transformation), it improves PSNR by more than 6 dB compared with existing state-of-the-art methods, with extraction accuracy above 97%. For most non-geometric attacks, it improves PSNR by 3 dB while maintaining the same extraction accuracy.

### Conclusion

By introducing the Swin Transformer and the Frequency-Enhanced Transformer Block, RoWSFormer significantly improves the robustness of digital watermarking against geometric attacks while maintaining good imperceptibility, which gives the framework strong potential for practical use.
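The cost saving attributed to windowed self-attention in the Swin Transformer Block can be made concrete by counting query-key pairs. The plain-Python sketch below partitions an H×W token grid into non-overlapping M×M windows and compares global attention, where every token attends to all H·W tokens, with windowed attention, where each token attends only to the M² tokens in its own window. The 64×64 feature map and 8×8 window are illustrative assumptions, not values from the paper. The function names are hypothetical, and the sketch omits the shifted-window step of the Swin Transformer.

```python
def window_partition(grid, m):
    """Split a 2-D grid (list of rows) into non-overlapping m x m windows.

    Assumes both grid dimensions are divisible by m; padding is not handled.
    """
    h, w = len(grid), len(grid[0])
    if h % m or w % m:
        raise ValueError("grid dimensions must be divisible by the window size")
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            # Flatten each window into a sequence of m*m tokens.
            windows.append([grid[r][c]
                            for r in range(top, top + m)
                            for c in range(left, left + m)])
    return windows


def attention_pairs(h, w, m=None):
    """Count query-key pairs scored by self-attention on an h x w token grid.

    m=None means global attention over all h*w tokens; otherwise attention
    is restricted to non-overlapping m x m windows.
    """
    n = h * w
    if m is None:
        return n * n          # every token attends to every token
    return n * (m * m)        # each token attends only within its window


if __name__ == "__main__":
    # Hypothetical sizes: a 64 x 64 feature map and 8 x 8 windows.
    grid = [[r * 64 + c for c in range(64)] for r in range(64)]
    wins = window_partition(grid, 8)
    print(len(wins), len(wins[0]))       # 64 windows of 64 tokens each
    print(attention_pairs(64, 64))       # global: 16777216 pairs
    print(attention_pairs(64, 64, 8))    # windowed: 262144 pairs
```

The ratio between the two counts is H·W / M², so the saving grows with image size while the per-window cost stays fixed.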
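The Locally-Channel Enhanced Block is described only as combining convolutional layers with channel attention. Squeeze-and-excitation (SE) is one widely used form of channel attention. The sketch below implements SE-style gating in plain Python, assuming that RoWSFormer's channel attention behaves similarly. It first averages each channel to a single value. It then passes these values through a small bottleneck with a ReLU and a sigmoid to obtain one gate per channel, and rescales each channel by its gate. The convolutional branch is omitted, and the weights, shapes, and function name are hypothetical.

```python
import math


def channel_attention(feature, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch).

    feature: list of C channels, each a 2-D list (H x W) of floats.
    w1: weight matrix of shape (C_r, C) for the reduction layer.
    w2: weight matrix of shape (C, C_r) for the expansion layer.
    Biases are omitted for brevity. Returns (rescaled_feature, gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [max(0.0, sum(wij * zj for wij, zj in zip(wi, z))) for wi in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(wij * hj for wij, hj in zip(wi, hidden))))
             for wi in w2]
    # Scale: multiply every value in channel c by its gate s_c.
    scaled = [[[v * s for v in row] for row in ch] for ch, s in zip(feature, gates)]
    return scaled, gates


if __name__ == "__main__":
    # Two hypothetical 2 x 2 channels and hand-picked weights (C=2, C_r=1).
    feat = [[[1.0, 1.0], [1.0, 1.0]],
            [[0.0, 0.0], [0.0, 0.0]]]
    out, s = channel_attention(feat, w1=[[1.0, 1.0]], w2=[[2.0], [-2.0]])
    print([round(g, 3) for g in s])   # [0.881, 0.119]
```

Because every gate lies in (0, 1), the block can only attenuate channels relative to each other. It never changes spatial structure, which is why it is typically paired with a convolutional branch for local detail.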
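The Frequency-Enhanced Block is summarized only as extracting DCT features and computing frequency-domain attention weights. The sketch below shows one plausible reading of that description, not the paper's actual design:

- apply an orthonormal 2-D DCT-II to each channel;
- use the energy of the k×k lowest-frequency coefficients as that channel's descriptor;
- normalize the descriptors across channels with a softmax to obtain weights.

The function names, the choice of k, and the softmax normalization are all assumptions.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k)
                               for i in range(n)))
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back so result[u][v]


def frequency_attention(feature, k=2):
    """Hypothetical frequency-domain channel weights.

    For each channel (a square 2-D list), take the energy of the k x k
    lowest-frequency DCT coefficients, then softmax the energies across channels.
    """
    energies = []
    for ch in feature:
        coeffs = dct_2d(ch)
        energies.append(sum(coeffs[u][v] ** 2 for u in range(k) for v in range(k)))
    peak = max(energies)                   # subtract the max for numerical stability
    exps = [math.exp(e - peak) for e in energies]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    coeffs = dct_2d([[1.0, 2.0], [3.0, 4.0]])
    print([[round(c, 6) for c in row] for row in coeffs])   # [[5.0, -1.0], [-2.0, 0.0]]
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared pixel values. The DC coefficient of an N×N block is N times its mean.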
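The three loss terms listed under Loss functions can be combined into a single training objective. The plain-Python sketch below computes:

- an image loss (MSE between watermarked and original pixels);
- a decoding loss (MSE between decoded and original watermark values);
- a range penalty that is zero for pixels inside [0, 255] and grows quadratically outside it.

The exact form of the constraint loss and the weights `w_img`, `w_dec`, and `w_con` are assumptions, since the summary gives neither. A real training loop would express these as tensor operations in a deep-learning framework; plain lists keep the sketch self-contained.

```python
def mse(a, b):
    """Mean squared error between two equal-length flat sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def constraint_loss(pixels, low=0.0, high=255.0):
    """Hypothetical range penalty: mean squared distance outside [low, high].

    Pixels inside the range contribute zero.
    """
    excess = [max(low - p, 0.0, p - high) for p in pixels]
    return sum(e * e for e in excess) / len(pixels)


def total_loss(cover, marked, message, decoded,
               w_img=1.0, w_dec=1.0, w_con=1.0):
    """Weighted sum of image, decoding, and constraint losses.

    The weights w_img, w_dec, w_con are illustrative placeholders.
    """
    l_img = mse(marked, cover)        # imperceptibility term
    l_dec = mse(decoded, message)     # watermark recovery term
    l_con = constraint_loss(marked)   # keep pixels in the valid range
    return w_img * l_img + w_dec * l_dec + w_con * l_con


if __name__ == "__main__":
    cover = [10.0, 200.0, 255.0, 0.0]
    marked = [12.0, 198.0, 257.0, -1.0]   # two pixels fall outside [0, 255]
    print(total_loss(cover, marked, [1.0, 0.0, 1.0, 1.0], [0.9, 0.1, 0.8, 1.0]))
```

Raising `w_img` favors imperceptibility, and raising `w_dec` favors reliable extraction. The balance between them is the robustness/imperceptibility trade-off the paper targets.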
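The experimental results are reported as PSNR gains (3 dB, and more than 6 dB) together with extraction accuracy. PSNR is defined as 10·log10(MAX²/MSE), with MAX = 255 for 8-bit images. A gain of g dB at fixed MAX therefore means the MSE shrinks by a factor of 10^(g/10): about 2× for 3 dB and about 4× for 6 dB. The sketch below computes PSNR, a thresholded bit accuracy, and that MSE ratio. The 0/1 bit encoding and the 0.5 decision threshold are assumptions, and the function names are hypothetical.

```python
import math


def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB for flat pixel sequences."""
    err = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if err == 0:
        return math.inf           # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def bit_accuracy(message, decoded, threshold=0.5):
    """Fraction of watermark bits recovered after thresholding decoder outputs.

    Assumes bits are 0/1 and the decoder emits real values.
    """
    bits = [1 if d >= threshold else 0 for d in decoded]
    return sum(b == m for b, m in zip(bits, message)) / len(message)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink to raise PSNR by gain_db (same max_value)."""
    return 10.0 ** (gain_db / 10.0)


if __name__ == "__main__":
    print(round(mse_ratio_for_gain(3.0), 2))   # 2.0: a 3 dB gain roughly halves the MSE
    print(round(mse_ratio_for_gain(6.0), 2))   # 3.98: a 6 dB gain roughly quarters it
    print(bit_accuracy([1, 0, 1, 1], [0.9, 0.2, 0.4, 0.7]))   # 0.75
```

Read this way, a 6 dB gain at over 97% accuracy means the watermark distorts the image with roughly a quarter of the pixel-level squared error of the baselines.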