Abstract: In recent years, digital watermarking techniques based on deep learning have been widely studied. To achieve both imperceptibility and robustness of image watermarks, most current methods employ convolutional neural networks to build robust watermarking frameworks. However, despite the success of CNN-based watermarking models, they struggle to achieve robustness against geometric attacks due to the limitations of convolutional neural networks in capturing global and long-range relationships. To address this limitation, we propose a robust watermarking framework based on the Swin Transformer, named RoWSFormer. Specifically, we design the Locally-Channel Enhanced Swin Transformer Block as the core of both the encoder and decoder. This block utilizes the self-attention mechanism to capture global and long-range information, thereby significantly improving adaptation to geometric distortions. Additionally, we construct the Frequency-Enhanced Transformer Block to extract frequency domain information, which further strengthens the robustness of the watermarking framework. Experimental results demonstrate that our RoWSFormer surpasses existing state-of-the-art watermarking methods. For most non-geometric attacks, RoWSFormer improves the PSNR by 3 dB while maintaining the same extraction accuracy. In the case of geometric attacks (such as rotation, scaling, and affine transformations), RoWSFormer achieves over a 6 dB improvement in PSNR, with extraction accuracy exceeding 97%.
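The abstract reports its results as PSNR (for imperceptibility) and extraction accuracy (for robustness). As a rough illustration of what those two numbers measure, here is a minimal sketch in plain Python. The function names `psnr` and `bit_accuracy`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for this example; they are not taken from the paper's code.

```python
import math
from typing import Sequence


def psnr(cover: Sequence[float], watermarked: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Assumes pixel values on a [0, peak] scale (255 for 8-bit images).
    """
    if len(cover) != len(watermarked) or not cover:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((c - w) ** 2 for c, w in zip(cover, watermarked)) / len(cover)
    if mse == 0:
        return math.inf  # identical images: no distortion at all
    return 10.0 * math.log10(peak ** 2 / mse)


def bit_accuracy(message: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of watermark bits recovered correctly."""
    if len(message) != len(decoded) or not message:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for m, d in zip(message, decoded) if m == d)
    return matches / len(message)


if __name__ == "__main__":
    # Toy data: every pixel is off by exactly 1, so the MSE is 1.
    print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))
    print(bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Because PSNR is logarithmic in the mean squared error, a 3 dB gain corresponds to roughly halving the MSE between the original and watermarked image, and a 6 dB gain to roughly quartering it.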

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses two key challenges in digital watermarking: **robustness** and **imperceptibility**. Because existing watermarking frameworks based on convolutional neural networks (CNNs) perform poorly under geometric attacks, the paper proposes RoWSFormer (Robust Watermarking Framework with Swin Transformer), a new framework built on the Swin Transformer. Its main objectives are:

1. **Improve robustness against geometric attacks**: Convolution operations limit how well CNN-based watermarking frameworks can capture global and long-range relationships, so these frameworks perform poorly under geometric attacks such as rotation, scaling, and affine transformation. RoWSFormer improves adaptability to geometric attacks by introducing the Swin Transformer and using its self-attention mechanism to capture global and long-range information.
2. **Enhance frequency-domain feature extraction**: To further improve robustness, RoWSFormer includes a Frequency-Enhanced Transformer Block (FETB) that extracts frequency-domain information, which makes the watermark more robust under a variety of attacks.
3. **Maintain imperceptibility**: While improving robustness, RoWSFormer also keeps the watermark imperceptible, so that the watermarked image stays as visually close as possible to the original image.

### Specific methods

1. **Locally-Channel Enhanced Swin Transformer Block (LCESTB)**:
   - **Swin Transformer Block**: Reduces computational cost through a windowed self-attention mechanism and is suitable for image watermarking tasks.
   - **Locally-Channel Enhanced Block**: Combines convolutional layers and channel-attention mechanisms to extract local and channel features, improving the capture of detailed channel information.
2. **Frequency-Enhanced Transformer Block (FETB)**:
   - **Transformer Block**: Uses multiple standard Transformer blocks to capture global features.
   - **Frequency-Enhanced Block**: Extracts frequency-domain features through the discrete cosine transform (DCT) and computes frequency-domain attention weights to further enhance the robustness of the watermark.
3. **Loss functions**:
   - **Image loss**: Uses mean squared error (MSE) to measure the difference between the watermarked image and the original image.
   - **Decoding loss**: Uses MSE to measure the difference between the extracted watermark and the original watermark.
   - **Constraint loss**: Ensures that the pixel values of the watermarked image stay within the standard range ([0, 255]).

### Experimental results

The experimental results show that RoWSFormer performs well under a variety of non-geometric and geometric attacks. Under geometric attacks (such as rotation, scaling, and affine transformation), RoWSFormer improves PSNR by more than 6 dB, with extraction accuracy exceeding 97%. For most non-geometric attacks, it improves PSNR by 3 dB while maintaining the same extraction accuracy.

### Conclusion

By introducing the Swin Transformer and the Frequency-Enhanced Transformer Block, RoWSFormer significantly improves the robustness of digital watermarking against geometric attacks while maintaining good imperceptibility, which makes the framework well suited to practical applications.
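The three loss terms listed under "Specific methods" can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the summary does not give the exact form of the constraint loss or the weighting between terms, so the out-of-range penalty in `constraint_penalty` and the weights `w_image`, `w_decode`, and `w_constraint` are hypothetical choices, and images are represented as flat lists of pixel values for simplicity.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def constraint_penalty(pixels: Sequence[float], low: float = 0.0, high: float = 255.0) -> float:
    """Mean squared amount by which pixels fall outside [low, high].

    Assumed form: the source only says this loss keeps pixel values in range.
    In-range pixels contribute zero.
    """
    if not pixels:
        raise ValueError("pixels must be non-empty")
    total = 0.0
    for p in pixels:
        excess = max(low - p, 0.0, p - high)  # distance outside the valid range
        total += excess ** 2
    return total / len(pixels)


def total_loss(
    cover: Sequence[float],
    watermarked: Sequence[float],
    message: Sequence[float],
    decoded: Sequence[float],
    w_image: float = 1.0,       # hypothetical weights, not from the paper
    w_decode: float = 1.0,
    w_constraint: float = 1.0,
) -> float:
    """Weighted sum of image loss, decoding loss, and constraint loss."""
    image_loss = mse(cover, watermarked)         # imperceptibility term
    decoding_loss = mse(message, decoded)        # watermark recovery term
    range_loss = constraint_penalty(watermarked) # keeps pixels in [0, 255]
    return w_image * image_loss + w_decode * decoding_loss + w_constraint * range_loss


if __name__ == "__main__":
    cover = [10.0, 20.0]
    watermarked = [12.0, 20.0]
    print(total_loss(cover, watermarked, message=[1.0, 0.0], decoded=[1.0, 0.0]))
```

In a real training setup these terms would be computed on tensors inside a deep learning framework; the weighted-sum structure is the point of the sketch, since raising `w_image` relative to `w_decode` trades robustness for imperceptibility.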

RoWSFormer: A Robust Watermarking Framework with Swin Transformer for Enhanced Geometric Attack Resilience

REFIT: A Unified Watermark Removal Framework for Deep Learning Systems with Limited Data

A Robust Grayscale Watermarking Algorithm

A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability

An Image Watermarking Algorithm with High Robustness

DT CWT and Schur Decomposition Based Robust Watermarking Algorithm to Geometric Attacks.

Joint Wavelet and Spatial Transformation for Digital Watermarking

Robust Blind Video Watermarking with Adaptive Embedding Mechanism

Blind Deep-Learning-Based Image Watermarking Robust Against Geometric Transformations

Robust Adaptive Video Watermarking In The Spatial Domain

A Robust Watermarking Method Based On Wavelet And Zernike Transform

A geometrically robust multi-bit video watermarking algorithm based on 2-D DFT

Flow-Based Robust Watermarking with Invertible Noise Layer for Black-Box Distortions.

Robust Watermark Imaging Via Graph-signal Optimization

Robust Image Watermarking Based on Feature Regions

Robust And High Capacity Watermarking For Image Based On Dwt-Svd And Cnn

Blind and Robust Watermarking Algorithm for Remote Sensing Images Resistant to Geometric Attacks

When Robust Reversible Watermarking Meets Cropping Attacks

Robust Reversible Watermarking Via Clustering and Enhanced Pixel-Wise Masking.

Real-time Attacks on Robust Watermarking Tools in the Wild by CNN.

Optimal Semi-Fragile Watermarking Based on Maximum Entropy Random Walk and Swin Transformer for Tamper Localization