Abstract: Current deep learning methods for copy–move forgery detection (CMFD) are mostly based on deep convolutional neural networks, which frequently discard a large amount of detail information during convolutional feature extraction and have poor long-range information extraction capabilities. The Transformer structure is adept at modeling global context information, but the patch-wise self-attention calculation still neglects the extraction of details in local regions that have been tampered with. This study designs a local-information-refined dual-branch network, LBRT (Local Branch Refinement Transformer). Using a global modeling branch and a local refinement branch, it performs Transformer encoding on the global patches segmented from the image and on the local patches re-segmented from those global patches, respectively. The self-attention features from both branches are precisely fused, and the fused feature map is then up-sampled and decoded. LBRT therefore considers both global semantic information modeling and local detail information refinement. Experimental results show that LBRT outperforms several state-of-the-art CMFD methods on the USCISI, CASIA CMFD, and DEFACTO CMFD datasets.
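As a rough illustration of the two-level patching described in the abstract, the NumPy sketch below splits an image into global patches and then re-divides each global patch into smaller local patches. It is not the authors' implementation: the function names `split_into_patches` and `redivide_patches`, the 64×64 image, and the patch sizes 16 and 4 are all assumptions chosen only to keep the example small.

```python
import numpy as np

def split_into_patches(image, patch):
    """Split an (H, W, C) array into non-overlapping (patch, patch, C) tiles.

    Returns an array of shape (num_patches, patch, patch, C) in row-major order.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch, patch, c)

def redivide_patches(global_patches, local_patch):
    """Re-segment every global patch into smaller local patches.

    Returns shape (num_global, num_local_per_global, local, local, C).
    """
    return np.stack([split_into_patches(p, local_patch) for p in global_patches])

# Hypothetical sizes: a 64x64 RGB image, 16x16 global patches, 4x4 local patches.
image = np.arange(64 * 64 * 3, dtype=np.float32).reshape(64, 64, 3)
global_patches = split_into_patches(image, 16)        # input to the global branch
local_patches = redivide_patches(global_patches, 4)   # input to the local branch

print(global_patches.shape)  # (16, 16, 16, 3)
print(local_patches.shape)   # (16, 16, 4, 4, 3)
```

In this layout the global branch would see one token per global patch, while the local branch would see, for each global patch, its own short sequence of local-patch tokens.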

### What problem does this paper attempt to solve?

This paper addresses key challenges in image copy-move forgery detection (CMFD). Existing deep-learning methods are mainly based on deep convolutional neural networks (DCNNs). These methods often discard a large amount of detail during convolutional feature extraction and perform poorly at extracting long-range information. The Transformer structure is good at modeling global context, but in the CMFD task it still neglects the details of locally tampered regions.

To solve these problems, the authors propose a new two-branch Transformer structure, LBRT (Local-Information-Refined Transformer). LBRT improves on existing methods in the following ways:

1. **Global context modeling branch**: Uses the self-attention mechanism of the Transformer encoder to capture long-range context between the global patches of the image, thereby identifying the tampered copy-move regions.
2. **Local refinement branch**: Further divides the global patches into smaller local patches and introduces the Intra-Patch Re-Dividing Layer (IPRL) to strengthen local information extraction and improve detection of the edges of forged regions.
3. **Feature fusion module**: Precisely fuses the features extracted by the global and local branches and generates the predicted tampered-region mask.

With this design, LBRT improves global information extraction and also captures more local detail, which improves accuracy on the CMFD task.

### Main contributions of the paper

1. Proposes a new Transformer-based CMFD network whose two-branch structure extracts long-range global information and fine-grained local information at the same time. This addresses the weakness of previous DCNN methods in extracting global information and strengthens the Transformer encoder's ability to extract local information.
2. Improves the Transformer encoding in the local branch. The IPRL ensures that the local features of each image patch are properly extracted, at the cost of only a small increase in parameters and without repeated large-scale pre-training.
3. Reports extensive experiments on the USCISI, CASIA CMFD, and DEFACTO CMFD datasets showing that the proposed network outperforms advanced CMFD methods, including traditional DCNN models and DCNN models with additional attention mechanisms.

### Formula summary

- Image pre-processing formula:

  \[ Z_l = X_p + \text{Pos} \]

  where \(X_p\) is the image-patch embedding and \(\text{Pos}\) is the position embedding.

- Global multi-head self-attention calculation formula:

  \[ A = \text{Softmax}\left(\frac{QK^T}{\sqrt{D_h}}\right) \]

  \[ GSA(Z_l) = AV \]

  \[ GMSA(Z_l) = \text{Concat}(GSA_1(Z_l), GSA_2(Z_l), \dots, GSA_{n_1}(Z_l)) \]

  where \(Q\), \(K\), and \(V\) are the query, key, and value matrices, \(D_h\) is the per-head dimension, and \(n_1\) is the number of heads in the global branch.

- Local multi-head self-attention calculation formula:

  \[ LSA(z^l_m) = \text{Softmax}\left(\frac{QK^T}{\sqrt{D_h}}\right)V \]

  \[ LMSA(z^l_m) = \text{Concat}(LSA_1(z^l_m), LSA_2(z^l_m), \dots, LSA_{n_2}(z^l_m)) \]

  where \(n_2\) is the number of heads in the local branch.

With these improvements, LBRT improves the detection and localization accuracy of forged regions on the CMFD task.
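The formula summary above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the projection weights are random, the token counts, model width, and head counts are hypothetical, and the helper names (`self_attention`, `multi_head_self_attention`, `random_heads`) are invented for this example. Following the summary, the heads are simply concatenated, with no output projection, and the local branch is modeled as attention applied independently to each global patch's local tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(z, w_q, w_k, w_v):
    """Single head: A = Softmax(Q K^T / sqrt(D_h)); returns (A V, A)."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    d_h = q.shape[-1]
    a = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(d_h))
    return a @ v, a

def multi_head_self_attention(z, heads):
    """Concatenate the per-head outputs along the feature axis."""
    return np.concatenate([self_attention(z, *w)[0] for w in heads], axis=-1)

def random_heads(rng, n_heads, d_model, d_h):
    """Hypothetical stand-in for learned Q/K/V projection weights."""
    return [tuple(rng.standard_normal((d_model, d_h)) * 0.1 for _ in range(3))
            for _ in range(n_heads)]

rng = np.random.default_rng(0)
n_global, n_local, d_model = 16, 4, 32   # assumed token counts and width
n1, n2 = 4, 2                            # assumed head counts per branch

# Z_l = X_p + Pos: patch embeddings plus position embeddings.
x_p = rng.standard_normal((n_global, d_model))
pos = rng.standard_normal((n_global, d_model))
z_l = x_p + pos

# Global branch: every global patch attends to every other global patch.
global_heads = random_heads(rng, n1, d_model, d_model // n1)
gmsa = multi_head_self_attention(z_l, global_heads)

# Local branch: the leading axis indexes global patches, so attention is
# computed only among the local tokens that belong to the same global patch.
z_local = rng.standard_normal((n_global, n_local, d_model))
local_heads = random_heads(rng, n2, d_model, d_model // n2)
lmsa = multi_head_self_attention(z_local, local_heads)

print(gmsa.shape)  # (16, 32)
print(lmsa.shape)  # (16, 4, 32)
```

The only structural difference between the two calls is the shape of the input: the global branch attends over one sequence of all global patches, while the local branch attends over many short sequences, one per global patch.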

LBRT: Local-Information-Refined Transformer for Image Copy–Move Forgery Detection

CMFDFormer: Transformer-based Copy-Move Forgery Detection with Continual Learning

Lightweight and High-Precision Network for Image Copy-Move Forgery Detection

TBFormer: Two-Branch Transformer for Image Forgery Localization

Image Copy-Move Forgery Detection via Deep Cross-Scale PatchMatch

Coarse-to-fine spatial-channel-boundary attention network for image copy-move forgery detection

End-to-end Image Splicing Localization Based on Multi-Scale Features and Residual Refinement Module

MTFDN: An image copy‐move forgery detection method based on multi‐task learning

Strong robust copy-move forgery detection network based on layer-by-layer decoupling refinement

Cross-attention based two-branch networks for document image forgery localization in the Metaverse

Image copy-move forgery detection and localization based on super-BPD segmentation and DCNN

Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning

LoFLAT: Local Feature Matching using Focused Linear Attention Transformer

Branch-Transformer: A Parallel Branch Architecture to Capture Local and Global Features for Language Identification

SPA-Net: A Deep Learning Approach Enhanced Using a Span-Partial Structure and Attention Mechanism for Image Copy-Move Forgery Detection

CMCF-Net: an End-to-End Context Multiscale Cross-Fusion Network for Robust Copy-Move Forgery Detection

Can Deep Network Balance Copy-Move Forgery Detection and Distinguishment?

Progressive Feedback-Enhanced Transformer for Image Forgery Localization

Dual branch convolutional neural network for copy move forgery detection

LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

Copy-move image forgery detection based on evolving circular domains coverage