LBRT: Local-Information-Refined Transformer for Image Copy–Move Forgery Detection

Peng Liang, Ziyuan Li, Hang Tu, Huimin Zhao
DOI: https://doi.org/10.3390/s24134143
IF: 3.9
2024-06-27
Sensors
Abstract: The current deep learning methods for copy–move forgery detection (CMFD) are mostly based on deep convolutional neural networks, which frequently discard a large amount of detail information throughout convolutional feature extraction and have poor long-range information extraction capabilities. The Transformer structure is adept at modeling global context information, but the patch-wise self-attention calculation still neglects the extraction of details in local regions that have been tampered with. A local-information-refined dual-branch network, LBRT (Local Branch Refinement Transformer), is designed in this study. It performs Transformer encoding on the global patches segmented from the image and local patches re-segmented from the global patches using a global modeling branch and a local refinement branch, respectively. The self-attention features from both branches are precisely fused, and the fused feature map is then up-sampled and decoded. Therefore, LBRT considers both global semantic information modeling and local detail information refinement. The experimental results show that LBRT outperforms several state-of-the-art CMFD methods on the USCISI dataset, CASIA CMFD dataset, and DEFACTO CMFD dataset.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
### What problems does this paper attempt to solve?

This paper addresses key challenges in image copy-move forgery detection (CMFD). Existing deep-learning methods are mainly based on deep convolutional neural networks (DCNNs). These methods often discard a large amount of detail during convolutional feature extraction and perform poorly at extracting long-range information. Although the Transformer structure is good at modeling global context, it still neglects the details of tampered local regions in the CMFD task.

To solve these problems, the authors propose a new two-branch Transformer structure, LBRT (Local-Information-Refined Transformer). LBRT improves on existing methods in the following ways:

1. **Global context modeling branch**: Uses the self-attention mechanism of the Transformer encoder to capture long-range context between the global patches of the image, thereby identifying the tampered copy-move areas.
2. **Local refinement branch**: Further divides the global patches into smaller local patches and introduces the Intra-Patch Re-Dividing Layer (IPRL) to strengthen local information extraction and improve detection of the edges of forged areas.
3. **Feature fusion module**: Precisely fuses the features extracted by the global and local branches and generates the predicted tampered-area mask.

With this design, LBRT both improves the extraction of global information and enhances the capture of local detail, thereby improving accuracy on the CMFD task.

### Main contributions of the paper

1. A new Transformer-based CMFD network whose two-branch structure simultaneously extracts long-range global information and fine-grained local information. It addresses the shortcomings of previous DCNN methods in extracting global information and strengthens the ability of the Transformer encoder to extract local information.
2. An improved Transformer encoding in the local branch. The IPRL ensures that the local features of each image patch are properly extracted, with only a small increase in parameters and without repeated large-scale pre-training.
3. Extensive experiments on the USCISI, CASIA CMFD, and DEFACTO CMFD datasets, which show that the proposed network outperforms advanced CMFD methods, including traditional DCNN models and DCNN models with additional attention mechanisms.

### Formula summary

- Image pre-processing formula:
  \[ Z_l = X_p + \text{Pos} \]
  where \(X_p\) is the image-patch embedding and \(\text{Pos}\) is the position embedding.
- Global multi-head self-attention calculation formula:
  \[ A = \text{Softmax}\left(\frac{QK^T}{\sqrt{D_h}}\right) \]
  \[ GSA(Z_l) = AV \]
  \[ GMSA(Z_l) = \text{Concat}(GSA_1(Z_l), GSA_2(Z_l), \dots, GSA_{n_1}(Z_l)) \]
  where \(Q\), \(K\), and \(V\) are the query, key, and value projections of the input, \(D_h\) is the per-head dimension, and \(n_1\) is the number of heads in the global branch.
- Local multi-head self-attention calculation formula:
  \[ LSA(z^l_m) = \text{Softmax}\left(\frac{QK^T}{\sqrt{D_h}}\right)V \]
  \[ LMSA(z^l_m) = \text{Concat}(LSA_1(z^l_m), LSA_2(z^l_m), \dots, LSA_{n_2}(z^l_m)) \]
  where \(n_2\) is the number of heads in the local branch.

Through these improvements, LBRT performs well on the CMFD task and significantly improves the detection and localization accuracy of forged areas.
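The formulas in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the image size, patch sizes (16 and 4), embedding width, head counts, and all function names are assumptions chosen for illustration, and the weights are random rather than learned. It only shows how global patches are embedded, how each global patch might be re-divided into local patches, and how the multi-head self-attention formula applies to both token sets. Feature fusion and decoding are not covered.

```python
import numpy as np

rng = np.random.default_rng(0)

def split_into_patches(img, p):
    """Cut an (H, W, C) array into non-overlapping p x p patches, each flattened."""
    h, w, c = img.shape
    x = img.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * c)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def embed(x_p, d):
    """Z = X_p + Pos, with a random (hypothetical) projection and position table."""
    w = rng.normal(scale=x_p.shape[-1] ** -0.5, size=(x_p.shape[-1], d))
    pos = rng.normal(scale=0.02, size=(x_p.shape[-2], d))
    return x_p @ w + pos

def self_attention(z, w_q, w_k, w_v):
    """One head: A = Softmax(Q K^T / sqrt(D_h)); returns (A V, A)."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    a = softmax(q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1]))
    return a @ v, a

def multi_head(z, n_heads, d_h):
    """Concatenate n_heads independent heads along the feature axis."""
    d = z.shape[-1]
    outs = []
    for _ in range(n_heads):
        w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d_h)) for _ in range(3))
        outs.append(self_attention(z, w_q, w_k, w_v)[0])
    return np.concatenate(outs, axis=-1)

# Assumed toy input: a 64 x 64 RGB image.
image = rng.random((64, 64, 3))

# Global branch: 16 x 16 patches give 16 tokens of 768 raw values each.
global_patches = split_into_patches(image, 16)

# Local branch (illustrative re-division): each global patch is cut again
# into 4 x 4 local patches, giving 16 local tokens per global patch.
local_patches = np.stack(
    [split_into_patches(g.reshape(16, 16, 3), 4) for g in global_patches]
)

z_global = embed(global_patches, 32)   # shape (16, 32)
z_local = embed(local_patches, 32)     # shape (16, 16, 32)

gmsa = multi_head(z_global, n_heads=4, d_h=8)   # attention across global patches
lmsa = multi_head(z_local, n_heads=2, d_h=16)   # attention inside each global patch

print(gmsa.shape, lmsa.shape)
```

In this sketch the global branch attends across all patches of the image, while the local branch attends only among the local patches inside one global patch, which is one plausible reading of "local patches re-segmented from the global patches".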