Abstract: Digital image forensics plays a crucial role in image authentication and manipulation localization. Despite the progress powered by deep neural networks, existing forgery localization methods show limitations when deployed on unseen datasets and perturbed images (i.e., they lack the generalization and robustness needed for real-world applications). To circumvent these problems and aid image integrity, this paper presents a generalized and robust manipulation localization model based on the analysis of pixel-inconsistency artifacts. The rationale is grounded in the observation that most image signal processors (ISPs) involve a demosaicing process, which introduces pixel correlations in pristine images. Moreover, manipulating operations, including splicing, copy-move, and inpainting, directly affect such pixel regularity. We therefore first split the input image into several blocks and design masked self-attention mechanisms to model the global pixel dependency in input images. Simultaneously, we optimize another local pixel dependency stream to mine local manipulation clues within input forgery images. In addition, we design novel Learning-to-Weight Modules (LWM) to combine features from the two streams, thereby enhancing the final forgery localization performance. To improve the training process, we propose a novel Pixel-Inconsistency Data Augmentation (PIDA) strategy, driving the model to focus on capturing inherent pixel-level artifacts instead of mining semantic forgery traces. This work establishes a comprehensive benchmark integrating 15 representative detection models across 12 datasets. Extensive experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints and achieves state-of-the-art generalization and robustness in image manipulation localization.
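The demosaicing rationale above can be illustrated with a toy experiment. This is a minimal sketch, not the paper's detector, and it rests on stated assumptions: a bilinear green-channel demosaicer on a Bayer grid with green sampled where row + col is odd (real ISPs use more elaborate interpolation), and a hypothetical splice of a patch that never went through that pipeline. The helper names `four_neighbour_mean`, `demosaic_green`, and `interpolation_residual` are illustrative only. In a pristine image every interpolated pixel equals the average of its four neighbours exactly; a spliced region breaks that regularity.

```python
import numpy as np

def four_neighbour_mean(img: np.ndarray) -> np.ndarray:
    """Mean of the up/down/left/right neighbours, edge-padded at the border."""
    p = np.pad(img, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def demosaic_green(scene: np.ndarray) -> np.ndarray:
    """Toy bilinear demosaicing of the green channel.

    Assumption: green is sampled where (row + col) is odd; every other site
    is filled with the mean of its four (green) neighbours.
    """
    rows, cols = np.indices(scene.shape)
    green_site = (rows + cols) % 2 == 1
    return np.where(green_site, scene, four_neighbour_mean(scene))

def interpolation_residual(img: np.ndarray) -> np.ndarray:
    """Absolute deviation of each pixel from its four-neighbour prediction."""
    return np.abs(img - four_neighbour_mean(img))

rng = np.random.default_rng(0)
h = w = 64
pristine = demosaic_green(rng.random((h, w)))

# Hypothetical splice: a 16x16 patch that never went through this demosaicing step.
forged = pristine.copy()
forged[16:32, 16:32] = rng.random((16, 16))

# Evaluate only interpolated (non-green) sites away from the padded border.
rows, cols = np.indices((h, w))
check = (rows + cols) % 2 == 0
check[[0, -1], :] = False
check[:, [0, -1]] = False
patch = np.zeros((h, w), dtype=bool)
patch[16:32, 16:32] = True

res_pristine = interpolation_residual(pristine)
res_forged = interpolation_residual(forged)
print("pristine, max residual:", res_pristine[check].max())  # exactly 0.0
print("forged, mean residual inside patch:", res_forged[check & patch].mean())
print("forged, mean residual outside patch:", res_forged[check & ~patch].mean())
```

The hand-crafted residual only conveys the intuition: pixels whose correlations were not produced by the camera's interpolation stand out. The paper instead learns such cues with its local and global pixel-dependency streams.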

What problem does this paper attempt to address?

This paper addresses two major problems in image forensics: **insufficient generalization** and **poor robustness**. Existing forgery localization methods degrade when applied to unseen datasets and perturbed images, so they do not generalize well to real-world applications. To address this, the authors propose an image manipulation localization method based on pixel-inconsistency modeling.

### 1. Research Background and Problems

With the progress of digital image processing technology, image tampering (e.g., splicing, copy-move, and inpainting) has become increasingly sophisticated and difficult to detect. These operations disrupt the pixel regularity of the original image, in particular the periodic correlation pattern introduced by demosaicing. Effectively detecting and localizing tampered regions is therefore an important research problem.

### 2. Main Contributions of the Paper

To improve the generalization and robustness of image tampering localization, the paper proposes the following:

- **Two-stream Pixel-Dependency Modeling Framework**: A local-pixel-dependency encoder and a global-pixel-dependency encoder jointly capture pixel inconsistencies in the image. The local encoder uses Pixel-Difference Convolution (PDC) blocks to capture inconsistencies within local regions, while the global encoder models global pixel dependencies across the input image with a masked self-attention mechanism.
- **Learning-to-Weight Module (LWM)**: To better fuse local and global features, the authors introduce a learning-to-weight module that dynamically adjusts the importance of each stream's features according to learned weights.
- **Pixel-Inconsistency Data Augmentation (PIDA)**: To improve training, the authors propose a data augmentation strategy that generates forged samples only from real images. This steers the model toward capturing pixel-level inconsistencies rather than semantic-level forgery traces.

### 3. Experimental Results

The experiments show that the proposed model generalizes well and remains robust across multiple datasets. By modeling pixel inconsistency, the model extracts forgery fingerprints more accurately and maintains high performance under different types of image perturbations.

### 4. Key Formulas

The key formulas in the paper are as follows:

- Masked self-attention, where \(z_i\) is the token sequence fed to the \(i\)-th attention layer and \(d\) is the key dimension:

\[
z_{i+1}=\text{Mask}\left[\text{softmax}\left(\frac{f_{\text{query}}(z_i)\, f_{\text{key}}(z_i)^{\top}}{\sqrt{d}}\right)\right] f_{\text{value}}(z_i)
\]

- Pixel-Difference Convolution, central (CPDC) and radial (RPDC) variants, where \(\Omega\) is the local window, \(x_c\) its central pixel, and \(w_i\) the learnable kernel weights; RPDC takes differences between paired pixels \(x_i\) and \(x'_i\) instead of differences against the center:

\[
f_C^l=\sum_{(x_i, x_c)\in\Omega} w_i\,(x_i - x_c)
\]

\[
f_R^l=\sum_{(x_i, x'_i)\in\Omega} w_i\,(x_i - x'_i)
\]

- Learning-to-Weight Module (LWM), where \(f_1\) and \(f_2\) are the features from the two streams, \(A_1\) and \(A_2\) are the learned weight maps, and \(\odot\) denotes element-wise multiplication:

\[
f_F = f_1\oplus f_2+A_1\odot f_1+A_2\odot f_2
\]

Through these components, the paper offers a more general and robust approach to image tampering localization, which the authors report achieves state-of-the-art generalization and robustness compared with existing methods.
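The formulas above can be sketched in NumPy on a single channel. This is a minimal illustration under stated assumptions, not the authors' implementation: `f_query`, `f_key`, and `f_value` are taken to be plain linear projections; `Mask[·]` is read literally as zeroing attention weights after the softmax, with a hypothetical mask that blocks each token from attending to itself; the RPDC pairing follows the common outer-ring/inner-ring layout of pixel difference networks (an assumption); the ⊕ in the LWM formula is read as element-wise addition; and the weight maps A1, A2 come from a hypothetical sigmoid projection `w_a`. All function names are illustrative.

```python
import numpy as np

# ---- Masked self-attention (global stream) ----
def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(z, w_q, w_k, w_v, mask):
    """z: (n_tokens, d_model); mask: (n_tokens, n_tokens) bool, True = keep.
    Mask[.] is read as zeroing attention weights *after* the softmax,
    as the formula is written (an interpretation, not the paper's code)."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return (attn * mask) @ v

# ---- Pixel-difference convolutions (local stream) ----
OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def shift(x: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """s[r, c] = x[r + dy, c + dx], with zeros outside the image (|dy|, |dx| <= 2)."""
    h, w = x.shape
    p = np.pad(x, 2)
    return p[2 + dy:2 + dy + h, 2 + dx:2 + dx + w]

def cpdc(x: np.ndarray, w) -> np.ndarray:
    """Central PDC: sum_i w_i * (x_i - x_c) over the 8 neighbours of a 3x3 window."""
    return sum(wi * (shift(x, dy, dx) - x) for wi, (dy, dx) in zip(w, OFFSETS))

def rpdc(x: np.ndarray, w) -> np.ndarray:
    """Radial PDC, assumed layout: x_i on the outer ring of a 5x5 window,
    x'_i the inner-ring pixel in the same direction."""
    return sum(wi * (shift(x, 2 * dy, 2 * dx) - shift(x, dy, dx))
               for wi, (dy, dx) in zip(w, OFFSETS))

# ---- Learning-to-Weight fusion ----
def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def lwm_fuse(f1, f2, w_a):
    """f1, f2: (n, c) features. w_a: (2c, 2) hypothetical projection that
    predicts the weight maps A1, A2. (+) is read as element-wise addition."""
    logits = np.concatenate([f1, f2], axis=-1) @ w_a         # (n, 2)
    a1, a2 = sigmoid(logits[:, :1]), sigmoid(logits[:, 1:])  # (n, 1), broadcast over c
    return f1 + f2 + a1 * f1 + a2 * f2

rng = np.random.default_rng(0)
n, d = 6, 8                                  # toy sizes: 6 block tokens, 8-dim embeddings
z = rng.standard_normal((n, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
mask = ~np.eye(n, dtype=bool)                # illustrative mask only: no self-attention
z_next = masked_self_attention(z, w_q, w_k, w_v, mask)

img = rng.random((16, 16))
local = cpdc(img, rng.standard_normal(8)) + rpdc(img, rng.standard_normal(8))
# Random stand-in for the local-stream features, shaped like z_next.
fused = lwm_fuse(z_next, rng.standard_normal((n, d)), rng.standard_normal((2 * d, 2)))
print(z_next.shape, local.shape, fused.shape)  # (6, 8) (16, 16) (6, 8)
```

The CPDC form shows why difference-based kernels suit pixel-inconsistency modeling: they respond only to differences between pixels, so a flat region produces zero output regardless of its intensity.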

Pixel-Inconsistency Modeling for Image Manipulation Localization

End-to-end Image Splicing Localization Based on Multi-Scale Features and Residual Refinement Module

Image Manipulation Localization Using Multi-Scale Feature Fusion and Adaptive Edge Supervision

Exploring Multi-view Pixel Contrast for General and Robust Image Forgery Localization

FP-Net: frequency-perception network with adversarial training for image manipulation localization

AGIL-SwinT: Attention-guided Inconsistency Learning for Face Forgery Detection

Learning Forgery Region-Aware and ID-Independent Features for Face Manipulation Detection

Object-level Copy-Move Forgery Image Detection based on Inconsistency Mining

Multi-spectral Class Center Network for Face Manipulation Detection and Localization

AdaIFL: Adaptive Image Forgery Localization Via a Dynamic and Importance-Aware Transformer Network

Image Forgery Localization via Guided Noise and Multi-Scale Feature Aggregation

Image Copy-Move Forgery Detection and Localization Scheme: How to Avoid Missed Detection and False Alarm

Hybrid LSTM and Encoder–Decoder Architecture for Detection of Image Forgeries

Image Manipulation Localization Using Spatial–Channel Fusion Excitation and Fine-Grained Feature Enhancement

CECL-Net: Contrastive Learning and Edge-Reconstruction-Driven Complementary Learning Network for Image Forgery Localization

Spotting the Difference: Context Retrieval and Analysis for Improved Forgery Detection and Localization

Noise-assisted Prompt Learning for Image Forgery Detection and Localization

Learning Discriminative Noise Guidance for Image Forgery Detection and Localization

DMFF-Net: Double-stream multilevel feature fusion network for image forgery localization

Fighting Fake News: Image Splice Detection via Learned Self-Consistency

Harmonizing Image Forgery Detection & Localization: Fusion of Complementary Approaches