Omni-IML: Towards Unified Image Manipulation Localization

Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin
2024-11-22
Abstract: Image manipulation can lead to misinterpretation of visual content, posing significant risks to information security. Image Manipulation Localization (IML) has thus received increasing attention. However, existing IML methods rely heavily on task-specific designs: they perform well only on their target image type, are mostly random guessing on other image types, and suffer significant performance degradation even under joint training on multiple image types. This hinders deployment in real applications, because it notably increases maintenance costs and the misclassification of image types leads to serious error accumulation. To this end, we propose Omni-IML, the first generalist model to unify diverse IML tasks. Specifically, Omni-IML achieves generalism by adopting the Modal Gate Encoder and the Dynamic Weight Decoder to adaptively determine the optimal encoding modality and the optimal decoder filters for each sample. We additionally propose an Anomaly Enhancement module that enhances the features of tampered regions with box supervision and helps the generalist model extract common features across different IML tasks. We validate our approach on IML tasks across three major scenarios: natural images, document images, and face images. Without bells and whistles, our Omni-IML achieves state-of-the-art performance on all three tasks with a single unified model, providing valuable strategies and insights for real-world applications and future research in generalist image forensics. Our code will be publicly available.
Computer Vision and Pattern Recognition, Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
This paper addresses an important problem in Image Manipulation Localization (IML): existing IML methods perform poorly on image types other than the one they target, and their performance drops significantly when they are jointly trained on multiple image types. Current IML models are usually designed for one specific type of image (natural images, document images, or face images) and are close to random guessing on the types they were not designed for. In practice, this means maintaining a separate model per image type, which raises maintenance costs, and any misclassification of an image's type leads to serious error accumulation.

To address this challenge, the authors propose Omni-IML, a generalist model that handles multiple IML tasks with a single unified model. Omni-IML achieves this through the following modules:

1. **Modal Gate Encoder**: automatically selects the best encoding modality for each input sample (frequency + visual, or pure visual) to adapt to the characteristics of different image types.
2. **Anomaly Enhancement**: enhances the features of tampered regions through bounding-box supervision, helping the model extract features shared across different IML tasks.
3. **Dynamic Weight Decoder**: adaptively selects the best decoder filters for each sample, reducing conflicts between tasks during unified training.

With these designs, Omni-IML localizes forgeries in three main scenarios (natural images, document images, and face images) without task-specific or benchmark-specific fine-tuning. The reported experiments show that Omni-IML achieves state-of-the-art performance on all three tasks, outperforming previous methods specialized for individual tasks.
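The per-sample routing performed by the Modal Gate Encoder and the Dynamic Weight Decoder can be illustrated with a small plain-Python sketch. This is a hypothetical illustration, not the authors' implementation. Several details are assumptions: the hard argmax gate, the convention that gate class 1 means "use the pure-visual features", and the dynamic-convolution-style mixing of K candidate filters with softmax weights. `select_modality` and `mix_filters` are illustrative names.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_modality(gate_logits: Sequence[float], rgb_feat: T, fused_feat: T) -> T:
    """Hypothetical hard modality gate.

    Assumption: gate class 1 = pure-visual (RGB) features preferred,
    class 0 = fused frequency + visual features preferred.
    """
    probs = softmax(gate_logits)
    return rgb_feat if probs[1] > probs[0] else fused_feat


def mix_filters(weight_logits: Sequence[float],
                candidate_filters: Sequence[Sequence[float]]) -> List[float]:
    """Dynamic-convolution-style filter: softmax-weighted sum of K flattened kernels.

    This is an assumed stand-in for "selecting the best decoder filter per sample".
    """
    weights = softmax(weight_logits)
    n = len(candidate_filters[0])
    return [sum(w * f[i] for w, f in zip(weights, candidate_filters)) for i in range(n)]


if __name__ == "__main__":
    print(select_modality([0.2, 1.5], "rgb", "fused"))        # gate favours class 1
    print(mix_filters([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # equal weights
```

In a real network the features and filters would be tensors and the logits would come from small prediction heads. A soft mixture like `mix_filters` stays differentiable, whereas a hard argmax gate does not, which is one reason a gate is typically trained with its own classification label.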
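### Formula Summary

Some of the formulas involved in the paper are as follows:

- Loss function of the modal gate encoder:
  \[ L_{MG} = CE(P_{rgb}, L_m) + CE(P_{fused}, L_m) + CE(P_{cls}, L_c) \]
  where the gate label \(L_c\) is defined as:
  \[ L_c = \begin{cases} 1 & \text{if } IoU(P_{rgb}, L_m) > IoU(P_{fused}, L_m) + 0.1 \\ 0 & \text{otherwise} \end{cases} \]
  Here \(CE\) denotes cross-entropy, \(L_m\) the ground-truth manipulation mask, \(P_{rgb}\) and \(P_{fused}\) the localization predictions from the pure-visual (RGB) and fused frequency + visual encodings, and \(P_{cls}\) the gate's modality prediction. The gate is therefore taught to prefer the pure-visual path only when it beats the fused path by more than 0.1 IoU on that sample.
- Loss function of the dynamic weight decoder:
  \[ L_{DWD} = CE(P_{DWD}, L_m) + CE(P_{co}, L_m) \]

Together, these losses supervise the localization outputs against the ground-truth mask and train the modal gate to choose an encoding modality for each sample across the different IML tasks.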
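The loss terms in the formula summary can be sketched in plain Python on flattened masks. This is a minimal sketch under stated assumptions, not the paper's code. Predictions are taken to be per-pixel tamper probabilities, and \(CE\) is binary cross-entropy averaged over pixels. IoU is computed after thresholding at 0.5, with IoU defined as 1 when both prediction and mask are empty. \(P_{cls}\) is treated as a single probability for gate class 1. \(P_{co}\) is treated as just another probability map, since the summary does not define it. All helper names are hypothetical.

```python
import math
from typing import Sequence

EPS = 1e-7  # clamp to avoid log(0)


def pixel_bce(probs: Sequence[float], mask: Sequence[int]) -> float:
    """Mean binary cross-entropy between tamper probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, EPS), 1 - EPS)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(mask)


def iou(probs: Sequence[float], mask: Sequence[int], threshold: float = 0.5) -> float:
    """IoU of the thresholded prediction with the mask (assumed convention: empty/empty -> 1)."""
    pred = [1 if p >= threshold else 0 for p in probs]
    inter = sum(1 for a, b in zip(pred, mask) if a and b)
    union = sum(1 for a, b in zip(pred, mask) if a or b)
    return inter / union if union else 1.0


def gate_label(p_rgb, p_fused, mask, margin: float = 0.1) -> int:
    """L_c: 1 if the pure-visual prediction beats the fused one by more than `margin` IoU."""
    return 1 if iou(p_rgb, mask) > iou(p_fused, mask) + margin else 0


def modal_gate_loss(p_rgb, p_fused, p_cls: float, mask) -> float:
    """L_MG = CE(P_rgb, L_m) + CE(P_fused, L_m) + CE(P_cls, L_c)."""
    l_c = gate_label(p_rgb, p_fused, mask)
    return pixel_bce(p_rgb, mask) + pixel_bce(p_fused, mask) + pixel_bce([p_cls], [l_c])


def dwd_loss(p_dwd, p_co, mask) -> float:
    """L_DWD = CE(P_DWD, L_m) + CE(P_co, L_m)."""
    return pixel_bce(p_dwd, mask) + pixel_bce(p_co, mask)
```

For example, with mask `[1, 1, 0, 0]`, an RGB prediction `[0.9, 0.8, 0.1, 0.2]` (IoU 1.0) and a fused prediction `[0.9, 0.2, 0.1, 0.2]` (IoU 0.5), `gate_label` returns 1 because 1.0 > 0.5 + 0.1. If the two IoUs differ by 0.1 or less, the label stays 0 and the fused path remains the default.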