Detecting GAN-generated Imagery using Color Cues

Scott McCloskey, Michael Albright
DOI: https://doi.org/10.48550/arXiv.1812.08247
2018-12-20
Abstract: Image forensics is an increasingly relevant problem, as it can potentially address online disinformation campaigns and mitigate problematic aspects of social media. Of particular interest, given its recent successes, is the detection of imagery produced by Generative Adversarial Networks (GANs), e.g. 'deepfakes'. Leveraging large training sets and extensive computing resources, recent work has shown that GANs can be trained to generate synthetic imagery which is (in some ways) indistinguishable from real imagery. We analyze the structure of the generating network of a popular GAN implementation, and show that the network's treatment of color is markedly different from a real camera in two ways. We further show that these two cues can be used to distinguish GAN-generated imagery from camera imagery, demonstrating effective discrimination between GAN imagery and real camera images used to train the GAN.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of distinguishing images generated by generative adversarial networks (GANs) from images taken by real cameras. As social media has become an important channel for spreading news, online disinformation campaigns have attracted wide attention, and image forensics has become increasingly important for verifying the authenticity of the stories being shared. Modern data-driven methods, however, make it easier to generate artificial images from scratch, which poses a challenge to image forensics. In particular, recent research has shown that, given large training sets and extensive computing resources, GANs can be trained to generate synthetic images that are in some respects indistinguishable from real images.

To meet this challenge, the authors analyze the structure of the GAN generator network, with particular attention to the way it forms colors, and point out two significant differences between how the generator handles color and how a real camera does:

1. **Normalization of internal values in the generator**: The generator normalizes its internal values to bound the output, which limits the frequency of saturated pixels.
2. **Conversion of the multi-channel internal representation**: The generator's multi-channel internal representation is collapsed into three channels: red, green, and blue. Although this conversion resembles the color image formation model, the weights used differ markedly from the corresponding spectral sensitivities of a camera.

Based on these two cues, the authors propose a detection method that can effectively distinguish GAN-generated images from the real camera images used to train the GAN. The method helps identify images generated entirely by a GAN, and it can also play a role in more complex scenarios, such as when GAN-generated faces are embedded into larger camera images.
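To make the two cues concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names `saturation_fractions` and `project_to_rgb`, the 8-bit thresholds, and the use of NumPy are all assumptions made for illustration. The first function measures how often pixels are saturated or under-exposed, the statistic the first cue says should be lower in generator output. The second shows what collapsing a multi-channel representation into red, green, and blue looks like as a per-pixel weighted sum.

```python
import numpy as np


def saturation_fractions(image, low=0, high=255):
    """Return (under, over) for an H x W or H x W x C image.

    under: fraction of pixels with at least one channel <= low
    over:  fraction of pixels with at least one channel >= high
    The 0/255 defaults assume 8-bit imagery (an assumption of this sketch).
    """
    pixels = np.asarray(image)
    if pixels.ndim == 2:
        # Treat a grayscale image as a single-channel color image.
        pixels = pixels[..., np.newaxis]
    under = np.any(pixels <= low, axis=-1).mean()
    over = np.any(pixels >= high, axis=-1).mean()
    return float(under), float(over)


def project_to_rgb(features, weights):
    """Collapse an H x W x C feature map to H x W x 3.

    Each output channel is a weighted sum of the C input channels at the
    same pixel; `weights` has shape (C, 3). In a camera the analogous
    weights would be spectral sensitivities; in a generator they are
    learned parameters.
    """
    return np.asarray(features) @ np.asarray(weights)


if __name__ == "__main__":
    # Tiny hand-made image: two of the four pixels touch 255, two touch 0.
    demo = np.array(
        [[[255, 10, 10], [0, 0, 0]],
         [[100, 100, 100], [50, 255, 0]]],
        dtype=np.uint8,
    )
    print(saturation_fractions(demo))
```

Fractions like these, computed over many images, could serve as input features to a classifier that separates camera images from generated ones. The choice of classifier and thresholds is left open here, since the summary above does not specify them.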