Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi, Radu Timofte
2024-07-15
Abstract: Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with the recent learning-based ISPs is the urge to collect large paired datasets for each distinct camera model due to the influence of intrinsic camera characteristics on the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by a certain camera to the target camera, facilitating the generalization of learnable ISPs to new unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy compared to previous state-of-the-art techniques, and preserving a more robust correlation between the original and translated raw images. The codes and the pretrained models are available at https://github.com/gosha20777/rawformer.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses a limitation of learned image signal processors (ISPs) for smartphone cameras: a neural-network-based ISP trained on one camera adapts poorly to the raw images of a newly introduced camera model. Specifically:

1. **Existing Challenges**:
   - Traditional ISP modules need extensive tuning for each new camera model to reach the required image quality.
   - Existing learning-based ISPs rely on large paired datasets (raw-sRGB), which are difficult and time-consuming to collect for each new camera model.
   - Camera-specific characteristics (such as sensor sensitivity) affect how raw images are formed, so a pre-trained neural ISP performs poorly on raw images captured by a new camera.
2. **Paper Objectives**:
   - Propose a method that needs no paired datasets and performs unsupervised raw-to-raw translation between cameras, so that a pre-trained neural ISP can process raw images from unseen cameras without retraining or fine-tuning.
3. **Solutions**:
   - The paper proposes Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-image translation across cameras. By efficiently encoding global and semantic correlations, Rawformer maps a raw image captured by one camera to the raw space of another camera without using paired data.
   - Rawformer addresses shortcomings of earlier methods and generalizes better by introducing contextual-scale aware downsampler (CSAD) and upsampler (CSAU) blocks, condensed query attention (CQA) blocks, and a cross-domain attention-driven discriminator.
4. **Contributions**:
   - Rawformer, a fully unsupervised Transformer-based method for raw-to-raw translation that lets neural ISPs generalize better across cameras.
   - Contextual-scale aware downsampler and upsampler blocks, which efficiently summarize local and global context details.
   - A novel cross-domain attention-driven discriminator and its dedicated head, designed to stabilize network training.

Together, these components target the poor generalization of learning-based ISPs to raw images from new cameras and offer a more general and efficient alternative to collecting paired data per camera.
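The deployment workflow described in this summary (translate a new camera's raw image into the raw space of the camera the ISP was trained on, then run the unchanged pre-trained ISP) can be sketched as a simple function composition. This is a minimal illustration and not the Rawformer code: `placeholder_translator`, `placeholder_isp`, `build_pipeline`, and the `gain` parameter are hypothetical names, and both stages are toy stand-ins (a fixed gain and a square-root tone curve) rather than trained networks.

```python
from typing import Callable, List

# Toy single-channel frame with values in [0, 1]; real raw data would be a
# Bayer mosaic or packed multi-channel tensor.
Image = List[List[float]]


def placeholder_translator(raw: Image, gain: float = 0.5) -> Image:
    """Hypothetical stand-in for a trained raw-to-raw generator.

    A real translator would be a learned network; here a fixed gain followed
    by clipping merely imitates "mapping into another camera's raw space".
    """
    return [[min(1.0, max(0.0, px * gain)) for px in row] for row in raw]


def placeholder_isp(raw: Image) -> Image:
    """Hypothetical stand-in for a neural ISP pre-trained on the source camera.

    A square-root tone curve is used purely for illustration.
    """
    return [[px ** 0.5 for px in row] for row in raw]


def build_pipeline(
    translator: Callable[[Image], Image],
    isp: Callable[[Image], Image],
) -> Callable[[Image], Image]:
    """Compose translation and rendering; the ISP itself is never retrained."""

    def run(raw_from_new_camera: Image) -> Image:
        raw_in_source_space = translator(raw_from_new_camera)
        return isp(raw_in_source_space)

    return run


pipeline = build_pipeline(placeholder_translator, placeholder_isp)
new_camera_raw: Image = [[0.0, 0.5], [1.0, 0.02]]
output = pipeline(new_camera_raw)
```

The point of the composition is that only the translator depends on the new camera; the ISP stage is reused as-is, which is what removes the need for a paired raw-sRGB dataset per camera.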