Abstract: Modern smartphone camera quality heavily relies on the image signal processor (ISP) to enhance captured raw images, utilizing carefully designed modules to produce final output images encoded in a standard color space (e.g., sRGB). Neural-based end-to-end learnable ISPs offer promising advancements, potentially replacing traditional ISPs with their ability to adapt without requiring extensive tuning for each new camera model, as is often the case for nearly every module in traditional ISPs. However, the key challenge with recent learning-based ISPs is the need to collect large paired datasets for each distinct camera model, because intrinsic camera characteristics influence the formation of input raw images. This paper tackles this challenge by introducing a novel method for unpaired learning of raw-to-raw translation across diverse cameras. Specifically, we propose Rawformer, an unsupervised Transformer-based encoder-decoder method for raw-to-raw translation. It accurately maps raw images captured by one camera to those of a target camera, facilitating the generalization of learnable ISPs to new, unseen cameras. Our method demonstrates superior performance on real camera datasets, achieving higher accuracy than previous state-of-the-art techniques and preserving a more robust correlation between the original and translated raw images. The code and pretrained models are available at https://github.com/gosha20777/rawformer.

What problem does this paper attempt to address?

This paper addresses a limitation of image signal processing (ISP) in modern smartphone cameras: when a new camera model is introduced, existing neural-network-based ISPs struggle to adapt to the new camera's characteristics. Specifically:

1. **Existing Challenges**:
   - Traditional ISP modules require extensive tuning and optimization for each new camera model to reach the required image quality.
   - Existing learning-based ISPs rely on large paired raw-sRGB datasets, which are difficult and time-consuming to collect for every new camera model.
   - Camera-specific characteristics (such as sensor sensitivity) affect how raw images are formed, so pre-trained neural ISPs perform poorly on raw images captured by a new camera.
2. **Paper Objectives**:
   - Propose a method that needs no paired data and performs unsupervised translation of raw images between cameras (raw-to-raw translation), so that pre-trained neural ISPs can process raw images from unseen cameras without retraining or fine-tuning.
3. **Solutions**:
   - The paper proposes Rawformer, an unsupervised Transformer-based encoder-decoder method for translating raw images across cameras. By efficiently encoding global and semantic correlations, Rawformer maps a raw image captured by one camera into the raw space of another camera without using paired datasets.
   - Rawformer addresses the shortcomings of existing methods and generalizes better by introducing Context-Scale Aware Downsampler (CSAD) and Upsampler (CSAU) blocks, Condensed Query Attention (CQA) blocks, and cross-domain attention-guided discriminators.
4. **Contributions**:
   - Rawformer, a fully unsupervised Transformer-based method for raw-to-raw translation that enables neural ISPs to generalize better across cameras.
   - Context-scale aware downsampler and upsampler blocks, which effectively summarize both local and global context.
   - A novel cross-domain attention-driven discriminator with a dedicated head that stabilizes network training.

Through these innovations, the paper aims to overcome the limitations of existing learning-based ISPs on raw images from new cameras and provides a more general and efficient solution.
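The summary stresses that the raw-to-raw mapping is learned without paired images. As a minimal sketch of how an unpaired objective can supervise such a mapping, the snippet below computes a generic CycleGAN-style cycle-consistency loss: translating camera-A raw data to camera B and back should reproduce the input (and vice versa), so no pixel-aligned A/B pairs are required. This is an illustration under stated assumptions, not Rawformer's actual training objective: the per-channel gain "generators", the names `make_gain_generator` and `cycle_consistency_loss`, and the packed 4-channel (H, W, 4) Bayer layout are hypothetical stand-ins for learned networks and real data, and the adversarial (discriminator) terms are omitted.

```python
import numpy as np


def make_gain_generator(gains):
    """Hypothetical toy 'generator': per-channel gains on a packed raw image.

    A stand-in for a learned translation network; real raw-to-raw models
    such as Rawformer are deep encoder-decoders, not channel gains.
    """
    gains = np.asarray(gains, dtype=np.float64)

    def generator(raw):
        # raw: packed Bayer image of shape (H, W, 4) with values in [0, 1];
        # the gains broadcast over the last (channel) axis.
        return np.clip(raw * gains, 0.0, 1.0)

    return generator


def cycle_consistency_loss(x_a, x_b, g_ab, g_ba):
    """Generic L1 cycle-consistency loss (CycleGAN-style).

    x_a and x_b are unpaired raw images from cameras A and B: they need not
    show the same scene. Only the round trips A->B->A and B->A->B are compared.
    """
    loss_a = np.mean(np.abs(g_ba(g_ab(x_a)) - x_a))
    loss_b = np.mean(np.abs(g_ab(g_ba(x_b)) - x_b))
    return float(loss_a + loss_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two unrelated batches standing in for raw captures from cameras A and B.
    x_a = rng.uniform(0.0, 0.5, size=(8, 8, 4))
    x_b = rng.uniform(0.0, 0.5, size=(8, 8, 4))

    gains = [1.2, 1.0, 1.0, 0.8]  # illustrative A->B channel response ratios
    g_ab = make_gain_generator(gains)
    g_ba = make_gain_generator([1.0 / g for g in gains])  # matching inverse
    identity = make_gain_generator([1.0, 1.0, 1.0, 1.0])  # ignores camera B

    print("inverse pair:", cycle_consistency_loss(x_a, x_b, g_ab, g_ba))
    print("identity B->A:", cycle_consistency_loss(x_a, x_b, g_ab, identity))
```

With exactly inverse gains the round trips are lossless, so the loss is close to zero; using the identity for `g_ba` makes it positive. A cycle loss alone cannot tell whether translated images actually resemble camera B's raw data, which is why unpaired translation methods add adversarial discriminators alongside it.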

Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs

Model-Based Image Signal Processors via Learnable Dictionaries

Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images

Simple Image Signal Processing using Global Context Guidance

Efficient Visual Computing with Camera RAW Snapshots

RMFA-Net: A Neural ISP for Real RAW to RGB Image Reconstruction

RAW to tonemapped HDR camera ISP

ISP Distillation

Towards Low-Cost Learning-based Camera ISP via Unrolled Optimization

In-Camera Raw Compression: A New Paradigm from Image Acquisition to Display

Self-Supervised Reversed Image Signal Processing via Reference-Guided Dynamic Parameter Selection

Day-to-Night Image Synthesis for Training Nighttime Neural ISPs

MetaISP -- Exploiting Global Scene Structure for Accurate Multi-Device Color Rendition

Metadata-Based RAW Reconstruction Via Implicit Neural Functions

BSRAW: Improving Blind RAW Image Super-Resolution

Del-Net: A Single-Stage Network for Mobile Camera ISP

LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models

ISP-Agnostic Image Reconstruction for Under-Display Cameras

DeepISP: Toward Learning an End-to-End Image Processing Pipeline