Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning

Cong Cao, Huanjing Yue, Xin Liu, Jingyu Yang
2023-06-26
Abstract: Capturing high dynamic range (HDR) images (videos) is attractive because it can reveal the details in both dark and bright regions. Since mainstream screens only support low dynamic range (LDR) content, a tone mapping algorithm is required to compress the dynamic range of HDR images (videos). Although image tone mapping has been widely explored, video tone mapping is lagging behind, especially for deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. To improve unsupervised training, we propose a domain- and instance-based contrastive learning loss. Instead of using a universal feature extractor, such as VGG, to extract the features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. In total, we construct two negative pairs and three positive pairs to constrain the latent codes of tone-mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation of nonlocal regions. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize the temporal correlation and improve the temporal consistency of video tone-mapped results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate the unsupervised training process for video tone mapping. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods. Our code and dataset are available at https://github.com/cao-cong/UnCLTMO.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The main problems that this paper attempts to solve are:

1. **Tone mapping of high-dynamic-range (HDR) images and videos**: Since most display devices only support low-dynamic-range (LDR) content, tone mapping algorithms are required to compress the dynamic range of HDR images or videos for display on LDR screens. Image tone mapping has been widely studied, but video tone mapping lags behind, especially for deep-learning-based methods, largely because HDR-LDR video pairs are scarce.
2. **Challenges in unsupervised learning**: Supervised methods rely on paired HDR-LDR data for training, and such pairs are very difficult to obtain in practice. How to train a model effectively without paired supervision is therefore an important issue.
3. **Temporal consistency in video tone mapping**: Unlike static images, video tone mapping must keep the changes between adjacent frames smooth to avoid flickering and other temporal artifacts. Existing methods often struggle to maintain temporal and spatial consistency at the same time.

To solve these problems, the authors propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. The main contributions of the paper are:

- **Network structure**: The spatial-feature-enhanced (SFE) module enhances global features through graph convolution, enabling information exchange across nonlocal regions, while the temporal-feature-replaced (TFR) module, used for video tone mapping, exploits temporal correlation to improve the temporal consistency of the results.
- **Contrastive learning loss**: Domain- and instance-based contrastive learning losses are introduced to improve unsupervised training. Positive and negative pairs are constructed so that the generated results are pulled toward high-quality LDR images and pushed away from low-quality LDR images and the input HDR images.
- **Naturalness loss**: A naturalness loss constrains the brightness and contrast of the output image so that it is closer to natural images.
- **Large-scale unpaired HDR-LDR video dataset**: A large-scale dataset containing real and synthetic HDR and LDR videos is constructed to support unsupervised training.

With these components, the paper reports that its method outperforms state-of-the-art image and video tone mapping methods, and it releases its code and dataset.
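To make the contrastive idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "brightness" and "contrast" can be approximated by the per-channel mean and standard deviation of a feature map, that similarity is measured with cosine similarity, and that the loss takes an InfoNCE-like form with a temperature `tau`. The function names, the toy feature maps, and the choice of pairs are all hypothetical; the paper's actual two negative and three positive pairs are not reproduced here.

```python
import math
from statistics import fmean, pstdev

def latent_code(feature_map):
    """Aggregate a feature map into a latent code.

    feature_map: list of channels, each a flat list of activations.
    Assumption: brightness ~ channel mean, contrast ~ channel std.
    """
    means = [fmean(ch) for ch in feature_map]
    stds = [pstdev(ch) for ch in feature_map]
    return means + stds

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def contrastive_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss: small when the anchor code is close to the
    positive codes and far from the negative codes."""
    neg_sum = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    losses = []
    for p in positives:
        pos = math.exp(cosine(anchor, p) / tau)
        losses.append(-math.log(pos / (pos + neg_sum)))
    return fmean(losses)

# Toy two-channel "features" (hypothetical values, four activations each).
output   = [[0.4, 0.5, 0.6, 0.5], [0.2, 0.6, 0.4, 0.8]]          # tone-mapped result
good_ldr = [[0.45, 0.5, 0.55, 0.5], [0.25, 0.6, 0.4, 0.75]]      # high-quality LDR
poor_ldr = [[0.9, 0.9, 0.95, 0.9], [0.05, 0.05, 0.1, 0.05]]      # low-quality LDR
hdr_in   = [[0.01, 0.02, 5.0, 0.01], [0.0, 0.01, 8.0, 0.02]]     # input HDR

z_out = latent_code(output)
loss = contrastive_loss(
    z_out,
    positives=[latent_code(good_ldr)],
    negatives=[latent_code(poor_ldr), latent_code(hdr_in)],
)
print(f"contrastive loss: {loss:.3f}")
```

In this sketch the loss decreases as the output's brightness and contrast statistics approach those of the high-quality LDR example and move away from the low-quality LDR and HDR inputs, which is the behavior the summary above attributes to the contrastive loss.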