Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement

Igor Morawski, Kai He, Shusil Dangi, Winston H. Hsu
2024-05-19
Abstract: Currently, low-light conditions present a significant challenge for machine cognition. In this paper, rather than optimizing models by assuming that human and machine cognition are correlated, we use zero-reference low-light enhancement to improve the performance of downstream task models. We propose to improve the zero-reference low-light enhancement method by leveraging the rich visual-linguistic CLIP prior without any need for paired or unpaired normal-light data, which is laborious and difficult to collect. We propose a simple but effective strategy to learn prompts that help guide the enhancement method and experimentally show that the prompts learned without any need for normal-light data improve image contrast, reduce over-enhancement, and reduce noise over-amplification. Next, we propose to reuse the CLIP model for semantic guidance via zero-shot open-vocabulary classification to optimize low-light enhancement for task-based performance rather than human visual perception. We conduct extensive experiments showing that the proposed method leads to consistent improvements in task-based performance across various datasets, and we compare our method against state-of-the-art methods, showing favorable results across various low-light datasets.
Computer Vision and Pattern Recognition, Image and Video Processing
What problem does this paper attempt to address?
The paper addresses low-light image enhancement in the zero-reference setting, where no paired or unpaired normal-light data is available for training. The authors propose a method that leverages CLIP, a pre-trained vision-language model, in two ways. First, they learn prompts that guide the enhancement method; these prompts improve image contrast, reduce over-enhancement, and limit the over-amplification of noise. Second, they reuse the CLIP model for semantic guidance through zero-shot open-vocabulary classification. Experiments show that the method yields consistent improvements in task-based performance across various low-light datasets and compares favorably with state-of-the-art methods. Unlike approaches that optimize for human visual perception, this study targets machine cognition: the goal is to improve the performance of downstream task models rather than the perceived visual quality of the enhanced image.
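To make the idea of semantic guidance via zero-shot open-vocabulary classification more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an image embedding and one text embedding per class prompt have already been computed by a CLIP-like encoder; the toy vectors, the function names, the temperature value, and the use of a cross-entropy loss are all illustrative assumptions that the summary above does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probs(image_emb, text_embs, temperature=0.01):
    """Softmax over temperature-scaled image-text cosine similarities.

    The temperature is an assumed hyperparameter for this sketch.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_guidance_loss(image_emb, text_embs, target_index, temperature=0.01):
    """Cross-entropy of the zero-shot prediction against a known class label.

    Hypothetical loss for illustration: it is small when the enhanced
    image's embedding is closest to the prompt of the annotated class.
    """
    probs = zero_shot_probs(image_emb, text_embs, temperature)
    return -math.log(max(probs[target_index], 1e-12))


# Toy stand-ins for embeddings of two class prompts and one enhanced image.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_probs(image_emb, text_embs)
loss_correct = semantic_guidance_loss(image_emb, text_embs, target_index=0)
loss_wrong = semantic_guidance_loss(image_emb, text_embs, target_index=1)
```

In a training loop, a loss of this kind would be backpropagated through the image encoder into the enhancement network, so that enhanced images become easier to classify; here the embeddings are fixed lists only to keep the sketch self-contained.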