Retinex-Diffusion: On Controlling Illumination Conditions in Diffusion Models via Retinex Theory

Xiaoyan Xing, Vincent Tao Hu, Jan Hendrik Metzen, Konrad Groh, Sezer Karaoglu, Theo Gevers
2024-07-29
Abstract: This paper introduces a novel approach to illumination manipulation in diffusion models, addressing the gap in conditional image generation with a focus on lighting conditions. We conceptualize the diffusion model as a black-box image renderer and strategically decompose its energy function in alignment with the image formation model. Our method effectively separates and controls illumination-related properties during the generative process. It generates images with realistic illumination effects, including cast shadows, soft shadows, and inter-reflections. Remarkably, it achieves this without the necessity for learning intrinsic decomposition, finding directions in latent space, or undergoing additional training with new datasets.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses the lack of precise control over lighting conditions in image generation. Specifically:

1. **Lighting Control Issue**: Current conditional generation models, such as diffusion models, can generate realistic images but cannot precisely control the lighting conditions in those images.
2. **Trade-off Between Physical Rendering and Diversity**: Physically-based rendering pipelines (such as Blender) can achieve high-fidelity lighting effects, but they are time-consuming and lack diversity.

To solve these problems, the authors propose a physically grounded method that controls lighting in both generated and real images without additional training. The method treats the diffusion model as a black-box image renderer and redesigns its energy function based on the image formation model, which separates illumination-related attributes so that they can be controlled. It produces realistic lighting effects (cast shadows, soft shadows, and inter-reflections) during generation, without learning an intrinsic decomposition, finding directions in the latent space, or training on new datasets. The method can also be easily integrated into most pixel-level diffusion models to add lighting control.
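
The image formation model referred to above comes from Retinex theory, which models an image as the element-wise product of a lighting-independent reflectance and an illumination term. The following is a minimal sketch of that model only, not the paper's method: it assumes the illumination is known, whereas the paper states that it needs no explicit intrinsic decomposition. The function names `render` and `relight` and the 2x2 pixel values are hypothetical, chosen for illustration.

```python
import math

def render(reflectance, illumination):
    """Retinex image formation: pixel = reflectance * illumination, element-wise."""
    return [[r * l for r, l in zip(r_row, l_row)]
            for r_row, l_row in zip(reflectance, illumination)]

def relight(image, old_illum, new_illum, eps=1e-8):
    """Swap the illumination in the log domain, where the product becomes a sum."""
    out = []
    for i_row, o_row, n_row in zip(image, old_illum, new_illum):
        out.append([
            # log(image) - log(old light) leaves log(reflectance); add log(new light).
            math.exp(math.log(i + eps) - math.log(o + eps) + math.log(n + eps))
            for i, o, n in zip(i_row, o_row, n_row)
        ])
    return out

# Toy 2x2 grayscale scene (hypothetical values).
reflectance = [[0.8, 0.2], [0.5, 0.9]]  # surface albedo, independent of lighting
light_left = [[1.0, 0.3], [1.0, 0.3]]   # left column brightly lit
light_right = [[0.3, 1.0], [0.3, 1.0]]  # right column brightly lit

image_left = render(reflectance, light_left)
image_right = relight(image_left, light_left, light_right)
```

Because reflectance and illumination add in the log domain, changing the lighting amounts to replacing one additive term while the other stays fixed. That separability is what makes an illumination-specific term in an energy function plausible to isolate.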