Abstract: The flat lensless camera design reduces the camera size and weight significantly. In this design, the camera lens is replaced by another optical element that interferes with the incoming light. The image is recovered from the raw sensor measurements using a reconstruction algorithm. Yet, the quality of the reconstructed images is not satisfactory. To mitigate this, we propose utilizing a pre-trained diffusion model with a control network and a learned separable transformation for reconstruction. This allows us to build a prototype flat camera with high-quality imaging, achieving state-of-the-art results in terms of both quality and perceptuality. We also demonstrate its ability to leverage textual descriptions of the captured scene to further enhance reconstruction. Our reconstruction method, which leverages the strong capabilities of a pre-trained diffusion model, can be used in other imaging systems for improved reconstruction results.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is: **how to reconstruct high-quality images from the measurement data of flat lensless cameras**. Flat lensless cameras replace the conventional lens with a diffractive element or another optical element, which significantly reduces the size and weight of the camera. However, images reconstructed from such measurements are of unsatisfactory quality, making it difficult to obtain clear and accurate results. To solve this problem, the authors propose a new method named **DifuzCam**, which uses a pre-trained diffusion model, a control network (ControlNet), and a learned separable transformation for image reconstruction. The method improves reconstruction quality and can further enhance the result through text guidance.

### Main contributions

1. **A new computational photography method based on diffusion models** for reconstructing high-quality images from the measurement data of flat lensless cameras.
2. **State-of-the-art reconstruction quality** on all evaluation metrics.
3. **Text guidance**: a text description of the captured scene is used to improve the reconstruction results.
4. **A deep control network with intermediate separable losses** to improve convergence and reconstruction results.

### Method overview

The workflow of DifuzCam is as follows:

- **Input data**: The raw sensor measurements captured by the flat lensless camera.
- **Separable transformation**: Converts the measurements into a form suitable for processing by the diffusion model.
- **Control network**: Images are generated by the pre-trained diffusion model, and the control network steers the generation process.
- **Text guidance (optional)**: A text description of the captured scene is provided to further improve the reconstruction results.

Through these contributions, DifuzCam improves the quality of image reconstruction, and its reconstruction approach may also be applicable to other imaging systems.
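
The separable transformation step in the method overview can be illustrated with a minimal sketch. It assumes the common separable form X = W_left · Y · W_rightᵀ, where Y is the 2D sensor measurement and W_left, W_right are the learned matrices; this form, the function names, and the toy matrices below are assumptions for illustration and are not taken from the paper. In the actual method the matrices would be learned, whereas here they are fixed by hand so that the toy forward model can be inverted exactly.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def separable_transform(y, w_left, w_right):
    """Apply X = W_left @ Y @ W_right^T (assumed separable form).

    y:       measurement, shape (m, n)
    w_left:  acts on the rows' index, shape (p, m)
    w_right: acts on the columns' index, shape (q, n)
    Returns a matrix of shape (p, q).
    """
    return matmul(matmul(w_left, y), transpose(w_right))


def parameter_counts(m, n, p, q):
    """Compare a separable map with a full linear map from (m, n) to (p, q)."""
    separable = p * m + q * n   # two small matrices
    dense = (p * q) * (m * n)   # one matrix acting on the flattened measurement
    return separable, dense


# Toy forward model (hypothetical): Y = Phi_left @ X @ Phi_right^T
scene = [[1, 2], [3, 4]]
phi_left = [[1, 1], [0, 1]]
phi_right = [[2, 0], [0, 1]]
measurement = separable_transform(scene, phi_left, phi_right)

# Hand-written inverses of the toy matrices stand in for learned weights.
w_left = [[1, -1], [0, 1]]
w_right = [[0.5, 0], [0, 1]]
recovered = separable_transform(measurement, w_left, w_right)
```

The appeal of the separable form is its size: for an m × n measurement mapped to a p × q image, it needs p·m + q·n parameters rather than the p·q·m·n of a full linear map, as `parameter_counts` shows.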

DifuzCam: Replacing Camera Lens with a Mask and a Diffusion Model

DoCam: Depth Sensing with an Optical Image Stabilization Supported RGB Camera

Three Dimensional Reconstruction Using a Lenslet Light Field Camera

A Simple Framework for 3D Lensless Imaging with Programmable Masks

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

FlatNet: Towards Photorealistic Scene Reconstruction from Lensless Measurements

Joint Image and Depth Estimation With Mask-Based Lensless Cameras

Learned reconstructions for practical mask-based lensless imaging

Lensless cameras using a mask based on almost perfect sequence through deep learning

DNN-FZA Camera: A Deep Learning Approach Toward Broadband FZA Lensless Imaging

Dual-branch Fusion Model for Lensless Imaging

FlatCam: Thin, Bare-Sensor Cameras using Coded Aperture and Computation

DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation

Diffractive lensless imaging with optimized Voronoi-Fresnel phase

Temporal compressive edge imaging enabled by a lensless diffuser camera

Coded Illumination for 3D Lensless Imaging

Single-shot Lensless Imaging with Fresnel Zone Aperture and Incoherent Illumination

Explicit-restriction Convolutional Framework for Lensless Imaging

CamI2V: Camera-Controlled Image-to-Video Diffusion Model

Computer modeling and simulation of light field camera and digital refocusing with attenuating mask

Seeing Through Obstructions with Diffractive Cloaking