Abstract: Image restoration aims to reconstruct latent clear images from their degraded versions. Despite notable progress, existing methods predominantly focus on specific degradation types and thus require specialized models, which impedes real-world application in dynamic degradation scenarios. To address this issue, we propose the Large Model Driven Image Restoration framework (LMDIR), a novel multiple-in-one image restoration paradigm that leverages the generic priors from large multi-modal language models (MMLMs) and pretrained diffusion models. In detail, LMDIR integrates three key types of prior knowledge: 1) global degradation knowledge from MMLMs, 2) scene-aware contextual descriptions generated by MMLMs, and 3) fine-grained high-quality reference images synthesized by diffusion models guided by MMLM descriptions. Building on these priors, our architecture comprises a query-based prompt encoder, a degradation-aware transformer block that injects global degradation knowledge, a content-aware transformer block that incorporates scene descriptions, and a reference-based transformer block that incorporates fine-grained image priors. This design enables a single-stage training paradigm that addresses various degradations while supporting both automatic and user-guided restoration. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art competitors on multiple evaluation benchmarks.

What problem does this paper attempt to address?

Most existing image restoration methods handle only one specific type of degradation (such as rain streaks, low light, or noise), so a specialized model must be trained for each type. This is inconvenient in practice, especially in dynamic scenes where the type of degradation varies. To solve this problem, the authors propose a new framework called LMDIR (Large Model Driven Image Restoration), which leverages the general prior knowledge of large multi-modal language models (MMLMs) and pretrained diffusion models to obtain an "all-in-one" image restoration model capable of handling multiple types of degradation.

Specifically, LMDIR enhances image restoration performance through three key types of prior knowledge:

1. **Global Degradation Knowledge**: Global degradation information extracted from multi-modal language models.
2. **Scene-Aware Context Description**: Scene descriptions generated by multi-modal language models.
3. **Fine-Grained High-Quality Reference Images**: Reference images synthesized by diffusion models based on descriptions generated by multi-modal language models.

The design of LMDIR includes four main components:

- **Query-Based Prompt Encoder**: Refines the text information extracted from multi-modal language models by combining it with low-level features of the image.
- **Degradation-Aware Transformer Block**: Injects global degradation knowledge to enhance the model's ability to handle different types of degradation.
- **Content-Aware Transformer Block**: Uses scene-aware content descriptions to improve the model's restoration performance.
- **Reference-Based Transformer Block**: Incorporates fine-grained image priors extracted from synthesized reference images to further enhance restoration quality.

Experimental results show that LMDIR outperforms existing state-of-the-art all-in-one image restoration methods on multiple evaluation benchmarks.
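
To make the data flow through the three prior-injecting blocks more concrete, here is a minimal sketch in plain Python. It rests on several assumptions that the summary above does not confirm: that each block injects its prior through residual cross-attention, that the blocks are applied in the order degradation, content, reference, and that no learned projections are involved. The names `cross_attend` and `inject_priors` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(image_tokens, prior_tokens):
    """Residual cross-attention: each image token queries the prior tokens.

    Assumption: no learned query/key/value projections, so the prior tokens
    serve as both keys and values. A real block would learn these projections.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in image_tokens:
        # Scaled dot-product score between this image token and every prior token.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in prior_tokens]
        weights = softmax(scores)
        # Weighted sum of the prior tokens (the attended context).
        context = [sum(w * value[d] for w, value in zip(weights, prior_tokens))
                   for d in range(dim)]
        # Residual connection keeps the original image feature.
        output.append([q + c for q, c in zip(query, context)])
    return output


def inject_priors(image_tokens, degradation_prior, scene_prior, reference_prior):
    """Apply the three (hypothetical) prior-injection steps in sequence."""
    x = cross_attend(image_tokens, degradation_prior)  # degradation-aware step
    x = cross_attend(x, scene_prior)                   # content-aware step
    x = cross_attend(x, reference_prior)               # reference-based step
    return x


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0]]            # two toy image tokens
    degradation = [[0.5, 0.5]]                  # toy global degradation embedding
    scene = [[1.0, 1.0], [0.0, 1.0]]            # toy scene-description tokens
    reference = [[0.0, 2.0], [2.0, 0.0]]        # toy reference-image tokens
    for token in inject_priors(image, degradation, scene, reference):
        print(token)
```

In this sketch every prior is a list of token vectors with the same dimensionality as the image tokens; in an actual system the text and reference features would first pass through encoders (such as the query-based prompt encoder) to reach a shared feature space.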

Training-Free Large Model Priors for Multiple-in-One Image Restoration

Multi-modal Degradation Feature Learning for Unified Image Restoration Based on Contrastive Learning

Boosting Image Restoration via Priors from Pre-trained Models

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

LLMRA: Multi-modal Large Language Model based Restoration Assistant

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

ReFIR: Grounding Large Restoration Models with Retrieval Augmentation

MRIR: Integrating Multimodal Insights for Diffusion-based Realistic Image Restoration

DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration

Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration

Perceptual Image Restoration with High-Quality Priori and Degradation Learning

Towards Unsupervised Blind Face Restoration using Diffusion Prior

Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models

Priors in Deep Image Restoration and Enhancement: A Survey

All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation

Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration

Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts

DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

Universal Image Restoration with Text Prompt Diffusion

Dual Prior Learning for Blind and Blended Image Restoration

Adaptive Blind All-in-One Image Restoration