Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, Lei Zhu
Abstract: Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited range and often produce overly smooth, low-fidelity outcomes due to their broad data distribution fitting. To address these challenges, we first define a new pipeline for restoring images with multiple degradations, and then introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models. RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration. Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts. Furthermore, the system's modular design facilitates the fast integration of new tasks and models, enhancing its flexibility and scalability for various applications.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language
### What problems does this paper attempt to solve?
This paper aims to solve the problem of high-quality restoration of natural images that suffer from several degradations at once (such as noise, blur, and low light). Specifically, the paper points out the following limitations of existing image restoration methods:
1. **Manual selection of tasks and models**: Traditional methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may lead to suboptimal results.
2. **Limitations of all-in-one models**: Although all-in-one models can handle multiple tasks, they usually support only a limited range of tasks, and because they fit a broad data distribution, they often produce overly smooth, low-fidelity results.
To solve these problems, the paper proposes a new image restoration framework, **RestoreAgent**, an intelligent image restoration system based on multimodal large language models (MLLMs). The main goals of RestoreAgent are:
- **Automatically evaluate the type and degree of degradation**: Automatically identify the type of degradation in the input image and its severity.
- **Optimize the task sequence**: Determine the optimal task execution sequence to improve the restoration effect.
- **Select the optimal model**: Dynamically select the most appropriate model from the available model library according to the specific degradation pattern.
- **Automatically execute the restoration process**: Once the restoration sequence and model selection are determined, RestoreAgent can independently execute the entire restoration process without human intervention.
Through these functions, the paper reports that RestoreAgent handles images with multiple degradations more efficiently, outperforms human experts in its experiments, and adapts quickly to new tasks and models, which improves the flexibility and scalability of the system.
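The assess, plan, and execute steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not the paper's implementation: `ToolRegistry`, `toy_planner`, `remover`, `PRIORITY`, and the model names are hypothetical, the "image" is a plain list of degradation labels, and a fixed-priority rule stands in for the MLLM that RestoreAgent uses to choose tasks, order, and models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Image = List[str]             # toy stand-in: labels of degradations still present
Tool = Callable[[Image], Image]
Plan = List[Tuple[str, str]]  # ordered (task, model name) steps


@dataclass
class ToolRegistry:
    """Maps (task, model name) to a callable; tools can be added at any time."""
    tools: Dict[Tuple[str, str], Tool] = field(default_factory=dict)

    def register(self, task: str, name: str, fn: Tool) -> None:
        self.tools[(task, name)] = fn

    def models_for(self, task: str) -> List[str]:
        return sorted(name for (t, name) in self.tools if t == task)


def remover(label: str) -> Tool:
    """Build a toy tool that simply deletes one degradation label."""
    return lambda image: [d for d in image if d != label]


# Arbitrary fixed order, for illustration only (assumption, not from the paper).
PRIORITY = ["low_light", "noise", "blur"]


def toy_planner(image: Image, registry: ToolRegistry) -> Plan:
    """Hypothetical stand-in for the MLLM: one step per detected degradation
    that has a registered tool, taking the alphabetically first model."""
    plan: Plan = []
    for task in PRIORITY:
        models = registry.models_for(task)
        if task in image and models:
            plan.append((task, models[0]))
    return plan


def restore(image: Image, planner, registry: ToolRegistry) -> Tuple[Image, Plan]:
    """Ask the planner for a plan, then run each step in order."""
    plan = planner(image, registry)
    for task, name in plan:
        image = registry.tools[(task, name)](image)
    return image, plan


registry = ToolRegistry()
registry.register("noise", "denoiser_v1", remover("noise"))
registry.register("blur", "deblur_v1", remover("blur"))
restored, plan = restore(["blur", "noise"], toy_planner, registry)
```

Keeping the planner separate from the registry mirrors the modularity claim: registering a tool for a new task changes what the planner can schedule without touching `restore`.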
### Formula summary
To describe the problem, the paper defines a set \( D=\{d_1, d_2,\ldots, d_n\} \) containing multiple degradation types, where each \( d_i \) represents a specific type of image degradation (such as noise, JPEG artifacts, blurring, raindrop marks, fog, and low-light conditions). For each degradation type \( d_i \), there is a dedicated model library \( M_{d_i} \), containing multiple models \( \{M_{d_i}^1, M_{d_i}^2,\ldots\} \), and each model \( M_{d_i}^j \) is trained specifically to mitigate the degradation of type \( d_i \).
The formal definition of the problem is as follows:
- **Input**: A degraded image \( I \) affected by multiple degradation types from \( D \), a model library \( \{M_{d_1}, M_{d_2},\ldots, M_{d_n}\} \) for handling \( D \), and a user-provided scoring function \( S \) for evaluating the restoration result.
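- **Target**: Find the optimal model execution sequence \( \sigma=(M_{a_1}^{b_1}, M_{a_2}^{b_2},\ldots, M_{a_m}^{b_m}) \), where \( a_k \) identifies the degradation type addressed at step \( k \) and \( b_k \) the model chosen from that type's library, such that the score \( S \) of the restored image is maximized, that is:
\[
\sigma^*=\arg\max_{\sigma\in \mathcal{S}(D, M)} S(I, \sigma)
\]
where \( \mathcal{S}(D, M) \) denotes the set of all possible sequences of (degradation type, model) pairs, written in calligraphic type to distinguish it from the scoring function \( S \).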
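To make the arg max concrete, here is a minimal brute-force sketch that enumerates the search space and keeps the best-scoring sequence. It rests on several assumptions that are not from the paper: each degradation type is addressed at most once, the "image" is a dictionary of residual severities, the three toy models and the `score` function are invented for illustration, and exhaustive search is used only because the toy space is tiny. The paper's point is that an MLLM predicts the sequence instead of searching like this.

```python
from itertools import permutations, product
from typing import Callable, Dict, Iterator, Tuple

Image = Dict[str, float]             # toy stand-in: degradation -> residual severity
Model = Callable[[Image], Image]
Sequence = Tuple[Tuple[str, str], ...]  # ordered (degradation type, model name) pairs


# Invented toy models; the side effects make model choice and order matter.
def denoise_a(img: Image) -> Image:
    return {**img, "noise": img["noise"] * 0.5}


def denoise_b(img: Image) -> Image:
    # Stronger denoising, but it adds a little blur.
    return {**img, "noise": img["noise"] * 0.2, "blur": img["blur"] + 0.1}


def deblur_a(img: Image) -> Image:
    # Sharpening amplifies whatever noise is left.
    return {**img, "blur": img["blur"] * 0.3, "noise": img["noise"] * 1.5}


LIBRARY: Dict[str, Dict[str, Model]] = {
    "noise": {"denoise_a": denoise_a, "denoise_b": denoise_b},
    "blur": {"deblur_a": deblur_a},
}


def candidate_sequences(library: Dict[str, Dict[str, Model]]) -> Iterator[Sequence]:
    """Every ordering of every subset of degradation types, combined with
    every choice of one model per chosen type (includes the empty sequence)."""
    types = list(library)
    for r in range(len(types) + 1):
        for order in permutations(types, r):
            for names in product(*(library[d] for d in order)):
                yield tuple(zip(order, names))


def apply_sequence(img: Image, seq: Sequence, library) -> Image:
    for d, name in seq:
        img = library[d][name](img)
    return img


def score(img: Image) -> float:
    """Toy scoring function S: less residual degradation is better."""
    return -sum(img.values())


def best_sequence(img: Image, library, scorer) -> Sequence:
    """Exhaustive arg max over the candidate sequences."""
    return max(candidate_sequences(library),
               key=lambda seq: scorer(apply_sequence(img, seq, library)))


degraded = {"noise": 0.4, "blur": 0.5}
best = best_sequence(degraded, LIBRARY, score)
```

Even this two-type, three-model library yields eight candidate sequences, and the count grows factorially with the number of degradation types, which is why exhaustive search does not scale to a realistic model library.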
By solving this problem, the researchers hope to find the optimal combination of restoration sequences and model selections, thereby improving the quality of images affected by multiple degradations and providing more effective and efficient solutions for complex image restoration.