SeLoRA: Self-Expanding Low-Rank Adaptation of Latent Diffusion Model for Medical Image Synthesis

Yuchen Mao, Hongwei Li, Wei Pang, Giorgos Papanastasiou, Guang Yang, Chengjia Wang
2024-08-14
Abstract: The persistent challenge of medical image synthesis posed by the scarcity of annotated data and the need to synthesize "missing modalities" for multi-modal analysis, underscored the imperative development of effective synthesis methods. Recently, the combination of Low-Rank Adaptation (LoRA) with latent diffusion models (LDMs) has emerged as a viable approach for efficiently adapting pre-trained large language models, in the medical field. However, the direct application of LoRA assumes uniform ranking across all linear layers, overlooking the significance of different weight matrices, and leading to sub-optimal outcomes. Prior works on LoRA prioritize the reduction of trainable parameters, and there exists an opportunity to further tailor this adaptation process to the intricate demands of medical image synthesis. In response, we present SeLoRA, a Self-Expanding Low-Rank Adaptation Module, that dynamically expands its ranking across layers during training, strategically placing additional ranks on crucial layers, to allow the model to elevate synthesis quality where it matters most. The proposed method not only enables LDMs to fine-tune on medical data efficiently but also empowers the model to achieve improved image quality with minimal ranking. The code of our SeLoRA method is publicly available on https://anonymous.4open.science/r/SeLoRA-980D.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses two key challenges in medical image synthesis:

1. **Scarcity of labeled data**: Medical image datasets are usually small and labeling is costly, so there is often too little data to train models.
2. **Synthesis of "missing modalities" for multimodal analysis**: In multimodal medical image analysis, data for some modalities may be missing, and effective synthesis methods are needed to supply it.

To address these problems, the authors propose SeLoRA (Self-Expanding Low-Rank Adaptation), a self-expanding low-rank adaptation module. Specifically:

- **Limitations of existing methods**: Standard LoRA (Low-Rank Adaptation) assigns the same rank to every linear layer, which ignores the differing importance of the weight matrices and leads to sub-optimal results. In addition, prior work on LoRA focuses mainly on reducing the number of trainable parameters rather than on the specific requirements of medical image synthesis.
- **Advantages of SeLoRA**: SeLoRA expands the rank dynamically during training, adjusting the rank of each layer according to that layer's needs. This improves the quality of the synthesized images while retaining parameter efficiency. SeLoRA uses Fisher information to decide when to expand the rank, and it performs the expansion in a way that does not change the model output.

In summary, SeLoRA aims to improve the quality of medical image synthesis by dynamically adjusting the rank while keeping the number of trainable parameters small, so that it better meets the needs of scarce medical image data and multimodal synthesis.

### Formula display

The formulas involved in the paper are as follows:

1. **Weight update formula for LoRA**: \[ W = W_0 + AB \] where \( W\in\mathbb{R}^{d_{\text{in}}\times d_{\text{out}}} \) is the updated weight matrix, \( W_0\in\mathbb{R}^{d_{\text{in}}\times d_{\text{out}}} \) is the frozen original weight matrix, \( A\in\mathbb{R}^{d_{\text{in}}\times r} \) and \( B\in\mathbb{R}^{r\times d_{\text{out}}} \) are the two low-rank decomposition matrices, and \( r \) is the rank.
2. **Extended form of SeLoRA**: \[ f(x)=xW_0 + x\begin{bmatrix}A&K\end{bmatrix}\begin{bmatrix}B\\0\end{bmatrix}+b_0 \] where \( K\in\mathbb{R}^{d_{\text{in}}\times 1} \) is a vector initialized with Kaiming uniform and \( 0 \) is a \( 1\times d_{\text{out}} \) zero row. Because \( \begin{bmatrix}A&K\end{bmatrix}\begin{bmatrix}B\\0\end{bmatrix}=AB \), the output is unchanged at the moment of expansion.
3. **Fisher information estimation**: \[ \hat{I}_w=\frac{1}{|B|}\sum_{i = 1}^{|B|}\left(\frac{\partial L(b_i; w)}{\partial w}\right)^2 \] where \( w \) is a single trainable weight, \( L \) is the loss, \( |B| \) is the batch size, and \( b_i \) is a sample in the batch (here \( B \) denotes the batch, not the LoRA matrix \( B \)).
4. **Fisher information score (FI-Score)**: \[ \text{FI-Score}=\sum_{i = 1}^{d_{\text{in}}}\sum_{j = 1}^{r}\hat{I}_{A_{i,j}}+\sum_{i = 1}^{r}\sum_{j = 1}^{d_{\text{out}}}\hat{I}_{B_{i,j}} \]
5. **Fisher information ratio (FI-Ratio)**: \[ \text{FI - Ra}