UniDense: Unleashing Diffusion Models with Meta-Routers for Universal Few-Shot Dense Prediction

Lintao Dong, Wei Zhai, Zheng-Jun Zha
DOI: https://doi.org/10.1145/3664647.3680831
2024-01-01
Abstract: Universal few-shot dense prediction requires a versatile model capable of learning any dense prediction task from limited labeled images, which in turn requires the model to adapt efficiently. Prevailing few-shot learning methods rely on efficient fine-tuning of model weights for few-shot adaptation, which risks disrupting the pre-trained knowledge and cannot extract the task-specific knowledge contained in the pre-trained model. To overcome these limitations, our paper approaches universal few-shot dense prediction from a novel perspective. Unlike conventional fine-tuning techniques that use all model parameters and modify a specific set of weights for few-shot adaptation, our method focuses on selecting task-relevant computation pathways of the pre-trained model while keeping the model weights frozen. Building upon this idea, we introduce UniDense, a novel framework for universal few-shot dense prediction. First, we construct a versatile MoE (Mixture of Experts) architecture for dense prediction based on the Stable Diffusion model. We then use episode-based meta-learning to train a set of routers for this MoE model, called Meta-Routers, which act as hyper-networks responsible for selecting the computation blocks relevant to each task. We demonstrate that fine-tuning these Meta-Routers enables efficient few-shot adaptation of the entire model. Moreover, for each few-shot task, we leverage support samples to extract a task embedding, which serves as a conditioning factor for the Meta-Routers. This strategy allows the Meta-Routers to dynamically adapt themselves to different few-shot tasks, leading to improved adaptation performance. Experiments on a challenging variant of the Taskonomy dataset with 10 dense prediction tasks demonstrate the superiority of our approach.
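To make the routing idea in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that the frozen computation blocks can be stood in for by small random linear maps (in place of Stable Diffusion blocks), that the task embedding is a plain average of support features, and that the router picks blocks by top-k softmax gating. The abstract specifies none of these details, and all names and constants (`EXPERTS`, `router`, `TOP_K`, and so on) are hypothetical.

```python
import math
import random

random.seed(0)

DIM = 4          # feature width (hypothetical)
NUM_EXPERTS = 3  # candidate computation blocks in one layer (hypothetical)
TOP_K = 2        # blocks kept per task (hypothetical)


def make_matrix(rows, cols):
    """Random matrix as a list of rows; a stand-in for learned weights."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# Frozen "pre-trained" blocks: never updated during few-shot adaptation.
EXPERTS = [make_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]

# Router weights: the only parameters that would be trained in this sketch.
router = make_matrix(NUM_EXPERTS, DIM)


def task_embedding(support_features):
    """Assumed design: average the support-sample features into one vector."""
    n = len(support_features)
    return [sum(f[i] for f in support_features) / n for i in range(DIM)]


def route(router_weights, task_vec, top_k=TOP_K):
    """Return {block_index: gate} for the top-k blocks; gates sum to 1."""
    logits = matvec(router_weights, task_vec)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def forward(x, gates):
    """Run only the selected frozen blocks and mix their outputs by gate."""
    out = [0.0] * DIM
    for i, g in gates.items():
        y = matvec(EXPERTS[i], x)
        out = [o + g * v for o, v in zip(out, y)]
    return out


# Five hypothetical support features condition the router for one task.
support = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
gates = route(router, task_embedding(support))
query_out = forward([1.0, 0.0, -1.0, 0.5], gates)
```

In this sketch, adapting to a new task would mean updating only `router` while `EXPERTS` stays untouched, which mirrors the abstract's contrast between selecting computation pathways and modifying model weights.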