Abstract: Visual-language models have advanced the development of universal models, yet their application in medical imaging remains constrained by specific functional requirements and limited data. Current general-purpose models are typically designed with task-specific branches and heads, which restricts the shared feature space and the flexibility of the model. To address these challenges, we have developed a decomposed-composed universal medical imaging paradigm (UniMed) that supports tasks at all levels. To this end, we first propose a decomposed decoder that can predict two types of outputs, pixel and semantic, based on a defined input queue. Additionally, we introduce a composed decoder that unifies the input and output spaces and standardizes task annotations across different levels into a discrete token format. The coupled design of these two components enables the model to flexibly combine tasks so that they benefit one another. Moreover, our joint representation learning strategy leverages large amounts of unlabeled data and an unsupervised loss, achieving efficient one-stage pretraining for more robust performance. Experimental results show that UniMed achieves state-of-the-art performance on eight datasets across all three tasks and exhibits strong zero-shot and 100-shot transferability. We will release the code and trained models upon the paper's acceptance.

What problem does this paper attempt to address?

This paper aims to solve several key challenges currently faced by general-purpose medical image analysis models:

1. **Task Diversity and Flexibility**:
   - Existing general-purpose models are usually designed with task-specific branches and heads, which limit the flexibility of the shared feature space and make it difficult to handle semantic understanding and visual tasks simultaneously (such as locating lesions and identifying their types).
   - A model that can switch seamlessly between tasks is required, so that users can customize functions for specific scenarios (for example, switching between detection and segmentation for lesion screening or resection procedures).
2. **Data Volume and Annotation Diversity**:
   - Medical image data is relatively limited, and annotation methods are diverse. Different tasks require different levels of annotation (for example, classification requires image-level annotation, segmentation requires pixel-level annotation, and referring segmentation combines text with pixel-level annotation).
   - Existing methods usually handle multi-task learning by adding extra branches or heads, which increases model complexity and makes task balancing harder.
3. **Cross-task Collaboration and Knowledge Sharing**:
   - Annotation content varies greatly among datasets, making it difficult to integrate and use them directly. Mainstream methods split datasets with different annotations into multiple subsets for training, which significantly increases computational complexity and limits knowledge sharing between annotation types.
4. **Transferability**:
   - The model needs strong transferability so that it can still provide high-quality predictions when faced with new data.
To solve these problems, the authors propose a new general-purpose medical image analysis model, **UniMed**, which has the following features:

- **Decomposed-Composed Decoders**: A decomposed decoder predicts two types of outputs (pixel and semantic) based on a defined input queue, and a composed decoder unifies the input and output spaces and standardizes the annotations of tasks at different levels into a discrete token format.
- **Joint Representation Learning Strategy**: Large amounts of unlabeled data and an unsupervised loss are used to achieve efficient one-stage pre-training and more robust performance.
- **Cross-task Collaboration**: Through the coupled design of the two decoders, the model can flexibly combine tasks so that they benefit one another, supporting various task interactions.

Experimental results show that UniMed achieves state-of-the-art performance on eight datasets across three tasks and demonstrates strong zero-shot and 100-shot transferability.
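
To make the idea of a shared discrete token format more concrete, the following is a minimal sketch of how annotations at different levels might be serialized into one token vocabulary. It is not taken from the paper: the token names (`<cls>`, `<det>`, `<bin_k>`, `<label_x>`, `<eos>`), the bin count `NUM_BINS`, and both helper functions are hypothetical, and the coordinate quantization follows the general sequence-style approach to detection rather than UniMed's actual scheme. Pixel-level outputs are left out, since the abstract assigns those to the decomposed decoder.

```python
from typing import List, Tuple

# Hypothetical size of the coordinate vocabulary (an assumption, not from the paper).
NUM_BINS = 100


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)  # clamp to the image extent
    return min(int(ratio * num_bins), num_bins - 1)


def classification_to_tokens(label: str) -> List[str]:
    """Serialize an image-level label as a short token sequence."""
    return ["<cls>", f"<label_{label}>", "<eos>"]


def detection_to_tokens(
    box: Tuple[float, float, float, float],
    label: str,
    image_size: Tuple[int, int],
) -> List[str]:
    """Serialize one bounding box (x0, y0, x1, y1) and its label as tokens."""
    x0, y0, x1, y1 = box
    width, height = image_size
    bins = [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]
    return ["<det>"] + [f"<bin_{b}>" for b in bins] + [f"<label_{label}>", "<eos>"]


if __name__ == "__main__":
    print(classification_to_tokens("pneumonia"))
    print(detection_to_tokens((64, 128, 192, 256), "nodule", (256, 256)))
```

Because both annotation types end up as sequences over the same vocabulary, a single sequence decoder could in principle be trained on mixed datasets without a separate head per task, which is the motivation the summary gives for the composed decoder.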

Universal Medical Image Representation Learning with Compositional Decoders
