USFM: A Universal Ultrasound Foundation Model Generalized to Tasks and Organs towards Label Efficient Image Analysis

Jing Jiao, Jin Zhou, Xiaokang Li, Menghua Xia, Yi Huang, Lihong Huang, Na Wang, Xiaofan Zhang, Shichong Zhou, Yuanyuan Wang, Yi Guo
2024-01-02
Abstract: Inadequate generality across different organs and tasks constrains the application of ultrasound (US) image analysis methods in smart healthcare. Building a universal US foundation model holds the potential to address these issues. Nevertheless, the development of such foundational models encounters intrinsic challenges in US analysis, i.e., insufficient databases, low quality, and ineffective features. In this paper, we present a universal US foundation model, named USFM, generalized to diverse tasks and organs towards label efficient US image analysis. First, a large-scale Multi-organ, Multi-center, and Multi-device US database was built, comprehensively containing over two million US images. Organ-balanced sampling was employed for unbiased learning. Then, USFM is self-supervised pre-trained on the sufficient US database. To extract the effective features from low-quality US images, we proposed a spatial-frequency dual masked image modeling method. A productive spatial noise addition-recovery approach was designed to learn meaningful US information robustly, while a novel frequency band-stop masking learning approach was also employed to extract complex, implicit grayscale distribution and textural variations. Extensive experiments were conducted on the various tasks of segmentation, classification, and image enhancement from diverse organs and diseases. Comparisons with representative US image analysis models illustrate the universality and effectiveness of USFM. The label efficiency experiments suggest the USFM obtains robust performance with only 20% annotation, laying the groundwork for the rapid development of US models in clinical practices.
Image and Video Processing
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the limited use of ultrasound (US) image analysis methods in smart healthcare, which stems mainly from their poor generalization across organs and tasks. Specifically, current US image analysis methods face three challenges:

1. **Insufficient databases**: Large, diverse US image databases are scarce, which limits how well models generalize.
2. **Low-quality images**: US images often have low resolution, poor contrast, and a low signal-to-noise ratio, making it difficult to extract meaningful information.
3. **Ineffective feature extraction**: Effective, general features are hard to extract from low-quality US images because they are often implicit and complex.

To address these challenges, the authors propose a universal ultrasound foundation model (USFM). The model is pre-trained with self-supervision on a large-scale multi-organ, multi-center, multi-device US database and uses a spatial-frequency dual masked image modeling method to extract effective US features. USFM performs well on a range of downstream tasks (segmentation, classification, and image enhancement) and is label efficient, obtaining robust performance with only 20% of the annotations.

### Main contributions

1. **The 3M-US database**: The authors built a Multi-organ, Multi-center, and Multi-device US image database (3M-US), described as the largest of its kind to date, containing more than 2 million US images from 12 human organs.
2. **Organ-balanced sampling strategy**: To counter organ imbalance in the database, organ-balanced sampling keeps the model from overfitting to the most represented organs while neglecting under-represented ones.
3. **Spatial-frequency dual masked image modeling**: A dual masking method based on masked image modeling (MIM) is proposed. It masks the input in both the spatial domain and the frequency domain to extract robust, effective US features.
4. **Extensive experimental verification**: Experiments across downstream tasks and organs verify the generalization ability and performance of USFM, with label efficiency as a particular strength.

### Experimental setup

To verify the effectiveness and applicability of USFM, the authors ran four groups of experiments:

1. **USFM pre-training visualization**: Masked images and reconstruction results are shown for each organ in the 3M-US database, along with the distribution of the USFM feature space.
2. **Comparison across downstream tasks and organs**: Downstream experiments on datasets from multiple organs cover three common US image analysis tasks: segmentation, classification, and image enhancement.
3. **Label efficiency experiment**: USFM is shown to maintain good performance when only a small fraction of the labeled data is available.
4. **Ablation study**: Ablation experiments verify the contribution of organ-balanced sampling and of the spatial-frequency dual masking method to USFM.

### Conclusion

By constructing the large-scale 3M-US database and proposing a spatial-frequency dual masking method, the paper addresses several key problems in US image analysis and demonstrates the generalization ability and strong performance of USFM on a variety of downstream tasks. These results lay the groundwork for the rapid development of US image analysis in smart healthcare.
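
To make the organ-balanced sampling idea from the contributions more concrete, here is a minimal sketch. It assumes inverse-frequency weighting, which is one common way to balance classes; the summary does not state the exact scheme used for USFM. The function names and the toy organ counts are hypothetical.

```python
import random
from collections import Counter


def organ_balanced_weights(organ_labels):
    """Return one sampling weight per image, inversely proportional to the
    number of images that share its organ label (an assumed scheme)."""
    counts = Counter(organ_labels)
    return [1.0 / counts[organ] for organ in organ_labels]


def sample_balanced(organ_labels, num_samples, seed=0):
    """Draw image indices with replacement so that every organ is, in
    expectation, drawn equally often regardless of its dataset share."""
    rng = random.Random(seed)
    weights = organ_balanced_weights(organ_labels)
    return rng.choices(range(len(organ_labels)), weights=weights, k=num_samples)


# Hypothetical toy dataset: 90 thyroid images and 10 kidney images.
labels = ["thyroid"] * 90 + ["kidney"] * 10
indices = sample_balanced(labels, num_samples=10_000)
drawn = Counter(labels[i] for i in indices)
print(drawn)  # both organs should appear in roughly equal numbers
```

Without the weights, about 90% of the draws would be thyroid images; with them, the minority organ is seen about as often as the majority one, at the cost of repeating its images more frequently.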
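
The frequency-domain half of the dual masking can be illustrated with a band-stop filter applied in Fourier space. This is only a sketch under assumptions: the summary does not give the band selection, the reconstruction target, or any other implementation detail of USFM, so the function name `band_stop_mask`, the radial band definition, and the test image are all hypothetical. It uses NumPy's FFT routines.

```python
import numpy as np


def band_stop_mask(image, r_low, r_high):
    """Remove spatial frequencies whose radius lies in [r_low, r_high).

    Radii are measured in cycles per image from the zero-frequency (DC)
    component. Returns the filtered image and the boolean keep-mask.
    """
    h, w = image.shape
    # Move the DC component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    keep = (radius < r_low) | (radius >= r_high)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * keep))
    # The mask is symmetric about DC, so the imaginary part is rounding noise.
    return filtered.real, keep


# Toy image: a coarse pattern (2 cycles) plus a finer texture (8 cycles).
n = 64
x = np.arange(n)
low = np.tile(np.sin(2 * np.pi * 2 * x / n), (n, 1))
high = np.tile(np.sin(2 * np.pi * 8 * x / n), (n, 1))
masked, keep = band_stop_mask(low + high, r_low=6, r_high=10)
```

Here the 8-cycle texture falls inside the stopped band and is removed, while the 2-cycle pattern passes through. In a masked-image-modeling setup, a network would be trained to recover the original image from an input like `masked`, which pushes it to model the texture and grayscale statistics carried by the missing band.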