Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks

Lingxiao Luo, Xuanzhong Chen, Bingda Tang, Xinsheng Chen, Rong Han, Chengpeng Hu, Yujiang Li, Ting Chen
DOI: https://doi.org/10.48550/arxiv.2312.07630
2023-01-01
Abstract: Recent advancements in foundation models, typically trained with self-supervised learning on large-scale and diverse datasets, have shown great potential in medical image analysis. However, due to the significant spatial heterogeneity of medical imaging data, current models must tailor specific structures for different datasets, making it challenging to leverage the abundant unlabeled data. In this work, we propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure. To accomplish this, we propose spatially adaptive networks (SPAD-Nets), a family of networks that dynamically adjust their structures to adapt to the spatial properties of input images, to build such a universal foundation model. We pre-train a spatially adaptive visual tokenizer (SPAD-VT) and then a spatially adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets. The pre-training data comprises over 9 million image slices, representing, to our knowledge, the largest, most comprehensive, and most diverse dataset for pre-training universal foundation models for medical image analysis. The experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model. Our code is available at https://github.com/function2-llx/PUMIT.
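The abstract describes SPAD-Nets only at a high level. The sketch below gives one possible reading of "adapting to the spatial properties of input images." It picks a per-axis patch size from voxel spacing so that thick-slice CT volumes and 2D radiographs both map onto a token grid, then draws a random token mask of the kind used in masked image modeling. Everything in it is hypothetical and chosen only for illustration: the power-of-two patch rule, the 16-voxel in-plane base patch, the 0.75 mask ratio, and the names `adaptive_token_grid` and `random_token_mask`. It is not the SPAD-VT/SPAD-ViT implementation; the authors' code is in the PUMIT repository.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenGrid:
    """Patch size and resulting token grid for one input image (hypothetical helper)."""
    patch: tuple[int, int, int]  # (depth, height, width) patch size in voxels
    grid: tuple[int, int, int]   # number of patches along each axis (input assumed padded)

    @property
    def num_tokens(self) -> int:
        return self.grid[0] * self.grid[1] * self.grid[2]


def adaptive_token_grid(shape, spacing, base_patch=16):
    """Hypothetical rule: size patches so each spans roughly the same physical extent.

    shape:   (D, H, W) voxel counts; a 2D image is passed with D == 1.
    spacing: (sd, sh, sw) voxel spacing in mm.
    """
    sd, sh, sw = spacing
    # Physical side length one patch should cover, set by the finest in-plane spacing.
    target_mm = base_patch * min(sh, sw)
    patch = []
    for size, step in zip(shape, spacing):
        # Voxels needed along this axis to cover target_mm, rounded down to a
        # power of two (small epsilon guards against float round-off), then
        # clipped so a patch never exceeds the axis length (e.g. D == 1 for 2D).
        ideal = max(1.0, target_mm / step)
        p = 2 ** int(math.floor(math.log2(ideal) + 1e-9))
        patch.append(min(p, size))
    grid = tuple(math.ceil(size / p) for size, p in zip(shape, patch))
    return TokenGrid(tuple(patch), grid)


def random_token_mask(num_tokens, mask_ratio=0.75, seed=None):
    """Return a list of booleans marking which tokens are hidden during MIM pre-training."""
    rng = random.Random(seed)
    num_masked = int(round(num_tokens * mask_ratio))
    masked = set(rng.sample(range(num_tokens), num_masked))
    return [i in masked for i in range(num_tokens)]


if __name__ == "__main__":
    # Thick-slice CT: 5 mm slices, 0.8 mm in-plane -> thin patches along depth.
    ct = adaptive_token_grid(shape=(40, 256, 256), spacing=(5.0, 0.8, 0.8))
    # 2D radiograph treated as a single-slice volume.
    xray = adaptive_token_grid(shape=(1, 224, 224), spacing=(1.0, 0.2, 0.2))
    print(ct, ct.num_tokens)
    print(xray, xray.num_tokens)
    mask = random_token_mask(xray.num_tokens, mask_ratio=0.75, seed=0)
    print(sum(mask), "of", len(mask), "tokens masked")
```

With these assumed settings, the CT volume gets (2, 16, 16) patches and the radiograph gets (1, 16, 16) patches. A single network could therefore consume both inputs as token sequences without a dataset-specific stem, which is the kind of unified structure the abstract motivates.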