Abstract: The heart is a relatively complex organ that undergoes non-rigid motion. Quantitative analysis of cardiac motion is critical to helping doctors reach an accurate diagnosis and plan treatment. Cardiovascular magnetic resonance imaging (CMRI) supports a more detailed quantitative evaluation for cardiac diagnosis. Deformable image registration (DIR) has become a vital task in biomedical image analysis because tissue structures vary across medical images. Models based on the masked autoencoder (MAE) have recently been shown to be effective in computer vision tasks. The Vision Transformer can aggregate context to restore the semantic information of the original image regions, using a low proportion of visible image patches to predict the masked image patches. This study proposes a novel Transformer-ConvNet architecture based on MAE for medical image registration. The core of the Transformer is designed as a masked autoencoder (MAE) with a lightweight decoder, so that feature extraction before the downstream registration task becomes a self-supervised learning task. This study also rethinks how multi-head self-attention is computed in the Transformer encoder. We improve the query-key-value dot-product attention by introducing depthwise separable convolution (DWSC) and squeeze-and-excitation (SE) modules into the self-attention module, which reduces the number of parameters and the amount of computation while highlighting image details and maintaining high-spatial-resolution image features. In addition, a concurrent spatial and channel squeeze-and-excitation (scSE) module is embedded into the CNN structure, which also proves effective for extracting robust feature representations. The proposed method, called MAE-TransRNet, shows better generalization.
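As a rough illustration of the masking step that MAE-style pretraining relies on, the sketch below splits patch indices into a small visible set and a larger masked set. It is not taken from the MAE-TransRNet code: the function name `random_patch_mask`, the 75% mask ratio, and the 196-patch example (a 224x224 image cut into 16x16 patches) are assumptions chosen only for illustration.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists.

    Hypothetical helper: the encoder would see only the visible patches,
    and the decoder would be trained to predict the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(indices[:num_masked])
    visible = sorted(indices[num_masked:])
    return visible, masked


# Assumed example: 14 * 14 = 196 patches, 75% of them masked.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
```

With these assumed settings the encoder would process 49 visible patches and the decoder would reconstruct the remaining 147.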
The proposed model is evaluated on the public cardiac short-axis dataset (with images and labels) from the 2017 Automated Cardiac Diagnosis Challenge (ACDC). The qualitative and quantitative results (e.g., Dice score and Hausdorff distance) suggest that the proposed model achieves superior results to those of state-of-the-art methods, indicating that MAE and the improved self-attention are effective and promising for medical image registration tasks. Code and models are available at https://github.com/XinXiao101/MAE-TransRNet .
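The two metrics named above can be sketched for small binary masks as follows. This is a minimal pure-Python illustration, assuming each mask is given as a set of foreground pixel coordinates; the helper names `dice_score` and `hausdorff_distance` and the toy masks are hypothetical, not the evaluation code used in the study.

```python
import math


def dice_score(a, b):
    """Dice overlap 2|A and B| / (|A| + |B|) of two coordinate sets."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy example: a 2x2 label and the same label shifted by one pixel.
fixed_label = {(0, 0), (0, 1), (1, 0), (1, 1)}
warped_label = {(0, 1), (1, 1), (0, 2), (1, 2)}

dice = dice_score(fixed_label, warped_label)
hd = hausdorff_distance(fixed_label, warped_label)
```

A higher Dice score and a lower Hausdorff distance both indicate that the warped label lies closer to the fixed label. The brute-force distance search is quadratic in the number of points, so it suits only small masks.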

Mask-aware transformer with structure invariant loss for CT translation

A Novel Method Of Synthetic Ct Generation From Mr Images Based On Convolutional Neural Networks

Contrast-Medium Anisotropy-Aware Tensor Total Variation Model for Robust Cerebral Perfusion CT Reconstruction with Low-Dose Scans

TransCT: Dual-path Transformer for Low Dose Computed Tomography

A dense and U-shaped transformer with dual-domain multi-loss function for sparse-view CT reconstruction

Vision Transformer with Progressive Tokenization for CT Metal Artifact Reduction

Multi-scale Tokens-Aware Transformer Network for Multi-region and Multi-sequence MR-to-CT Synthesis in A Single Model

Dual-domain sparse-view CT reconstruction with Transformers

Structure-Preserving Synthesis: MaskGAN for Unpaired MR-CT Translation

DDPTransformer: Dual-Domain With Parallel Transformer Network for Sparse View CT Image Reconstruction

Masked and Adaptive Transformer for Exemplar Based Image Translation

Masked Co-attentional Transformer reconstructs 100x ultra-fast/low-dose whole-body PET from longitudinal images and anatomically guided MRI

CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation

CTformer: convolution-free Token2Token dilated vision transformer for low-dose CT denoising

Medical Multi-Modal Image Transformation with Modality Code Awareness

Soft Masked Mamba Diffusion Model for CT to MRI Conversion

MSCT-UNET: multi-scale contrastive transformer within U-shaped network for medical image segmentation

Enhancing CT Image synthesis from multi-modal MRI data based on a multi-task neural network framework

MCHNet: an Efficient Cross Attention-guided Hierarchical Multi-scale Network for Segmentation of Organs at Risk in CT Images

MAE-TransRNet: An improved transformer-ConvNet architecture with masked autoencoder for cardiac MRI registration

LoMAE: Simple Streamlined Low-level Masked Autoencoders for Robust, Generalized, and Interpretable Low-dose CT Denoising