ModeT: Learning Deformable Image Registration via Motion Decomposition Transformer

Haiqiao Wang, Dong Ni, Yi Wang
DOI: https://doi.org/10.1007/978-3-031-43999-5_70
2023-06-09
Abstract: Transformer structures have been widely used in computer vision and have recently made an impact in the area of medical image registration. However, the use of the Transformer in most registration networks is straightforward. These networks often merely use the attention mechanism to boost feature learning, as segmentation networks do, but are not sufficiently designed for the registration task. In this paper, we propose a novel motion decomposition Transformer (ModeT) to explicitly model multiple motion modalities by fully exploiting the intrinsic capability of the Transformer structure for deformation estimation. The proposed ModeT naturally transforms the multi-head neighborhood attention relationship into the multi-coordinate relationship to model multiple motion modes. Then the competitive weighting module (CWM) fuses multiple deformation sub-fields to generate the resulting deformation field. Extensive experiments on two public brain magnetic resonance imaging (MRI) datasets show that our method outperforms current state-of-the-art registration networks and Transformers, demonstrating the potential of our ModeT for the challenging non-rigid deformation estimation problem. The benchmarks and our code are publicly available at https://github.com/ZAX130/SmileCode.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses deformation field estimation in non-rigid image registration, particularly for medical images. The authors propose a motion decomposition Transformer (ModeT) that explicitly models multiple motion patterns and improves deformation estimation by exploiting the inherent capabilities of the Transformer structure. The main contributions of the paper are as follows:
1. **Propose a new method**: Use the Transformer structure to model the correspondence between images and convert it directly into a deformation field. This cleanly separates the two tasks of feature extraction and deformation estimation, making the registration process more principled.
2. **Multi-head neighborhood attention mechanism**: ModeT uses multi-head neighborhood attention to model multiple motion patterns efficiently, and a competitive weighting module (CWM) fuses the resulting deformation sub-fields competitively, improving the interpretability and consistency of the final deformation field.
3. **Pyramid structure**: Adopt a pyramid structure for feature extraction and deformation propagation, which reduces the attention range required at each level and therefore the computational cost.
Experimental results show that the method outperforms current state-of-the-art registration networks and Transformer models on two publicly available brain magnetic resonance imaging (MRI) datasets, demonstrating its potential for challenging non-rigid deformation estimation problems.
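The idea of turning multi-head neighborhood attention into deformation sub-fields, and then fusing them with per-head weights, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is in the linked repository). It is a simplified 2D toy under several assumptions: no learned projections or normalization, zero padding at the image border, and a hypothetical `scores` array standing in for the weights that the CWM would learn. The function names `motion_decomposition` and `competitive_weighting` are illustrative only.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def motion_decomposition(fixed_feat, moving_feat, num_heads, radius=1):
    """Toy multi-head neighborhood attention that outputs displacements.

    fixed_feat, moving_feat: arrays of shape (H, W, C), C divisible by num_heads.
    Returns sub-fields of shape (num_heads, H, W, 2), ordered as (dy, dx).
    """
    H, W, C = fixed_feat.shape
    d = C // num_heads
    q = fixed_feat.reshape(H, W, num_heads, d)   # queries from the fixed image
    k = moving_feat.reshape(H, W, num_heads, d)  # keys from the moving image

    # Relative coordinates of every position in the (2r+1) x (2r+1) neighborhood.
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]

    # Assumption: zero-pad the keys so border pixels see a full neighborhood.
    k_pad = np.pad(k, ((radius, radius), (radius, radius), (0, 0), (0, 0)))

    # Similarity between each query and each neighboring key, per head.
    logits = np.stack(
        [(q * k_pad[radius + dy:radius + dy + H,
                    radius + dx:radius + dx + W]).sum(axis=-1)
         for dy, dx in offsets],
        axis=-1,
    )                                            # (H, W, heads, N)
    attn = softmax(logits / np.sqrt(d), axis=-1)

    # The "values" are the relative coordinates themselves, so the attention
    # output is an expected displacement rather than a feature vector.
    rel = np.array(offsets, dtype=float)         # (N, 2)
    sub_fields = attn @ rel                      # (H, W, heads, 2)
    return np.moveaxis(sub_fields, 2, 0)         # (heads, H, W, 2)


def competitive_weighting(sub_fields, scores):
    """Fuse sub-fields with weights that compete across heads via softmax.

    sub_fields: (S, H, W, 2); scores: (S, H, W) hypothetical per-head logits
    (in the paper these weights come from a learned module).
    """
    weights = softmax(scores, axis=0)            # sums to 1 over heads
    return (weights[..., None] * sub_fields).sum(axis=0)


# Usage on random features (illustrative data only).
rng = np.random.default_rng(0)
fixed = rng.normal(size=(8, 8, 4))
moving = rng.normal(size=(8, 8, 4))
sub_fields = motion_decomposition(fixed, moving, num_heads=2)
fused = competitive_weighting(sub_fields, rng.normal(size=(2, 8, 8)))
print(sub_fields.shape, fused.shape)
```

Because each sub-field is an attention-weighted average of relative coordinates, every displacement component is bounded by `radius`. That bound is one way to see why a coarse-to-fine pyramid helps: each level only needs a small neighborhood to refine the deformation passed down from the coarser level.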