Abstract: Multimodal image registration is a vital initial step in several medical imaging applications, as it provides complementary information from different data modalities. Since images of different modalities do not exhibit the same characteristics, finding accurate correspondences between them remains a challenge. For conventional multimodal registration methods, two components are particularly significant: a descriptive image feature and a suitable similarity metric. However, these two components are often custom-designed and cope poorly with the high diversity of tissue appearance across modalities. In this paper, we cast image registration as a decision-making problem, where registration is achieved by an artificial agent trained with asynchronous reinforcement learning. More specifically, a convolutional long short-term memory (ConvLSTM) layer is incorporated after stacked convolutional layers to extract spatial-temporal image features and learn the similarity metric implicitly. A customized reward function driven by landmark error is proposed to guide the agent toward the correct registration direction. A Monte Carlo rollout strategy is also used as a look-ahead inference step in the testing stage to further increase registration accuracy. Experiments on paired CT and MR images of patients diagnosed with nasopharyngeal carcinoma demonstrate that our method achieves state-of-the-art performance in medical image registration.

What problem does this paper attempt to address?

The problem this paper addresses is multimodal medical image registration. Specifically, the paper aims to achieve end-to-end multimodal image registration within a reinforcement learning (RL) framework, overcoming the limitations of traditional methods in feature extraction and similarity metric definition. Because images of different modalities differ significantly in structure and appearance, finding accurate correspondences between them is difficult. The proposed method uses convolutional neural networks (CNN) and convolutional long short-term memory networks (ConvLSTM) to extract spatio-temporal features, and it guides the agent toward correct registration actions through a custom reward function, thereby achieving high-precision image registration.

### Main Contributions

1. **A new reinforcement learning framework**: The framework combines a policy network and a value network and learns the perception-action cycle from scratch, without using pre-trained convolutional features.
2. **A reward function based on landmark error**: This reward function addresses the problem of inconsistent units among the transformation parameters and promotes stable convergence of the model.
3. **A Monte Carlo look-ahead strategy**: Used as look-ahead guidance in the testing phase, it overcomes the problem of unknown termination states and further improves the accuracy and stability of prediction.

### Method Overview

- **State Representation**: The fixed image \(I_f\) and the moving image \(I_m\) are resampled to the same size (168×168), and the state \(s_t\) is represented by a 3D tensor composed of these two images.
- **Action Space**: The action space is discretized, allowing the agent to explore the entire registration parameter space freely. It contains 8 candidate transformations, corresponding to changes of ±1 pixel in translation, ±1° in rotation, and ±0.05 in scaling.
- **Reward Function**: The reward is based on the Euclidean distance between landmark points and measures the improvement obtained after the agent selects a specific action. If the distance falls below the threshold \(\tau\), the termination state is considered reached and a high reward is given.

### Model Structure

- **Deep Actor-Critic Network**: This network simultaneously maintains the policy function \(\pi(\cdot|s_t; \theta)\) and the value function \(V(s_t; \theta_t)\). The policy function selects actions according to the current state, and the value function evaluates the value of the current state.
- **Convolutional Neural Network and Convolutional Long Short-Term Memory Network**: The CNN extracts short-term local spatial features, while the ConvLSTM both captures inter-frame changes and extracts long-term spatial features, thus making full use of redundant spatio-temporal information.

### Training Protocol

- **Asynchronous Advantage Actor-Critic Algorithm (A3C)**: Multiple agents are associated with different environments and update the policy asynchronously. Each agent starts from a pair of unaligned images and continues until the termination state or the maximum episode length is reached.

Through these innovations, the proposed method achieves state-of-the-art performance on clinical datasets, demonstrating strong capabilities in multimodal medical image registration tasks.
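The action space and landmark-driven reward described in the Method Overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter layout `(tx, ty, rotation, scale)`, the additive parameter update, rotation about the origin, the use of the mean landmark distance, and the values of `tau` and `terminal_bonus` are all hypothetical choices made here for concreteness.

```python
import numpy as np

# Eight discrete actions over (tx, ty, rotation in degrees, scale):
# +/-1 pixel in x and y, +/-1 degree of rotation, +/-0.05 in scale.
ACTIONS = np.array([
    [+1, 0, 0, 0], [-1, 0, 0, 0],
    [0, +1, 0, 0], [0, -1, 0, 0],
    [0, 0, +1, 0], [0, 0, -1, 0],
    [0, 0, 0, +0.05], [0, 0, 0, -0.05],
], dtype=float)


def transform_points(points, params):
    """Apply scale, rotation about the origin, then translation (assumed order)."""
    tx, ty, angle_deg, scale = params
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta), np.cos(theta)]])
    return scale * points @ rot.T + np.array([tx, ty])


def landmark_error(params, moving_pts, fixed_pts):
    """Mean Euclidean distance between transformed moving and fixed landmarks."""
    diff = transform_points(moving_pts, params) - fixed_pts
    return float(np.mean(np.linalg.norm(diff, axis=1)))


def step(params, action_idx, moving_pts, fixed_pts, tau=1.0, terminal_bonus=10.0):
    """Take one action; reward is the reduction in landmark error.

    tau and terminal_bonus are hypothetical values, not taken from the paper.
    """
    before = landmark_error(params, moving_pts, fixed_pts)
    new_params = params + ACTIONS[action_idx]
    after = landmark_error(new_params, moving_pts, fixed_pts)
    done = after < tau
    reward = terminal_bonus if done else before - after
    return new_params, reward, done
```

Because the reward is measured in landmark distance rather than in raw parameter values, a translation step and a rotation step are scored on the same scale, which is the unit-consistency property the summary attributes to the reward design.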
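For the actor-critic training signal, the standard A3C formulation estimates the advantage as an n-step discounted return minus the critic's value estimate. The sketch below shows that generic computation only; the discount factor, rollout length, and function names are assumptions for illustration, and the summary does not state which values the paper uses.

```python
import numpy as np


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns computed backward from a bootstrap value.

    bootstrap_value is the critic's estimate for the state after the last
    step, or 0.0 if the rollout ended in a termination state.
    """
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage estimate: n-step return minus the critic's value per step."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return returns - np.asarray(values, dtype=float)
```

A positive advantage raises the probability of the action the policy took, and a negative one lowers it, while the critic is regressed toward the same n-step returns.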

End-to-end multimodal image registration via reinforcement learning
