Abstract: Advanced magnetic resonance (MR) image reconstruction methods, such as compressed sensing and subspace-based imaging, are posed as large-scale, iterative optimization problems. Given the large number of reconstructions required in practical clinical usage, the computation time of these advanced reconstruction methods is often unacceptable. In this work, we propose using Google's Tensor Processing Units (TPUs) to accelerate MR image reconstruction. The TPU is an application-specific integrated circuit (ASIC) for machine learning applications, which has recently been used to solve large-scale scientific computing problems. As a proof of concept, we implement the alternating direction method of multipliers (ADMM) in TensorFlow to reconstruct images on TPUs. The reconstruction is based on multi-channel, sparsely sampled, radial-trajectory $k$-space data with sparsity constraints. The forward and inverse non-uniform Fourier transform operations are formulated in terms of matrix multiplications, as in the discrete Fourier transform. The sparsifying transform and its adjoint operations are formulated as convolutions. Data decomposition is applied to the measured $k$-space data such that the aforementioned tensor operations are localized within individual TPU cores. The data decomposition and the inter-core communication strategy are designed in accordance with the TPU interconnect network topology in order to minimize the communication time. The accuracy and the high parallel efficiency of the proposed TPU-based image reconstruction method are demonstrated through numerical examples.
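The matrix-multiplication view of the non-uniform Fourier transform mentioned in the abstract can be illustrated with a small sketch. This is not the paper's TensorFlow/TPU code: it uses NumPy on a CPU, the function names `nudft_matrix` and `radial_trajectory` are hypothetical, and the k-space coordinates are assumed to be in cycles per pixel. The "inverse" step shown here is the plain adjoint (conjugate transpose) with no density compensation, which is an assumption about what a minimal version would look like.

```python
import numpy as np

def nudft_matrix(kx, ky, n):
    """Dense non-uniform DFT matrix for an n x n image.

    kx, ky: k-space sample coordinates in cycles per pixel, shape (M,).
    Returns a complex matrix of shape (M, n * n).
    """
    # Pixel coordinates flattened in row-major order (rows = y, columns = x).
    y, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    phase = np.outer(kx, x.ravel()) + np.outer(ky, y.ravel())
    return np.exp(-2j * np.pi * phase)

def radial_trajectory(n_spokes, n_readout):
    """Radial spokes through the k-space centre, radius in [-0.5, 0.5)."""
    angles = np.pi * np.arange(n_spokes) / n_spokes
    radius = (np.arange(n_readout) - n_readout // 2) / n_readout
    kx = np.outer(np.cos(angles), radius).ravel()
    ky = np.outer(np.sin(angles), radius).ravel()
    return kx, ky

n = 16
rng = np.random.default_rng(0)
image = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

kx, ky = radial_trajectory(n_spokes=8, n_readout=32)
A = nudft_matrix(kx, ky, n)

# Forward transform: one matrix-vector product maps the image to k-space samples.
samples = A @ image.ravel()
# Adjoint transform: the conjugate transpose maps samples back to image space.
backproj = (A.conj().T @ samples).reshape(n, n)
```

When the sample locations fall on the uniform Cartesian grid, the same matrix reproduces the ordinary 2-D DFT, which is the sense in which the non-uniform transform is handled "as in the discrete Fourier transform". The dense matrix grows as (number of samples) x (number of pixels), so a formulation like this leans on hardware that is efficient at large matrix multiplications.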

What problem does this paper attempt to address?

### Problems the paper attempts to solve

This paper addresses the computational cost of magnetic resonance imaging (MRI) reconstruction. Advanced MR image reconstruction methods (such as compressed sensing and subspace imaging) are posed as large-scale, iterative optimization problems. Because clinical practice requires a large number of reconstructions, the computation time of these methods is usually unacceptable. The authors therefore propose using Google's Tensor Processing Unit (TPU) to accelerate MRI reconstruction.

### Background and motivation

1. **Importance of MRI**: MRI is a powerful imaging tool that can non-invasively reveal the structure, function, and biological information of the human body. Since its invention in the 1970s, MRI has revolutionized medical imaging because of its excellent soft-tissue contrast and high spatial resolution.
2. **Limitations of sampling speed**: Although MR hardware and imaging sequences have made significant progress in the past few decades, the data acquisition speed has approached physical and physiological limits. The Nyquist sampling criterion required for traditional Fourier-transform image reconstruction has therefore become a bottleneck for further accelerating MRI.
3. **Modern MRI methods**: Modern methods such as parallel imaging, compressed sensing, and subspace imaging significantly reduce the imaging time by sparsely sampling k-space, and they use coil sensitivity and prior knowledge (such as sparsity) to reconstruct artifact-free images from undersampled data. However, these methods are usually based on large-scale, iterative optimization algorithms, in which the computation time of the non-uniform Fourier transform often fails to meet clinical needs.
4. **Need for hardware acceleration**: To achieve clinically practical running times, hardware accelerators and parallel computing are used to handle computationally intensive MRI reconstruction tasks. Graphics processing units (GPUs) have been widely studied for accelerating non-uniform Fourier transforms and iterative image reconstruction methods (such as the conjugate gradient method and the alternating direction method of multipliers (ADMM)) for advanced reconstruction of multi-channel undersampled data. Despite significant progress, the running time for large-scale dynamic image reconstruction problems is still not compatible with clinical practice.
5. **Advantages of TPU**: In recent years, the success of machine learning (especially deep learning) has given rise to new hardware accelerators, among which Google's TPU is considered a promising way to address the computational challenges brought by continuously and exponentially growing data. Although the TPU was initially designed as an application-specific integrated circuit (ASIC) to run cutting-edge machine-learning models on Google Cloud, it has also recently been used to solve large-scale scientific computing problems.

### Proposed solutions

1. **Application of TPU**: The authors propose using the TPU to accelerate MRI reconstruction. The TPU has the following four main advantages:
   - **Tensor operations**: The main operations in MR image reconstruction (such as the non-uniform Fourier transform, the sparsifying transform, and sensitivity profile encoding) can all be expressed as tensor operations, which takes full advantage of the TPU's strength in efficient matrix multiplication.
   - **Data decomposition and communication strategies**: The data decomposition and communication strategies are matched to the TPU interconnect network topology, so that all of the above tensor operations are localized within a single core and only a minimal, image-sized amount of data is communicated per iteration.
   - **Large-capacity memory**: The TPU has a relatively large on-package memory capacity and can efficiently handle large-scale problems.
   - **Ease of programming**: The TPU can be programmed through software front ends such as TensorFlow, which provides rich scientific computing functions and simplifies the deployment of distributed MR image reconstruction algorithms on the TPU.
2. **Implementation methods**: To verify the TPU-accelerated MR image reconstruction, the authors implemented the ADMM algorithm in TensorFlow to reconstruct images from multi-channel, sparsely sampled, radial-trajectory k-space data. The forward and inverse non-uniform Fourier transforms are expressed as matrix multiplications, and the sparsifying transform and its adjoint operations are expressed as convolutions.

### Results and discussion

1. **Accuracy analysis**: The accuracy of TPU-reconstructed images was verified by comparing them with CPU-reconstructed images. The results show that for non-iterative DFT image reconstruction, the relative difference between the TPU and the CPU is approximately 0.1%; for iterative ADMM image reconstruction, the relative difference is approximately 1%. Considering that the reconstruction error of compressed-sensing MRI is usually around 5% when using float32.
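The ADMM reconstruction with a convolution-based sparsifying transform, described under "Implementation methods", can be sketched on a toy problem. This is a generic textbook scaled-form ADMM written in NumPy for a CPU, not the authors' TensorFlow/TPU implementation. Several parts are assumptions made for illustration: the sparsifying transform is taken to be periodic finite differences, the encoding matrix `A` is a random Gaussian stand-in rather than a multi-channel non-uniform Fourier operator, the quadratic sub-problem is solved with a dense linear solve that is only viable for tiny images, and all function names are hypothetical.

```python
import numpy as np

def grad(u):
    """Periodic forward differences along both image axes (a 2-tap convolution)."""
    return np.stack([np.roll(u, -1, axis=0) - u, np.roll(u, -1, axis=1) - u])

def grad_adjoint(v):
    """Adjoint of grad: periodic backward differences, summed over both axes."""
    return (np.roll(v[0], 1, axis=0) - v[0]) + (np.roll(v[1], 1, axis=1) - v[1])

def soft_threshold(v, t):
    """Complex soft-thresholding, the proximal operator of t * ||v||_1."""
    mag = np.abs(v)
    return v * np.maximum(mag - t, 0.0) / np.maximum(mag, 1e-12)

def objective(x, A, b, lam):
    """0.5 * ||A x - b||^2 + lam * ||grad(x)||_1 for an image x."""
    return 0.5 * np.linalg.norm(A @ x.ravel() - b) ** 2 + lam * np.abs(grad(x)).sum()

def admm_tv(A, b, shape, lam=0.05, rho=1.0, n_iter=100):
    """Minimise the objective above with scaled-form ADMM (splitting z = grad(x))."""
    n_pix = shape[0] * shape[1]
    # Dense matrix of grad, built one basis image at a time (toy sizes only).
    D = np.stack([grad(e.reshape(shape)).ravel() for e in np.eye(n_pix)], axis=1)
    lhs = A.conj().T @ A + rho * D.T @ D
    Ahb = A.conj().T @ b
    x = np.zeros(n_pix, dtype=complex)
    z = np.zeros(2 * n_pix, dtype=complex)
    u = np.zeros(2 * n_pix, dtype=complex)
    for _ in range(n_iter):
        x = np.linalg.solve(lhs, Ahb + rho * D.T @ (z - u))  # data-consistency update
        Dx = D @ x
        z = soft_threshold(Dx + u, lam / rho)                # sparsity update
        u = u + Dx - z                                       # scaled dual update
    return x.reshape(shape)

rng = np.random.default_rng(0)
shape = (8, 8)
lam = 0.05
x_true = np.zeros(shape, dtype=complex)
x_true[2:6, 2:6] = 1.0  # piecewise-constant test image
# Stand-in encoding matrix with fewer measurements (40) than pixels (64).
A = rng.standard_normal((40, 64)) + 1j * rng.standard_normal((40, 64))
b = A @ x_true.ravel()

x_hat = admm_tv(A, b, shape, lam=lam)
rel_diff = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative difference to the test image: {rel_diff:.3e}")
```

The relative L2 difference printed at the end is one plausible reading of the "relative difference" quoted in the accuracy analysis; the exact metric is not stated in this summary. In a large-scale setting the dense solve would typically be replaced by an iterative solver that applies `grad` and `grad_adjoint` directly instead of forming matrices.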

Accelerating MRI Reconstruction on TPUs
