A microstructure estimation Transformer inspired by sparse representation for diffusion MRI

Tianshu Zheng, Guohui Yan, Haotian Li, Weihao Zheng, Wen Shi, Yi Zhang, Chuyang Ye, Dan Wu
DOI: https://doi.org/10.1016/j.media.2023.102788
Abstract: Diffusion magnetic resonance imaging (dMRI) is an important tool for characterizing tissue microstructure based on biophysical models, which are typically multi-compartmental models with mathematically complex and highly non-linear forms. Resolving microstructures from these models with conventional optimization techniques is prone to estimation errors and requires dense sampling in q-space, and hence a long scan time. Deep learning-based approaches have been proposed to overcome these limitations. Motivated by the Transformer's superior feature-extraction performance compared with convolutional structures, we present a Transformer-based learning framework, the Microstructure Estimation Transformer with Sparse Coding (METSC), for dMRI-based microstructural parameter estimation. To take advantage of the Transformer while addressing its need for large amounts of training data, we explicitly introduce an inductive bias (model bias) into the Transformer using a sparse coding technique to facilitate training. The METSC is thus composed of three stages: an embedding stage, a sparse representation stage, and a mapping stage. The embedding stage is a Transformer-based structure that encodes the signal in a high-level space to ensure that the core voxel of a patch is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem through an unfolded Iterative Hard Thresholding (IHT) process. The mapping stage is essentially a decoder that computes the microstructural parameters from the output of the second stage as a weighted sum of normalized dictionary coefficients, where the weights are also learned. We tested our framework with downsampled q-space data on two dMRI models: the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model.
The proposed method achieved up to 11.25-fold acceleration while retaining high accuracy for NODDI fitting, reducing the mean squared error (MSE) by up to 70% compared with the previous q-space learning approach. METSC outperformed other state-of-the-art learning-based methods, both model-free and model-based. The network also showed robustness to noise and generalizability across different datasets. The superior performance of METSC indicates its potential to improve dMRI acquisition and model fitting in clinical applications.
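The sparse representation stage described in the abstract unfolds Iterative Hard Thresholding (IHT). As a rough illustration only, the sketch below shows the classical, non-learned IHT iteration with a fixed dictionary. It is not the METSC implementation: the function names (`hard_threshold`, `iht`), the sparsity level `k`, the step size, and the toy orthonormal dictionary are all assumptions made for this example. In an unfolded network, each iteration would typically become a layer with learned matrices, and the abstract does not specify how METSC does this.

```python
import numpy as np


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    if k > 0:
        keep = np.argpartition(np.abs(x), -k)[-k:]
        out[keep] = x[keep]
    return out


def iht(y, D, k, n_iter=100, step=None):
    """Classical IHT for: min ||y - D x||_2^2  subject to  ||x||_0 <= k.

    y: signal vector, D: dictionary (columns are atoms), k: sparsity level.
    All names here are hypothetical and chosen for this sketch.
    """
    if step is None:
        # Conservative step size: 1 / (largest singular value of D)^2.
        step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        # Gradient step on the data-fit term, then projection onto k-sparse vectors.
        x = hard_threshold(x + step * D.T @ (y - D @ x), k)
    return x


# Toy usage with an orthonormal dictionary (an assumption made so that
# recovery is easy to verify; real dictionaries are usually overcomplete).
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((16, 16)))
x_true = np.zeros(16)
x_true[[2, 7, 11]] = [1.5, -2.0, 0.8]
y = D @ x_true
x_hat = iht(y, D, k=3)
```

With an orthonormal dictionary and a noiseless signal, the iteration recovers the sparse code almost immediately. With overcomplete or noisy dictionaries, convergence depends on the dictionary's properties and the chosen step size.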
What problem does this paper attempt to address?