Abstract:

PURPOSE: Iterative reconstruction techniques hold great potential to mitigate the effects of data noise and/or incompleteness, and hence can facilitate patient dose reduction. However, they are not suitable for routine clinical practice because of their long reconstruction times. In this work, the authors accelerate the computations by fully exploiting the highly parallel computational power of single and multiple graphics processing units (GPUs). In particular, the forward projection algorithm, which is not included in the closed-form formulas, is accelerated and optimized on the GPU.

METHODS: The main contribution is a novel forward projection algorithm that uses multiple threads to handle the computations associated with a group of adjacent rays simultaneously. The proposed algorithm is free of divergence and bank conflicts on the GPU, and benefits from data locality and data reuse. It achieves its efficiency in particular by (i) employing a tiled algorithm with three-level parallelization, (ii) optimizing the thread block size, (iii) maximizing data reuse in constant memory and shared memory, and (iv) exploiting the built-in texture memory interpolation capability. In addition, to accelerate the iterative algorithms and the Feldkamp-Davis-Kress (FDK) algorithm on the GPU, the authors apply batched fast Fourier transforms (FFTs) to expedite the filtering process in FDK and use projection-bundling parallelism during backprojection to shorten the execution times of FDK and the expectation-maximization (EM) algorithm.

RESULTS: Numerical experiments conducted on an NVIDIA Tesla C1060 GPU demonstrated the superiority of the proposed algorithms in computation time. The forward projection, filtering, and backprojection times for a volume image of 512 x 512 x 512 with 360 projections of 512 x 512 using one GPU are about 4.13, 0.65, and 2.47 s (including distance weighting), respectively. In particular, the proposed forward projection algorithm is ray-driven, and its parallelization strategy evolves from single-thread-for-single-ray (38.56 s), through multithreads-for-single-ray (26.05 s), to multithreads-for-multirays (4.13 s). For the voxel-driven backprojection, the use of texture memory reduces the reconstruction time from 4.95 to 3.35 s. Applying the projection bundle technique reduces the computation time further to 2.47 s. When multiple GPUs were employed, near-perfect speedups were observed as the number of GPUs increased. For example, with four GPUs, the times for the forward projection, filtering, and backprojection are further reduced to 1.11, 0.18, and 0.66 s, respectively. The results obtained by the GPU-based algorithms are virtually indistinguishable from those obtained on the CPU.

CONCLUSIONS: The authors have proposed a highly optimized GPU-based forward projection algorithm, as well as GPU-based FDK and expectation-maximization reconstruction algorithms. Their compute unified device architecture (CUDA) codes provide exceedingly fast forward projection and backprojection that outperform implementations using shading languages, the Cell Broadband Engine Architecture, and previous CUDA implementations. The reconstruction times of the FDK and EM algorithms were considerably shortened, which can facilitate their routine use in a variety of applications such as image quality improvement and dose reduction.
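
To make the "near-perfect speedup" statement concrete, the single-GPU and four-GPU timings quoted in the results can be turned into speedup and parallel-efficiency figures. The following is a minimal sketch of that arithmetic only; the function and variable names are hypothetical and not taken from the authors' code, and the inputs are the two-decimal timings from the abstract, so the derived ratios are approximate.

```python
# Times in seconds as quoted in the abstract (Tesla C1060, 512^3 volume,
# 360 projections of 512 x 512). Names below are illustrative assumptions.
ONE_GPU = {"forward projection": 4.13, "filtering": 0.65, "backprojection": 2.47}
FOUR_GPUS = {"forward projection": 1.11, "filtering": 0.18, "backprojection": 0.66}
N_GPUS = 4


def speedup_and_efficiency(t_single, t_multi, n_devices):
    """Return (speedup, parallel efficiency) for one processing stage."""
    speedup = t_single / t_multi
    return speedup, speedup / n_devices


def summarize(single=ONE_GPU, multi=FOUR_GPUS, n_devices=N_GPUS):
    """Map each stage name to its (speedup, efficiency) pair."""
    return {
        stage: speedup_and_efficiency(single[stage], multi[stage], n_devices)
        for stage in single
    }


if __name__ == "__main__":
    for stage, (s, e) in summarize().items():
        print(f"{stage:>18}: speedup {s:.2f}x, efficiency {e:.0%}")
```

With these inputs the three stages come out at roughly 3.7x, 3.6x, and 3.7x on four GPUs, which corresponds to about 90 to 94% parallel efficiency. Because the quoted times are rounded, the filtering ratio in particular (0.65 s over 0.18 s) should be read as an estimate.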

GPU-accelerated MART and Concurrent Cross-Correlation for Tomographic PIV

The Implementation of the Three-Dimensional Unified Gas-Kinetic Wave-Particle Method on Multiple Graphics Processing Units

GPU Accelerated Computation for Surface Topography Measurement

GPU Accelerated Digital Volume Correlation

TomocuPy - efficient GPU-based tomographic reconstruction with asynchronous data processing

A Graphics Processing Unit Implementation and Optimization for Parallel Double-Difference Seismic Tomography

Parallel Multi-threaded Gridrec Algorithm for Computer Tomography on GPU for Edge Computing

Particle Field Deconvolution Multiplicative Algebraic Reconstruction Technique for Tomographic Particle Image Velocimetry Reconstruction

A Multi-GPU Parallel Algorithm in Hypersonic Flow Computations

GPU Speed-Up For The Implicit Navier-Stokes Solver

Fast Cone-Beam CT Image Reconstruction Using GPU Hardware

GPU-based 3D cone-beam CT image reconstruction: application to micro CT

Multi-GPU Jacobian accelerated computing for soft-field tomography

A GPU-based Fast Volume CT Reconstructive Algorithm Method

Parallel Computing For Quantitative Blood Flow Imaging In Photoacoustic Microscopy

Multidisciplinary simulation acceleration using multiple shared memory graphical processing units

GPU acceleration of an iterative scheme for gas-kinetic model equations with memory reduction techniques

A GPU-enabled acceleration algorithm for the CAM5 cloud microphysics scheme

GPU-accelerated parallel image reconstruction strategies for magnetic particle imaging

A fast forward projection using multithreads for multirays on GPUs in medical image reconstruction

GPU Implementation of the Discrete Unified Gas Kinetic Scheme for Low-Speed Isothermal Flows