Medical Physics, Volume 42, Issue 4, p. 1474-1476
Point/Counterpoint

GPU technology is the hope for near real-time Monte Carlo dose calculations

Xun Jia, Ph.D.
Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, Texas 75390 (Tel: 214-648-3224; E-mail: [email protected])

X. George Xu, Ph.D.
Nuclear Engineering Program, Rensselaer Polytechnic Institute, Troy, New York 12180 (Tel: 518-276-4014; E-mail: [email protected])

Colin G. Orton, Ph.D., Moderator

First published: 11 March 2015
https://doi.org/10.1118/1.4903901

OVERVIEW

Monte Carlo (MC) dose calculations are recognized as being the most accurate modality for radiotherapy treatment planning but, because of the excessive computational time required, they cannot presently be used for near real-time dose calculations.
Currently, the most common way to accelerate MC dose calculations is to use clusters of central processing units (CPUs), but some believe that the future of near real-time MC dose calculations lies not with clusters of CPUs but with the use of graphics processing unit (GPU) technology. This is the claim debated in this month's Point/Counterpoint.

Arguing for the Proposition is Xun Jia, Ph.D. Dr. Jia received his Master's degree in Applied Mathematics and his Ph.D. degree in Physics, both from UCLA. He is currently an Assistant Professor in the Department of Radiation Oncology, University of Texas Southwestern Medical Center. Dr. Jia's research focuses on GPU-based high-performance computing for medical physics and medical imaging. He has developed several Monte Carlo packages to improve efficiency for photon, electron, and proton transport. Dr. Jia's research has been supported by government and industrial grants and he has published 60 peer-reviewed papers. He is currently a section editor of the Journal of Applied Clinical Medical Physics.

Arguing against the Proposition is X. George Xu, Ph.D. Dr. Xu obtained his Ph.D. in Nuclear Engineering from Texas A&M University, College Station, TX and, for the past 20 years, he has been on the faculty of Rensselaer Polytechnic Institute, Troy, NY, where he currently holds the Edward E. Hood Endowed Chair of Engineering. Dr. Xu's research has centered around applications of Monte Carlo methods to problems in radiation protection, imaging, and radiation therapy. He has been continuously funded by the NIH over the past ten years, including an R01 grant to develop a new Monte Carlo code, ARCHER, for heterogeneous computing involving GPUs and coprocessors. He is the author of more than 150 journal papers and book chapters, and 270 conference abstracts. Dr. Xu is a Fellow of the American Association of Physicists in Medicine, the Health Physics Society, and the American Nuclear Society.
In 2014, he was re-elected to a 6-year term as a council member of the National Council on Radiation Protection and Measurements.

FOR THE PROPOSITION: Xun Jia, Ph.D.

Opening Statement

Clinical applications of MC dose calculations have been limited by the long computation time needed to achieve a sufficient level of precision. Over the years, great efforts have been devoted to accelerating MC simulations. Recently, with the success of GPU-based high-performance computing [1,2], particularly for MC simulations, near real-time (e.g., seconds or subseconds) dose calculation is becoming feasible. Achieving this will not only facilitate the routine use of MC dose calculation, but also enable novel applications that advance radiotherapy practice, such as MC-based inverse treatment planning.

To date, the computation time for a typical photon plan has been reduced to less than a minute with ∼1% uncertainty using only one GPU, and the speed can be further boosted with multiple GPUs by a factor proportional to the number of GPUs. Computation times as low as seconds to tens of seconds have also been reported for different applications [3,4]. Notably, the group at UT Southwestern [5] has developed a GPU application to visualize an MC-reconstructed dose delivery process in almost real-time during beam delivery, with a refresh frequency of >10 Hz. These achievements have clearly demonstrated the potential of near real-time MC dose calculations.

Besides advantages in speed, GPUs also hold other favorable features for clinical applications. First, GPUs are orders of magnitude lower in cost than a conventional high-performance-computing structure with similar processing power. Second, GPUs are locally hosted and managed. This is particularly important for problems aiming at near real-time applications, since data-transfer and job-scheduling times cannot be neglected if the computation facility is remotely placed and shared by many users.
Patient privacy may also be a concern when transferring medical data to a remote facility.

Of course we cannot neglect the disadvantages of using GPUs for MC. Because the GPU is a new platform, codes must be redeveloped. However, the burdens of initial code development have been overcome to a large extent, and several packages have been successfully built. Efforts have also been initiated to write MC packages in OpenCL to increase portability [6]. While there are also technical issues hindering computational efficiency, e.g., thread divergence and memory-write conflicts, many solutions exist to remove or alleviate them [4,7].

I would also like to mention a strong competitor of the GPU, the Intel many integrated core (MIC) processor. What makes this particularly attractive is its x86 compatibility, which allows existing CPU codes to run with minor modification. However, just as for GPUs, substantial effort is needed to achieve optimal performance [8]. Simply running an existing code may not achieve high acceleration, because parallel-computing-specific issues such as memory access and vectorization were not considered sufficiently in the conventional CPU code. As of today, there has been only limited study of MC dose calculations on MIC processors. While the MIC holds the potential to improve efficiency, a lot of research is needed.

In conclusion, GPU technology has the capability of substantially accelerating MC simulations. Its advantages and extensive research efforts demonstrate the hope for near real-time dose calculations.

AGAINST THE PROPOSITION: X. George Xu, Ph.D.

Opening Statement

Since the invention of computers in the 1940s, MC codes have been developed for nuclear engineering, high-energy physics, and, recently, medical physics applications.
However, most radiation treatment planning is currently done using dosimetry algorithms that are extremely fast, but only "approximately" correct [9]. Given the lasting interest in accelerating MC methods, the recent hype related to the GPU is not surprising. Originally marketed by NVIDIA as household devices, GPU-based game consoles offered amazingly fast graphics at an affordable price. It did not take long, however, for the scientific community to realize that these desktop toys were actually parallel computers. As summarized in two review papers [1,2], GPU adopters from the medical physics community wasted no time in reporting overwhelmingly positive experiences, including a dozen studies that focused specifically on MC dosimetry. Impressive, but inconsistent, "speedup factors" ranging from single digits to several hundred were reported within months, sometimes by the same group. It has become a cliché to highlight how fast an MC-based dose calculation can be done with a GPU. Such results indeed attracted a lot of attention from medical physicists, who are notoriously busy and seeking expediency.

There are two strong indications that GPU technology is only hype and not the hope for near real-time, fully MC dose calculations. First, we have not seen any convincing evidence that the GPU is indeed better than traditional solutions for running MC dose calculations. Both of the above review papers [1,2] enjoyed referencing the rapidly increasing number of GPU-related journal articles, which only reinforces the concept of a "hype cycle." Furthermore, the authors of the GPU-accelerated MC studies obscure the issue by omitting details on how they compared GPU performance with traditional CPUs. CPU-based clusters are currently so cheap that one can assemble a desk-side 32-core cluster for about US$3000, the cost of a high-end CPU/GPU system.
Using software optimization schemes and hyperthreading, such a CPU cluster may achieve a speedup similar to the best reported for GPUs, without the painful process of rewriting the MC code for the GPU/compute unified device architecture (GPU/CUDA) environment. But few of the GPU enthusiasts optimized the CPU code in order to make fair performance comparisons. It has been observed that a lack of "fair comparison" measures is responsible for exaggerated GPU performance [10].

Second, competing technologies are mostly ignored by GPU adopters. Intel's Xeon Phi coprocessor, for example, which comes with 60 embedded Pentium cores, is capable of achieving a level of parallelism similar to that of GPUs [11–13]. Adopting the coprocessor is relatively easy and a large number of them are, in fact, used in Tianhe-2, the world's number-1 supercomputer. The "heterogeneous computing" era has just begun and it is uncertain which hardware (and software) technology will dominate the market [14].

The excitement brought by the GPU has reignited our interest in achieving real-time MC dose calculations and one should take full advantage of the research opportunities [15]. However, an inflated expectation can be counterproductive, especially when investing in a single technology that may be obsolete in ten years.

Rebuttal: Xun Jia, Ph.D.

I agree that variations in reported GPU-acceleration factors exist due to different degrees of software/hardware utilization and optimization. However, it is quite difficult, if not impossible, to conduct an absolutely fair comparison. For example, one software aspect treats GPUs unfairly: software optimization schemes, such as the variance reduction techniques widely employed in CPU-based MC packages, have barely been explored for GPUs. The deterministic nature of such algorithms is expected to be particularly favorable for the GPU's single-instruction-multiple-thread structure.
Yet it is absolute computational efficiency, rather than performance relative to CPUs, that determines the feasibility of near real-time MC calculations. The fact that a single GPU can already compute dose in seconds strongly supports this feasibility.

Practicality should also be considered. While a low-end cluster with 4–8 computers may offer high speed, it is more advantageous in a clinical environment to use GPU-enabled computers in terms of energy efficiency, ease of management, etc.

The utilization of GPUs in scientific computing is absolutely more than hype. Among the world's top 500 supercomputers, 46 use GPU-based coprocessors compared to only 17 systems with MIC coprocessors. A few major vendors in radiotherapy, e.g., RaySearch and Elekta, already employ GPUs in their products.

I agree that multiple options are available to substantially accelerate MC in this era of booming technology. Intel MIC is a great example. Nonetheless, it too may be hype, which emphasizes only the ease of programmability based on existing CPU codes but hides the required effort of performance tuning. There is probably no single technology that is undoubtedly better than the others. However, based on an overall consideration of the GPU's advantages and developments so far, I believe that GPU technology is the hope for near real-time MC dose calculations.

Rebuttal: X. George Xu, Ph.D.

I agree with Dr. Jia that the capability of real-time MC dose calculations is within reach, owing largely to the innovative technology and marketing strategies of NVIDIA. The greatest roadblock to GPU adoption is that the effort to translate legacy MC codes to the new CUDA programming environment is prohibitively expensive. The GPU also faces tough technological challenges, including limited memory and data bandwidth [14]. Given the steep investment and market risk, for everyone to jump onto the GPU bandwagon is costly and unwise.
To CPU enthusiasts, multithreading techniques such as OpenMP and Pthreads are readily available for parallel computing. Intel CPUs come with hyperthreading for concurrent execution, and various compiler options can be used for optimization. As a competing architecture, Intel's MIC is much easier to adopt. To avoid "unfair comparison" between GPU and CPU [11], one should consider the above-mentioned software optimization techniques and pick a "multicore" CPU (instead of a "single-core" one) at a similar price to the GPU implementation. Comparative studies should also consider software-related labor expenses. When we recently compared the performance of ARCHER, an MC dosimetry code developed from scratch by my Ph.D. students [11–13], on the CPU, GPU, and MIC platforms, we found that the GPU's advantages as a dose engine are less dramatic than some of those reported in the literature. All things considered, traditional CPU clusters and MIC remain serious competitors to GPUs when energy efficiency is not the priority.

In the next five years, all these technologies are expected to evolve rapidly. The potential waste of capital and human resources due to hype and misleading information should be avoided. To this end, peer-reviewed journal publication and grant application processes should emphasize balanced GPU studies that offer the best methodologies and practices to the medical physics community.

REFERENCES

1. X. Jia, P. Ziegenhein, and S. B. Jiang, "GPU-based high-performance computing for radiation therapy," Phys. Med. Biol. 59, R151–R182 (2014). doi:10.1088/0031-9155/59/4/R151
2. G. Pratx and L. Xing, "GPU computing in medical physics: A review," Med. Phys. 38, 2685–2697 (2011). doi:10.1118/1.3578605
3. S. Hissoiny, M. D'Amours, B. Ozell, P. Despres, and L. Beaulieu, "Sub-second high dose rate brachytherapy Monte Carlo dose calculations with bGPUMCD," Med. Phys. 39, 4559–4567 (2012). doi:10.1118/1.4730500
4. X. Jia, J. Schuemann, H. Paganetti, and S. B.
Jiang, "GPU-based fast Monte Carlo dose calculation for proton therapy," Phys. Med. Biol. 57, 7783–7797 (2012). doi:10.1088/0031-9155/57/23/7783
5. F. Shi, X. Gu, Y. Graves, S. Jiang, and X. Jia, "A real-time virtual delivery system for photon radiotherapy delivery monitoring," Med. Phys. 41(6), 432 (2014). doi:10.1118/1.4889184
6. Khronos OpenCL Working Group, "The open standard for parallel programming of heterogeneous systems" (2013), available at: https://www.khronos.org/opencl/
7. S. Hissoiny, B. Ozell, H. Bouchard, and P. Despres, "GPUMCD: A new GPU-oriented Monte Carlo dose calculation platform," Med. Phys. 38, 754–764 (2011). doi:10.1118/1.3539725
8. D. Mackay, "Optimization and performance tuning for Intel® Xeon Phi™ coprocessors – Part 1: Optimization essentials" (2012), available at: https://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-1-optimization
9. D. W. O. Rogers, "Fifty years of Monte Carlo simulations for medical physics," Phys. Med. Biol. 51, R287–R301 (2006). doi:10.1088/0031-9155/51/13/R17
10. V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey, "Debunking the 100X GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU," in Proceedings of the 37th Annual International Symposium on Computer Architecture (ACM, New York, NY, 2010), Vol. 38(3), pp. 451–460.
11. T. Liu, X. G. Xu, and C. D. Carothers, "Comparison of two accelerators for Monte Carlo radiation transport calculations, NVIDIA Tesla M2090 GPU and Intel Xeon Phi 3120 coprocessor: A case study for x-ray CT imaging dose calculation," in Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo (SNA + MC 2013), Paris, France, 27–31 October (EDP Sciences, Les Ulis, France, 2014).
12. L. Su, Y. M. Yang, B. Bednarz, E. Sterpin, X. Du, T. Liu, W. Ji, and X. G.
Xu, "ARCHERRT – A photon-electron coupled Monte Carlo dose computing engine for GPU: Software development and application to helical tomotherapy," Med. Phys. 41, 071709 (13pp.) (2014). doi:10.1118/1.4884229
13. X. G. Xu, T. Liu, L. Su, X. Du, M. J. Riblett, W. Ji, D. Gu, C. D. Carothers, M. S. Shephard, F. B. Brown, M. K. Kalra, and B. Liu, "ARCHER, a new Monte Carlo software tool for emerging heterogeneous computing environments," in Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo (SNA + MC 2013), Paris, France, 27–31 October (EDP Sciences, Les Ulis, France, 2014).
14. B. R. Gaster, L. Howes, D. R. Kaeli, P. Mistry, and D. Schaa, Heterogeneous Computing with OpenCL, 2nd ed. (Elsevier, Inc., Waltham, MA, 2013).
15. T. Friedman, "Do believe the hype," New York Times, 2 November 2010, available at: http://www.nytimes.com/2010/11/03/opinion/03friedman.html?_r=0
