Application of the asynchronous advantage actor–critic machine learning algorithm to real-time accelerator tuning
Yun Zou, Qing-Zi Xing, Bai-Chuan Wang, Shu-Xin Zheng, Cheng Cheng, Zhong-Ming Wang, Xue-Wu Wang
DOI: https://doi.org/10.1007/s41365-019-0668-1
2019-01-01
Nuclear Science and Techniques
Abstract: This paper describes a real-time beam tuning method for accelerator systems based on an improved asynchronous advantage actor–critic (A3C) algorithm. The operating parameters of devices are usually inconsistent with the predictions of physical designs because of errors in mechanical matching and installation. Parameter optimization methods such as pointwise scanning, evolutionary algorithms (EAs), and robust conjugate direction search are therefore widely used in beam tuning to compensate for this inconsistency. However, these methods have difficulty dealing with a large number of discrete local optima. The A3C algorithm, which has been applied in the automated control field, provides an approach for improving multi-dimensional optimization. Here, the A3C algorithm is introduced and improved for use in a real-time beam tuning code for accelerators. Experiments using pointwise scanning, the genetic algorithm (one kind of EA), and the A3C algorithm are conducted and compared in optimizing the currents of four steering magnets and two solenoids in the low-energy beam transport (LEBT) section of the Xi’an Proton Application Facility. The currents are considered optimal when the transmission of the radio frequency quadrupole (RFQ) accelerator downstream of the LEBT is highest. The optimal working point of the tuned accelerator was obtained with currents of 0 A, 0 A, 0 A, and 0.1 A for the four steering magnets, and 107 A and 96 A for the two solenoids, giving a highest RFQ transmission of 91.2%. The shorter optimization time of the A3C algorithm was also verified: optimization with the A3C algorithm consumed 42% and 78% less time than pointwise scanning with random initialization and pre-trained initialization of weights, respectively.
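For readers unfamiliar with the pointwise-scanning baseline named in the abstract, the following is a minimal sketch of that idea only, not the authors' A3C method or their tuning code. It assumes a hypothetical, smooth, single-peak surrogate `toy_transmission` in place of a real RFQ transmission measurement; the surrogate's peak position and height reuse the optimum quoted in the abstract purely for illustration, and the sensitivity widths, scan grids, and starting currents are all invented.

```python
import math

# Hypothetical surrogate for the RFQ transmission; NOT the real machine response.
# The peak position and height reuse the abstract's reported optimum for illustration.
OPTIMUM = [0.0, 0.0, 0.0, 0.1, 107.0, 96.0]  # 4 steering magnets, 2 solenoids (A)
WIDTHS = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0]      # assumed sensitivity scales (A)
PEAK = 0.912                                 # reported highest transmission


def toy_transmission(currents):
    """Smooth single-peak stand-in for a transmission measurement."""
    s = sum(((c - o) / w) ** 2 for c, o, w in zip(currents, OPTIMUM, WIDTHS))
    return PEAK * math.exp(-0.5 * s)


def pointwise_scan(measure, start, grids, sweeps=2):
    """Scan one parameter at a time over its grid, keeping the best setting found."""
    current = list(start)
    best = measure(current)
    n_eval = 1
    for _ in range(sweeps):
        for i, grid in enumerate(grids):
            for value in grid:
                trial = current[:i] + [value] + current[i + 1:]
                t = measure(trial)
                n_eval += 1
                if t > best:  # accept only strict improvements
                    best, current = t, trial
    return current, best, n_eval


# Assumed scan grids: steering magnets -1.0..1.0 A in 0.1 A steps,
# solenoids 90..120 A in 1 A steps.
steer_grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
sol_grid = [float(a) for a in range(90, 121)]
grids = [steer_grid] * 4 + [sol_grid] * 2

start = [0.5, -0.5, 0.3, -0.2, 100.0, 100.0]  # arbitrary starting currents (A)
best_currents, best_t, n_eval = pointwise_scan(toy_transmission, start, grids)
print(best_currents, best_t, n_eval)
```

Because this surrogate has a single smooth peak and no coupling between parameters, the scan converges in one sweep. The real problem described in the abstract has many discrete local optima, which is the situation where one-parameter-at-a-time scanning becomes slow or gets stuck and where the paper proposes A3C instead.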