Explore New Computing Environment for LHAASO Offline Data Analysis

Qiulan Huang, Gongxing Sun, Qiao Yin, Zhanchen Wei, Qiang Li
DOI: https://doi.org/10.22323/1.327.0021
2018-01-01
Abstract: This paper explores a way to build a new computing environment based on Hadoop on which Large High Altitude Air Shower Observatory (LHAASO) jobs can run transparently. In particular, we discuss a new mechanism that lets LHAASO software access data in HDFS randomly. With this feature, Map/Reduce tasks can perform random reads and writes on the local file system instead of going through the Hadoop data streaming interface, which makes it possible to run HEP jobs on Hadoop. We also develop MapReduce patterns for LHAASO jobs such as Corsika simulation, ARGO detector simulation (Geant4), KM2A simulation, and Medea++ reconstruction, and we provide a user-friendly interface. In addition, we provide real-time cluster monitoring covering cluster health and the numbers of running, finished, and killed jobs. An accounting system is also included. This work has been in production for LHAASO offline data analysis since September 2016, providing about 20,000 CPU hours per month. The results show that the efficiency of IO-intensive jobs can be improved by about 46%. Finally, we describe our ongoing work on a data migration tool that moves data between HDFS and other storage systems.
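To illustrate the distinction the abstract draws between streaming and random access on the local file system, the following is a minimal Python sketch. It is not the authors' implementation: the fixed-size record layout (`RECORD`), the file name `block_0.dat`, and the functions `write_events`, `read_event`, and `map_task` are all hypothetical, and the sketch only assumes that a task is handed an ordinary local path to its data.

```python
import os
import struct
import tempfile

# Hypothetical fixed-size record: a 4-byte event id followed by an 8-byte energy.
RECORD = struct.Struct("<id")


def write_events(path, events):
    """Write (event_id, energy) records back to back to a local file."""
    with open(path, "wb") as f:
        for event_id, energy in events:
            f.write(RECORD.pack(event_id, energy))


def read_event(path, index):
    """Random access: seek directly to record `index` rather than reading from byte 0."""
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))


def map_task(local_path, wanted):
    """Hypothetical map task: given a local path, read only the records it needs."""
    return [read_event(local_path, i) for i in wanted]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "block_0.dat")
        write_events(path, [(i, i * 1.5) for i in range(10)])
        print(map_task(path, [7, 2]))
```

A streaming interface would hand the task its records strictly in order; with a plain local path the task can seek to any record, which is the access pattern the abstract says existing HEP software relies on.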