Processing of GASKAP-HI pilot survey data using a commercial supercomputer

Ian P. Kemp, Nickolas M. Pingel, Rowan Worth, Justin Wake, Daniel A. Mitchell, Stuart D. Midgely, Steven J. Tingay, James Dempsey, Helga Dénes, John M. Dickey, Steven J. Gibson, Kate E. Jameson, Callum Lynn, Yik Ki Ma, Antoine Marchal, Naomi M. McClure-Griffiths, Snežana Stanimirović, Jacco Th. van Loon
2024-11-26
Abstract: Modern radio telescopes generate large amounts of data, with the next generation Very Large Array (ngVLA) and the Square Kilometre Array (SKA) expected to feed up to 292 GB of visibilities per second to the science data processor (SDP). However, the continued exponential growth in the power of the world's largest supercomputers suggests that for the foreseeable future there will be sufficient capacity available to provide for astronomers' needs in processing 'science ready' products from the new generation of telescopes, with commercial platforms becoming an option for overflow capacity. The purpose of the current work is to trial the use of commercial high performance computing (HPC) for a large scale processing task in astronomy, in this case processing data from the GASKAP-HI pilot surveys. We delineate a four-step process which can be followed by other researchers wishing to port an existing workflow from a public facility to a commercial provider. We used the process to provide reference images for an ongoing upgrade to ASKAPSoft (the ASKAP SDP software), and to provide science images for the GASKAP collaboration, using the joint deconvolution capability of WSClean. We document the approach to optimising the pipeline to minimise cost and elapsed time at the commercial provider, and give a resource estimate for processing future full survey data. Finally we document advantages, disadvantages, and lessons learned from the project, which will aid other researchers aiming to use commercial supercomputing for radio astronomy imaging. We found the key advantage to be immediate access and high availability, and the main disadvantage to be the need for improved HPC knowledge to take best advantage of the facility.
Instrumentation and Methods for Astrophysics
What problem does this paper attempt to address?
The main problem this paper addresses is how to use a commercial supercomputer to process data from the GASKAP-HI pilot survey. Specifically, the researchers use this trial to evaluate the feasibility and efficiency of commercial high-performance computing (HPC) platforms for large-scale astronomical data processing, and to provide a reference for wider use in the future.

### Background of the Paper and Problem Description

Modern radio telescopes generate large amounts of data. For example, the next-generation Very Large Array (ngVLA) and the Square Kilometre Array (SKA) are expected to produce up to 292 GB of visibility data per second. Faced with such volumes, radio astronomy relies heavily on HPC for imaging and other analyses that extract useful scientific information. However, as the power of the world's largest supercomputers continues to grow, their capacity is expected to remain sufficient for astronomers' needs for the foreseeable future. In addition, commercial platforms may become an option for overflow capacity.

### Research Objectives

The research objective of this paper is to test commercial supercomputing for large-scale astronomical data processing by using the supercomputer owned by DUG Technology in Perth, Western Australia to process the GASKAP-HI pilot survey data. Specific objectives include:

1. **Feasibility Assessment**: Identify and record the feasibility, advantages, and disadvantages of using commercial facilities for data processing.
2. **Resource Estimation**: Provide resource estimates for using commercial supercomputing to process the complete survey data.
3. **Reference Image Generation**: Use the joint deconvolution technique to generate reference images for the updated joint deconvolution algorithm in ASKAPSoft.
4. **Scientific Product Generation**: Generate fully processed cubes from the pilot survey data for use by the science team.

### Main Challenges and Solutions

- **Data Transmission**: Because data storage on commercial platforms is expensive, the researchers developed a "drip-irrigation" data-transmission method that downloads the required data gradually and automatically, avoiding overloading the CASDA system.
- **Task Allocation Optimization**: Optimizing the choice of software tools and the allocation of nodes improves processing efficiency and reduces cost. For example, the most time-consuming tasks are assigned to CLX nodes, which offer higher performance at a higher cost.
- **Workflow Migration**: The existing workflow was migrated from the ANU environment to the DUG platform and adjusted to the characteristics of the new platform.

### Conclusions

Through this trial, the researchers verified the potential of commercial supercomputers for processing large-scale radio astronomy data, especially where immediate access and high availability are required. However, researchers need more HPC knowledge to make full use of these facilities. This experience will support wider use of commercial supercomputing resources in the future, especially in the SKA regional centre model.
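
The "drip-irrigation" transfer described under Main Challenges can be illustrated with a minimal sketch. This is not the authors' implementation: the function `drip_feed`, its parameters, and the `fetch` and `process` callables are hypothetical names introduced here, and no real CASDA interface is used. The sketch assumes the idea is to keep only a small, bounded number of files on paid storage at once and to request files one at a time, so the archive never sees a burst of downloads.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


def drip_feed(
    items: Iterable[str],
    fetch: Callable[[str], str],
    process: Callable[[str], None],
    max_staged: int = 2,
) -> List[str]:
    """Fetch and process items, keeping at most `max_staged` staged locally.

    `fetch` is a hypothetical callable that downloads one item and returns
    its local name; `process` consumes a staged item and frees its slot.
    Returns a log of the order in which actions happened.
    """
    pending: Deque[str] = deque(items)
    staged: Deque[str] = deque()
    log: List[str] = []
    while pending or staged:
        # Top up the staging area one request at a time, never exceeding
        # the storage budget expressed by max_staged.
        while pending and len(staged) < max_staged:
            name = pending.popleft()
            staged.append(fetch(name))
            log.append(f"fetch {name}")
        # Process the oldest staged item; this frees one slot for the
        # next download on the following pass.
        local = staged.popleft()
        process(local)
        log.append(f"process {local}")
    return log
```

Called with three items and `max_staged=2`, the sketch fetches two items up front and then alternates one download per completed processing step. A production version would likely run downloads and processing concurrently and delete each file after use; those details are omitted here.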