Virtual Data Space System for National High-Performance Computing Environment

QIN Guangjun, XIAO Limin, ZHANG Guangyan, NIU Beifang, CHEN Zhiguang
DOI: https://doi.org/10.11959/j.issn.2096-0271.2021016
IF: 3.3
2021-01-01
Big Data Research
Abstract: The high-performance computing (HPC) environment is the core information infrastructure supporting national scientific and technological innovation, economic development, and national defense construction. Leading HPC nations around the world have been building wide-area HPC environments based on the resources of multiple supercomputing centers. However, the resources in such environments are of many kinds and are widely distributed geographically, so their aggregation effect cannot be effectively exerted, and it is difficult to meet the requirements of large-scale applications for unified management of, and efficient access to, wide-area distributed data. To this end, a complete set of technologies for building a wide-area global virtual data space was proposed, including a virtual data space model, cross-domain virtual data space construction, efficient data migration in a wide-area environment, co-scheduling of storage resources and computing jobs, and cross-domain high-concurrency data aggregation processing. Based on these technologies, a virtual data space system was developed for the national high-performance computing environment (NHPCE). The system effectively supports unified and efficient access to wide-area distributed heterogeneous storage resources, and allows data distributed across the wide-area environment to be shared and cooperatively processed across domains. At present, the system has been experimentally deployed in NHPCE and verified with three typical large-scale applications: molecular docking, genome-wide association study, and a weather forecasting model. The verification results show that the developed technologies and software system can effectively aggregate wide-area distributed storage resources and meet the data space requirements of large-scale applications.