A flexible I/O arbitration framework for netCDF‐based big data processing workflows on high‐end supercomputers

Jianwei Liao, Balazs Gerofi, Guo Yuan Lien, Takemasa Miyoshi, Seiya Nishizawa, Hirofumi Tomita, Wei-Keng Liao, Alok Nidhi Choudhary, Yutaka Ishikawa
DOI: https://doi.org/10.1002/cpe.4161
2017-01-01
Concurrency and Computation: Practice and Experience
Abstract: On the verge of the convergence between high-performance computing and Big Data processing, it has become increasingly prevalent to deploy large-scale data analytics workloads on high-end supercomputers. Such applications often come in the form of complex workflows with various components, assimilating data from scientific simulations as well as from measurements streamed from sensor networks, such as radars and satellites. For example, as part of the Flagship 2020 (post-K) supercomputer project of Japan, RIKEN is investigating the feasibility of a highly accurate weather forecasting system that would provide a real-time outlook for severe guerrilla rainstorms. One of the main performance bottlenecks of this application is the lack of efficient communication among workflow components, which currently takes place over the parallel file system. In this paper, we present an initial study of a direct communication framework designed for complex workflows that eliminates unnecessary file I/O among components. Specifically, we propose an I/O arbitration layer that provides direct parallel data transfer (both synchronous and asynchronous) among job components that rely on the netCDF interface for performing I/O operations. Our solution requires only minimal modifications to application code. Moreover, we propose a configuration file-based approach that allows users to specify the desired data transfer pattern among workflow components, offering a general solution for different application contexts. We present a preliminary evaluation of the proposed framework on the K Computer (running on up to 4800 compute nodes) using RIKEN's experimental weather forecasting workflow as a case study.