Automatic, Application-Aware I/O Forwarding Resource Allocation
Xu Ji, Bin Yang, Tianyu Zhang, Xiaosong Ma, Xiupeng Zhu, Xiyang Wang, Nosayba El-Sayed, Jidong Zhai, Weiguo Liu, Wei Xue
2019-01-01
Abstract: The I/O forwarding architecture is widely adopted on modern supercomputers, with a layer of intermediate nodes sitting between the many compute nodes and the backend storage nodes. This allows compute nodes to run more efficiently and stably with a leaner OS, offloads I/O coordination and communication with the backend from the compute nodes, maintains fewer concurrent connections to storage systems, and provides additional resources for effective caching, prefetching, write buffering, and I/O aggregation. However, on many existing machines, these forwarding nodes are assigned to serve a fixed set of compute nodes. We explore an automatic mechanism, DFRA, for application-adaptive dynamic forwarding resource allocation. We use I/O monitoring data that proves affordable to acquire in real time and to maintain for long-term history analysis. Upon each job's dispatch, DFRA conducts a history-based study to determine whether the job should be granted more forwarding resources or given dedicated forwarding nodes. Such customized I/O forwarding lets the small fraction of I/O-intensive applications achieve higher I/O performance and scalability, while effectively isolating disruptive I/O activities. We implemented, evaluated, and deployed DFRA on Sunway TaihuLight, currently the No. 3 supercomputer in the world. It improves applications' I/O performance by up to 18.9x, eliminates most of the inter-application I/O interference, and has saved over 200 million core-hours during its 11-month test deployment on TaihuLight. Finally, our proposed DFRA design is not platform-dependent, making it applicable to the management of existing and future I/O forwarding or burst buffer resources.
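
The dispatch-time decision described in the abstract (grant more forwarding resources, or dedicated forwarding nodes, based on a job's I/O history) might be sketched roughly as follows. This is an illustrative sketch only, not DFRA's actual algorithm: the record fields (`peak_io_mbps`, `disruptive`), the function `decide_allocation`, and the parameters `default_nodes`, `node_capacity_mbps`, and `max_nodes` are all hypothetical names and values chosen for the example.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobRecord:
    """Hypothetical summary of one past run, derived from I/O monitoring data."""
    peak_io_mbps: float  # assumed metric: peak aggregate I/O bandwidth of the run
    disruptive: bool     # assumed flag: run was observed to interfere with other jobs


def decide_allocation(history, default_nodes=1, node_capacity_mbps=1000.0, max_nodes=8):
    """Pick a forwarding allocation for a job from its past runs.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    # No history: fall back to the default (fixed) allocation.
    if not history:
        return {"forwarding_nodes": default_nodes, "dedicated": False}

    # Estimate demand as the mean peak bandwidth over past runs.
    demand = mean(r.peak_io_mbps for r in history)

    # Grant enough forwarding nodes to cover the demand, within bounds.
    needed = math.ceil(demand / node_capacity_mbps)
    nodes = max(default_nodes, min(max_nodes, needed))

    # Isolate jobs whose past runs disrupted others on dedicated nodes.
    dedicated = any(r.disruptive for r in history)

    return {"forwarding_nodes": nodes, "dedicated": dedicated}


if __name__ == "__main__":
    heavy = [JobRecord(3500.0, False), JobRecord(3500.0, False)]
    print(decide_allocation(heavy))
```

In this sketch, a job with no recorded history keeps the default allocation, so only applications with an observed I/O-intensive or disruptive past receive a customized one.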