HARL: Optimizing Parallel File Systems with Heterogeneity-Aware Region-Level Data Layout
Shuibing He, Yang Wang, Xian-He Sun, Chengzhong Xu
DOI: https://doi.org/10.1109/tc.2016.2637905
IF: 3.183
2016-01-01
IEEE Transactions on Computers
Abstract: Parallel file systems (PFSs) are commonly used in high-end computing systems. With the emergence of solid-state drives (SSDs), a hybrid PFS, which consists of both HDD and SSD servers, provides a practical I/O solution for data-intensive applications. However, most existing data layout schemes are inefficient for hybrid PFSs because they are unaware of server heterogeneity and of workload changes across different parts of a file. In this study, we propose a heterogeneity-aware region-level data layout scheme, HARL, to improve the data distribution of a hybrid PFS. HARL first divides a file into fine-grained, variable-sized regions according to the workload features of an application, then determines an appropriate file stripe size on each server for each region based on the performance of the heterogeneous servers. To further improve the performance of a hybrid PFS, we also propose a dynamic region-level layout scheme, HARL-D, which creates multiple replicas of each region and redirects file requests at runtime to the replica with the lowest access cost. Experimental results with representative benchmarks and a real application show that HARL can greatly improve I/O system performance and demonstrate the advantages of HARL-D over HARL.
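
The two ideas in the abstract, per-region stripe sizes that depend on server performance and request redirection to the cheapest replica, can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give HARL's cost model, so the sketch assumes a simple rule in which each server's stripe size is proportional to its bandwidth. The names `Server`, `stripe_sizes`, `region_layout`, and `pick_replica`, the bandwidth figures, and the region boundaries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    bandwidth_mb_s: float  # hypothetical measured throughput, not from the paper


def stripe_sizes(servers, round_bytes, unit=4096):
    """Split one striping round across servers in proportion to bandwidth.

    Assumption: proportional split, rounded down to a multiple of `unit`;
    any leftover bytes go to the fastest server.
    """
    total_bw = sum(s.bandwidth_mb_s for s in servers)
    sizes = {
        s.name: int(round_bytes * s.bandwidth_mb_s / total_bw) // unit * unit
        for s in servers
    }
    fastest = max(servers, key=lambda s: s.bandwidth_mb_s)
    sizes[fastest.name] += round_bytes - sum(sizes.values())
    return sizes


def region_layout(regions, servers):
    """Compute a stripe-size map for each (start, end, round_bytes) region."""
    return {
        (start, end): stripe_sizes(servers, round_bytes)
        for start, end, round_bytes in regions
    }


def pick_replica(access_costs):
    """Return the replica id with the lowest estimated access cost."""
    return min(access_costs, key=access_costs.get)


# Hypothetical hybrid cluster: two HDD servers and two SSD servers.
servers = [
    Server("hdd0", 100.0),
    Server("hdd1", 100.0),
    Server("ssd0", 300.0),
    Server("ssd1", 300.0),
]

# Hypothetical regions: (start offset, end offset, bytes per striping round).
regions = [
    (0, 64 * 2**20, 256 * 1024),         # region with large requests
    (64 * 2**20, 128 * 2**20, 64 * 1024),  # region with small requests
]

layout = region_layout(regions, servers)
chosen = pick_replica({"replica_a": 3.5, "replica_b": 1.2})
```

Under these assumptions, SSD servers receive three times the stripe size of HDD servers in every region, and the region with larger requests uses a larger striping round. A real layout scheme would derive both the region boundaries and the stripe sizes from traced workload behavior and a calibrated server cost model.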