SSDUP: a traffic-aware ssd burst buffer for HPC systems.

Xuanhua Shi,Ming Li,Wei Liu,Hai Jin,Chen Yu,Yong Chen
DOI: https://doi.org/10.1145/3079079.3079087
2017-01-01
Abstract:Many high performance computing (HPC) applications are highly data intensive. Current HPC storage systems still use hard disk drives (HDDs) as their dominant storage devices, which suffer from disk head thrashing when accessing random data. New storage devices such as solid state drives (SSDs), which can handle random data access much more efficiently, have been widely deployed as the buffer to HDDs in many production HPC systems. Burst buffer has also been proposed to manage the SSD buffering of bursty write requests. Although burst buffer can improve I/O performance in many cases, we find that it has some limitations such as requiring large SSD capacity and harmonious overlapping between computation phase and data flushing stage. In this paper, we propose a scheme, called SSDUP (a traffic-aware SSD burst buffer), to improve the burst buffer by addressing the above limitations. In order to reduce the SSD capacity demand, we develop a novel traffic-detection method to detect the randomness in the write traffic. Based on this method, only the random writes are buffered to SSD and other writes are deemed sequential and propagated to HDDs directly. In order to overcome the difficulty of perfectly overlapping the computation phase and the flushing stage, we propose a pipeline mechanism for the SSD buffer, in which the data buffering and data flushing are performed in pipeline. Finally, in order to further improve the performance of buffering random writes in SSD, we covert the random writes to sequential writes in SSD by storing the data with a log structure. Further, we propose to use the AVL tree structure to store the sequence information of the data. We have implemented a prototype of SSDUP based on the OrangeFS and performed extensive experimental evaluation. The experimental results show that the proposed SSDUP scheme can improve the write performance by more than 50% on average.
What problem does this paper attempt to address?