CalmWPC: A Buffer Management to Calm Down Write Performance Cliff for NAND Flash-Based Storage Systems
Hui Sun, Guodong Chen, Jianzhong Huang, Xiao Qin, Weisong Shi
DOI: https://doi.org/10.1016/j.future.2018.08.014
IF: 7.307
2018-01-01
Future Generation Computer Systems
Abstract: NAND Flash-based solid state disks (SSDs) are widely deployed in large-scale storage systems. However, NAND Flash exhibits asymmetric read and write performance, high erase latency, and a limited number of program/erase (P/E) cycles. Under random write-intensive workloads, the garbage collection (GC) process inside an SSD causes a write performance cliff, which increases I/O access latency and shortens SSD lifetime. In real-time transactional applications, such a large write performance cliff inflates the response time of I/O requests and can lead to serious critical errors. To address this issue, we propose a buffer management strategy called CalmWPC that calms down the SSD write performance cliff. CalmWPC seamlessly integrates data cluster-based data management, a historical access-based prediction algorithm, and a semantic fingerprint database. The prediction algorithm predicts future data-cluster activity and classifies each cluster based on its historical write operations. The fingerprint database stores semantic information about write and update operations between the buffer and NAND Flash memory. With the fingerprint database in place, CalmWPC calculates the number of invalid data pages in each block in real time. CalmWPC flushes a data cluster into flash memory when its number of updated pages reaches a predefined threshold. CalmWPC thus mitigates the write performance cliff during GC under random-write workloads. Experimental results show that CalmWPC reduces the write performance cliff, improves the average latency of user I/Os, and reduces write amplification. On the Financial1 workload, for example, CalmWPC reduces the write performance cliff by an average of 60.9% and 60.0% compared with LRU and CFLRU, respectively, and shortens the response time by an average of 69.4% and 70.1% relative to LRU and CFLRU, respectively. (C) 2018 Elsevier B.V. All rights reserved.
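The abstract describes CalmWPC only at a high level. The Python sketch below illustrates one possible reading of the mechanism under explicit assumptions; it is not the authors' implementation. Logical pages are grouped into fixed-size data clusters, a cluster counts as "hot" once its lifetime write count reaches a threshold (a stand-in for the history-based prediction), a plain dictionary from logical page to flash block stands in for the semantic fingerprint database, and a cluster is flushed once the number of buffered pages that update a copy already on flash reaches `update_threshold`. All names (`ClusterBuffer`, `PAGES_PER_CLUSTER`, `PAGES_PER_BLOCK`, `update_threshold`, `hot_threshold`) and the cold-first eviction rule are hypothetical.

```python
from collections import OrderedDict, defaultdict
from dataclasses import dataclass, field

# Hypothetical geometry for illustration only (not taken from the paper).
PAGES_PER_CLUSTER = 4  # consecutive logical pages grouped into one data cluster
PAGES_PER_BLOCK = 4    # pages per flash block


@dataclass
class Cluster:
    pages: set = field(default_factory=set)  # dirty logical pages held in the buffer
    update_count: int = 0  # buffered pages that overwrite a copy already on flash


class ClusterBuffer:
    """Toy cluster-based write buffer; every policy here is an assumption."""

    def __init__(self, capacity, update_threshold, hot_threshold):
        self.capacity = capacity                  # buffer size in pages
        self.update_threshold = update_threshold  # update pages that trigger a flush
        self.hot_threshold = hot_threshold        # lifetime writes that mark a cluster hot
        self.clusters = OrderedDict()             # cluster id -> Cluster, LRU first
        self.history = defaultdict(int)           # cluster id -> lifetime write count
        self.location = {}                        # logical page -> flash block (fingerprint stand-in)
        self.invalid = defaultdict(int)           # flash block -> invalid page count
        self.next_block = 0
        self.free_slots = PAGES_PER_BLOCK
        self.flushed = []                         # ids of flushed clusters, in order

    def is_hot(self, cid):
        # History-based prediction: often-written clusters are expected to be written again.
        return self.history[cid] >= self.hot_threshold

    def buffered_pages(self):
        return sum(len(c.pages) for c in self.clusters.values())

    def write(self, lpn):
        cid = lpn // PAGES_PER_CLUSTER
        cluster = self.clusters.setdefault(cid, Cluster())
        self.clusters.move_to_end(cid)  # mark as most recently used
        self.history[cid] += 1
        if lpn in self.location and lpn not in cluster.pages:
            cluster.update_count += 1  # flushing this page will invalidate a flash copy
        cluster.pages.add(lpn)
        if cluster.update_count >= self.update_threshold:
            self.flush(cid)
        while self.buffered_pages() > self.capacity:
            self.evict()

    def evict(self):
        # Prefer the least recently used cold cluster; fall back to plain LRU.
        cold = [cid for cid in self.clusters if not self.is_hot(cid)]
        self.flush(cold[0] if cold else next(iter(self.clusters)))

    def flush(self, cid):
        cluster = self.clusters.pop(cid)
        for lpn in sorted(cluster.pages):
            old_block = self.location.get(lpn)
            if old_block is not None:
                self.invalid[old_block] += 1  # out-of-place update leaves a stale copy
            if self.free_slots == 0:  # current block is full; open the next one
                self.next_block += 1
                self.free_slots = PAGES_PER_BLOCK
            self.location[lpn] = self.next_block
            self.free_slots -= 1
        self.flushed.append(cid)
```

In this sketch the `invalid` map is updated at flush time, so the per-block invalid-page count is always current, which is the kind of real-time bookkeeping a GC victim-selection policy could consult. Flushing clusters as a unit keeps pages of the same cluster together on flash, so a later update of that cluster tends to invalidate pages concentrated in few blocks.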