Filtering and Matching of Data Blocks to Avoid Disk Bottleneck in De-duplication File System

Jiajia Zhang, Xingjun Zhang, Runting Zhao, Xiaoshe Dong
DOI: https://doi.org/10.1007/978-3-662-44491-7_6
2014-01-01
Abstract: The growing scale of data has generated huge redundancy, so de-duplication, which eliminates redundancy and improves the space utilization of storage devices, has been widely adopted. A de-duplication filesystem can provide a unified interface to upper-layer applications and implement inline de-duplication. In this paper, we design and implement FmdFS, a kernel-space de-duplication filesystem. Due to memory limitations, the metadata of FmdFS is stored on disk in groups. Meanwhile, a scale-adaptive binary tree filter is constructed in memory, which not only avoids accessing the on-disk metadata when searching for the fingerprints of most new data, but also records the groups in which duplicate data is stored. In addition, FmdFS uses an LRU hash cache, which holds the metadata groups that have been recently accessed, to exploit locality when matching duplicate data and so avoid accessing the metadata on disk. In comparison with traditional de-duplication filesystems, FmdFS achieves higher write performance.
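The write path described in the abstract (an in-memory filter consulted first, then an LRU cache of metadata groups, and only then the disk) can be illustrated with a minimal user-space sketch. This is not the FmdFS implementation: the abstract does not describe the internals of the scale-adaptive binary tree filter, so a plain dictionary keyed by a fingerprint prefix stands in for it here. The class name `DedupIndex`, its parameters, the use of SHA-1 as the fingerprint, and the in-memory `disk_groups` dictionary that simulates on-disk metadata are all assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict


class DedupIndex:
    """Toy model: in-memory filter + LRU cache over simulated on-disk metadata groups."""

    def __init__(self, cache_groups=2, group_size=4, prefix_len=4):
        self.disk_groups = {}       # group id -> set of fingerprints (simulated disk)
        self.filter = {}            # fingerprint prefix -> group id (stand-in for the tree filter)
        self.cache = OrderedDict()  # group id -> fingerprints, least recently used first
        self.cache_groups = cache_groups
        self.group_size = group_size
        self.prefix_len = prefix_len
        self.open_gid = 0           # group currently receiving new fingerprints
        self.disk_reads = 0         # counts simulated metadata reads from disk

    def _cache_put(self, gid, group):
        # Mark the group as most recently used and evict the oldest if over capacity.
        self.cache[gid] = group
        self.cache.move_to_end(gid)
        while len(self.cache) > self.cache_groups:
            self.cache.popitem(last=False)

    def _load_group(self, gid):
        group = self.cache.get(gid)
        if group is None:
            # Cache miss: this is the disk access the design tries to avoid.
            self.disk_reads += 1
            group = self.disk_groups[gid]
        self._cache_put(gid, group)
        return group

    def _store_new(self, fp):
        group = self.disk_groups.setdefault(self.open_gid, set())
        group.add(fp)
        # Simplification: a prefix collision overwrites the older mapping,
        # which can only cost a missed duplicate, never a false match.
        self.filter[fp[:self.prefix_len]] = self.open_gid
        self._cache_put(self.open_gid, group)
        if len(group) >= self.group_size:
            self.open_gid += 1

    def write(self, block):
        """Index one data block; return True if it duplicates a stored block."""
        fp = hashlib.sha1(block).digest()
        gid = self.filter.get(fp[:self.prefix_len])
        if gid is None:
            # Filter miss: the block is new, so no metadata lookup is needed.
            self._store_new(fp)
            return False
        if fp in self._load_group(gid):
            return True
        # Filter false positive: the full fingerprint is not in that group.
        self._store_new(fp)
        return False
```

In this sketch, writing a run of new blocks never increments `disk_reads`, because the filter rejects them before any group is consulted. Rewriting blocks that belong to the same group costs at most one simulated disk read while that group stays in the cache, which is the locality effect the abstract attributes to the LRU hash cache.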