Mining Infrequently-Accessed File Correlations In Distributed File System

Lihua Yu, Gang Chen, Jinxiang Dong
DOI: https://doi.org/10.1007/978-3-540-72524-4_65
2007-01-01
Abstract: File correlation mining is a technique for improving file system performance; it can be exploited, for example, to improve cache effectiveness, to optimize file layout, and to enable disk file prefetching. While most research on file correlations focuses on traditional stand-alone file systems, this paper investigates the problem of mining file correlations in a distributed environment. We present a parallel data mining algorithm called PFC-Miner (Parallel File Correlation Miner), which is based on Locality Sensitive Hashing. PFC-Miner can efficiently discover correlations between infrequently-accessed files, which are more valuable for web applications. Experimental results show that PFC-Miner can efficiently discover file correlations in distributed file systems without compromising accuracy, and that the proposed approach has good scalability.
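The abstract does not describe PFC-Miner's internals, so the following is only a generic sketch of how Locality Sensitive Hashing can surface correlations between rarely accessed files; it is not the authors' algorithm. It assumes that each file is represented by the set of integer session IDs in which it was accessed, that two files are "correlated" when the Jaccard similarity of those sets reaches a threshold, and that "infrequently accessed" means appearing in at most `max_sessions` sessions. The names `max_sessions`, `num_bands`, `rows_per_band`, and `threshold` are hypothetical parameters chosen for illustration. One common motivation for LSH here is that support-based mining would need a very low support threshold to catch rarely accessed files, while MinHash banding finds similar pairs without any support threshold.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def minhash_signature(sessions, coeffs):
    """MinHash signature of a set of integer session IDs.

    Each (a, b) pair defines one hash h(x) = (a*x + b) mod PRIME; the
    signature keeps the minimum hash value over the set for each function.
    """
    return [min((a * x + b) % PRIME for x in sessions) for a, b in coeffs]


def jaccard(s, t):
    """Exact Jaccard similarity |s & t| / |s | t|."""
    return len(s & t) / len(s | t)


def infrequent_files(file_sessions, max_sessions):
    # Hypothetical cutoff: a file counts as "infrequently accessed" if it occurs
    # in at most max_sessions sessions. Files with no sessions are dropped.
    return {f: s for f, s in file_sessions.items() if 0 < len(s) <= max_sessions}


def correlated_pairs(file_sessions, threshold=0.5, max_sessions=10,
                     num_bands=16, rows_per_band=4, seed=0):
    """Return sorted (file_a, file_b) pairs whose Jaccard similarity >= threshold."""
    files = infrequent_files(file_sessions, max_sessions)
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(PRIME))
              for _ in range(num_bands * rows_per_band)]
    sigs = {f: minhash_signature(s, coeffs) for f, s in files.items()}

    # Banding: files whose signatures agree on every row of some band
    # fall into the same bucket and become candidate pairs.
    buckets = defaultdict(list)
    for f, sig in sigs.items():
        for band in range(num_bands):
            rows = tuple(sig[band * rows_per_band:(band + 1) * rows_per_band])
            buckets[(band, rows)].append(f)

    candidates = set()
    for members in buckets.values():
        candidates.update(combinations(sorted(members), 2))

    # Verify each candidate with the exact Jaccard similarity to drop false positives.
    return {(a, b) for a, b in candidates
            if jaccard(files[a], files[b]) >= threshold}


sessions = {
    "a.log": {1, 2, 3, 4},
    "b.log": {1, 2, 3, 4},
    "c.log": {100, 101, 102},
    "hot.css": set(range(50)),  # frequently accessed, filtered out
}
print(correlated_pairs(sessions))  # {('a.log', 'b.log')}
```

Because each band is hashed independently, the bucketing step lends itself to being split across nodes (for example, one group of bands per worker, followed by a merge of candidate pairs); this is one plausible way to parallelize the idea, not a description of how PFC-Miner distributes its work.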