A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P
Qi-fei Zhang, Xue-zeng Pan, Yan Shen, Wen-juan Li
DOI: https://doi.org/10.1109/clusterw.2012.27
2012-01-01
Abstract: Scalability and latency are two important performance indicators for a distributed file system. Google and Apache have achieved great success with GFS and HDFS when operating on big files, but latency is too high when reading and writing small files, because concurrent I/O cannot be exploited for small files. In addition, the master node is difficult to extend in a cloud storage system with a Master/Slave structure. In this paper, we propose a distributed cloud storage system based on P2P, in which a central routing node is introduced to improve resource query efficiency, so that clients can find data using only one message, compared with the O(log N) messages required by Chord. The central routing node stores only the status and routing information of all data nodes, indexed by a trie, so query time meets the requirement of online query. The data nodes store both file content and file metadata, so the system is easy to extend because the master node no longer needs to store the metadata. Clients can also cache the routing information, so read and write time is reduced according to the principle of locality. Experiments show that reading and writing time is significantly reduced compared with Hadoop HDFS.
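The lookup path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say what the trie is keyed on or how the messages are formatted, so the sketch assumes the trie maps key prefixes to data node addresses and that the registered prefixes partition the key space (no prefix is a prefix of another). All names (`RoutingTrie`, `Client`, `register`, `locate`, `router_messages`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, TrieNode] = field(default_factory=dict)
    data_node: str | None = None  # address of the data node owning this prefix


class RoutingTrie:
    """Hypothetical routing table of the central routing node."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def register(self, prefix: str, data_node: str) -> None:
        # Assumption: registered prefixes are prefix-free (they partition the keys).
        node = self.root
        for ch in prefix:
            node = node.children.setdefault(ch, TrieNode())
        node.data_node = data_node

    def lookup(self, key: str) -> tuple[str, str] | None:
        # Walk the trie along the key; cost depends on key length, not node count.
        node = self.root
        for i, ch in enumerate(key):
            node = node.children.get(ch)
            if node is None:
                return None
            if node.data_node is not None:
                return key[: i + 1], node.data_node
        return None


class Client:
    """Hypothetical client that caches routing entries it has already seen."""

    def __init__(self, router: RoutingTrie) -> None:
        self.router = router
        self.cache: dict[str, str] = {}
        self.router_messages = 0  # counts round trips to the routing node

    def locate(self, key: str) -> str:
        # Cache hit: no message to the routing node at all.
        for prefix, data_node in self.cache.items():
            if key.startswith(prefix):
                return data_node
        # Cache miss: exactly one query to the central routing node.
        self.router_messages += 1
        hit = self.router.lookup(key)
        if hit is None:
            raise KeyError(key)
        prefix, data_node = hit
        self.cache[prefix] = data_node
        return data_node
```

In this sketch a cache miss costs one query to the routing node, and later requests for keys under the same prefix are answered from the client cache, which is the locality effect the abstract refers to. Cache invalidation when data nodes join or leave is omitted.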