Joint Source Selection And Transfer Optimization For Erasure Coding Storage System
Han Zhang, Xingang Shi, Yingya Guo, Haijun Geng, Zhiliang Wang, Xia Yin
DOI: https://doi.org/10.1109/PCCC.2017.8280455
2017-01-01
Abstract: With the deployment of big data applications, more and more data are stored in online storage. Erasure coding storage systems have been widely used by companies such as Google and Facebook, since they provide space-optimal data redundancy to protect against data loss. In an erasure coding storage system, an (n, k) MDS erasure code is used to encode a file into n chunks. When a user wants to access the file, any k of the n chunks suffice to reconstruct it. How to select k out of the n chunks, and how to transfer the selected chunks quickly, therefore become important problems. In this paper, we optimize the two problems jointly, with the goal of minimizing the average file access time (FAT). To achieve this, we propose a smallest-load-first heuristic for source selection and design an online algorithm to reduce chunk transfer latency. Based on this, we design and implement D-Target, a centralized scheduler that tries to minimize the average FAT in a distributed erasure coding storage system. We then evaluate D-Target's performance by trace-driven simulation. Results show that, for the AT&T trace, D-Target performs 2.5x, 1.7x, 1.8x, and 3.6x better than TCP, Aalo, Baraat, and pFabric, respectively.
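The abstract names a smallest-load-first heuristic for source selection but does not spell it out. The following is a minimal sketch of one plausible reading, not the authors' implementation: among the n servers holding a file's chunks, pick the k with the least outstanding load. The function names `select_sources` and `assign_request`, the `load` dictionary, and the definition of load as pending bytes per server are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List


def select_sources(chunk_servers: List[str], load: Dict[str, float], k: int) -> List[str]:
    """Pick the k least-loaded servers among those holding the file's chunks.

    Assumption: `load` maps a server name to its pending bytes; servers
    missing from the map are treated as idle (load 0).
    """
    if k > len(chunk_servers):
        raise ValueError("need at least k chunk servers to reconstruct the file")
    # nsmallest returns servers in ascending load order; ties keep input order.
    return heapq.nsmallest(k, chunk_servers, key=lambda s: load.get(s, 0.0))


def assign_request(chunk_servers: List[str], load: Dict[str, float],
                   k: int, chunk_size: float) -> List[str]:
    """Select sources for one file request and charge the chunk size to each."""
    chosen = select_sources(chunk_servers, load, k)
    for server in chosen:
        load[server] = load.get(server, 0.0) + chunk_size
    return chosen


if __name__ == "__main__":
    # Hypothetical (4, 2) code: chunks on four servers, any two reconstruct the file.
    load = {"a": 5.0, "b": 1.0, "c": 3.0, "d": 0.0}
    print(assign_request(["a", "b", "c", "d"], load, k=2, chunk_size=4.0))  # ['d', 'b']
    print(assign_request(["a", "b", "c", "d"], load, k=2, chunk_size=4.0))  # ['c', 'd']
```

Charging the chunk size to each selected server makes later requests avoid sources that are already busy, which is the intuition behind selecting by load. How this interacts with the transfer scheduling that the paper optimizes jointly is not covered by the sketch.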