Repairable Block Failure Resilient Codes

Gokhan Calis,O. Ozan Koyluoglu
DOI: https://doi.org/10.48550/arXiv.1406.7264
2014-06-28
Abstract:In large scale distributed storage systems (DSS) deployed in cloud computing, correlated failures resulting in simultaneous failure (or, unavailability) of blocks of nodes are common. In such scenarios, the stored data or a content of a failed node can only be reconstructed from the available live nodes belonging to available blocks. To analyze the resilience of the system against such block failures, this work introduces the framework of Block Failure Resilient (BFR) codes, wherein the data (e.g., file in DSS) can be decoded by reading out from a same number of codeword symbols (nodes) from each available blocks of the underlying codeword. Further, repairable BFR codes are introduced, wherein any codeword symbol in a failed block can be repaired by contacting to remaining blocks in the system. Motivated from regenerating codes, file size bounds for repairable BFR codes are derived, trade-off between per node storage and repair bandwidth is analyzed, and BFR-MSR and BFR-MBR points are derived. Explicit codes achieving these two operating points for a wide set of parameters are constructed by utilizing combinatorial designs, wherein the codewords of the underlying outer codes are distributed to BFR codeword symbols according to projective planes.
Information Theory
What problem does this paper attempt to address?