Storage Allocation for Multi-Class Distributed Data Storage Systems

Koosha Pourtahmasi Roshandeh, Moslem Noori, Masoud Ardakani, Chintha Tellambura
DOI: https://doi.org/10.48550/arXiv.1701.06506
2017-01-24
Abstract: Distributed storage systems (DSSs) provide a scalable solution for reliably storing massive amounts of data coming from various sources. Heterogeneity of these data sources often means different data classes (types) exist in a DSS, each needing a different level of quality of service (QoS). As a result, efficient data storage and retrieval processes that satisfy various QoS requirements are needed. This paper studies storage allocation, meaning how data of different classes must be spread over the set of storage nodes of a DSS. More specifically, assuming a probabilistic access to the storage nodes, we aim at maximizing the weighted sum of the probability of successful data recovery of data classes, when for each class a minimum QoS (probability of successful recovery) is guaranteed. Solving this optimization problem for a general setup is intractable. Thus, we find the optimal storage allocation when the data of each class is spread minimally over the storage nodes, i.e. minimal spreading allocation (MSA). Using upper bounds on the performance of the optimal storage allocation, we show that the optimal MSA allocation approaches the optimal performance in many practical cases. Computer simulations are also presented to better illustrate the results.
Information Theory
What problem does this paper attempt to address?
The paper addresses how to allocate storage efficiently in multi-class distributed storage systems (DSSs) so that the quality-of-service (QoS) requirements of different data classes are met. Specifically, it maximizes the weighted sum of the successful recovery probabilities of all data classes, subject to a guaranteed minimum successful recovery probability for each class. The weights reflect the QoS requirements of each data class.

### Background and Problem Definition

A DSS redundantly stores different types of data on multiple storage nodes in a network, providing a reliable and scalable solution for storing large amounts of data from various sources. Because these sources are heterogeneous, a DSS usually holds multiple data classes, each requiring a different QoS level. Storage and retrieval processes therefore need to satisfy several QoS requirements at once.

### Research Objectives

The paper studies the storage allocation problem, that is, how to distribute data of different classes over the set of storage nodes in a DSS. Assuming probabilistic access to the storage nodes, the goal is to maximize the weighted sum of the successful recovery probabilities of all data classes while guaranteeing a minimum successful recovery probability for each class.

### Main Contributions

1. **Problem Modeling**:
   - The paper first defines a system model of a multi-class DSS, which consists of a storage model and an access model.
   - The storage model describes how data of different classes is encoded and allocated to storage nodes.
   - The access model is probabilistic: the server attempts to access all storage nodes, and each access succeeds with probability \( p \).
2. **Optimization Problem**:
   - The paper formalizes storage allocation as a nonlinear integer optimization problem whose objective is the weighted sum of the successful recovery probabilities of all data classes.
   - Because the general problem is intractable, the paper restricts attention to minimal spreading allocation (MSA) and proposes an iterative algorithm that finds the optimal MSA.
3. **Low-Complexity Approximate Solution**:
   - The paper also proposes a low-complexity approximate method whose worst-case time complexity is \( O(K) \) and whose performance is close to the optimal solution in many cases.
4. **Performance Analysis**:
   - The paper evaluates the proposed method through numerical simulation and analyzes the performance gap between MSA and the general optimal storage allocation, using upper bounds on the latter.
   - A theorem identifies the ranges of access probability in which the performance of MSA is close to perfect recovery.

### Conclusion

Through theoretical analysis and numerical simulation, the paper shows that in a multi-class DSS the optimal MSA approaches the performance of the optimal storage allocation in many practical cases, while still guaranteeing the minimum successful recovery probability for each data class. The results offer both a theoretical basis and practical guidance for designing efficient and reliable distributed storage systems.
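The probabilistic access model and the weighted objective summarized above can be illustrated with a small sketch. It is not the paper's algorithm. It assumes each class's file size is normalized to 1, that a class is recovered when the accessed nodes together hold at least one unit of its coded data, and that nodes are reached independently with probability `p`. The function names, the `allocation` layout (one row per class, one column per node), and the brute-force enumeration are all hypothetical choices made for illustration, and per-node capacity limits are not modeled.

```python
from itertools import product


def recovery_probability(amounts, p):
    """Probability that the accessed nodes hold >= 1 unit of one class's data.

    amounts[j] is the assumed normalized amount of this class stored on node j.
    Every subset of nodes is enumerated, so this only suits small node counts.
    """
    n = len(amounts)
    total = 0.0
    for accessed in product((0, 1), repeat=n):
        collected = sum(a for a, hit in zip(amounts, accessed) if hit)
        if collected >= 1.0 - 1e-12:  # enough coded data to rebuild the file
            k = sum(accessed)
            total += p**k * (1 - p) ** (n - k)
    return total


def weighted_objective(allocation, weights, p, min_qos):
    """Weighted sum of per-class recovery probabilities.

    Returns None when any class misses its minimum recovery probability.
    """
    probs = [recovery_probability(row, p) for row in allocation]
    if any(pr < q for pr, q in zip(probs, min_qos)):
        return None
    return sum(w * pr for w, pr in zip(weights, probs))


if __name__ == "__main__":
    # Two classes on four nodes: class 0 has a full copy on two nodes,
    # class 1 has half of its file on each of the other two nodes.
    allocation = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    print(weighted_objective(allocation, [0.7, 0.3], 0.5, [0.5, 0.2]))
```

In this toy layout, class 0 survives if either of its two nodes is reached, whereas class 1 needs both of its nodes. That difference is why concentrating full copies on few nodes and spreading coded pieces over many nodes give different recovery probabilities for the same `p`.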