A General Scalable and Accurate Decentralized Level Monitoring Method for Large-Scale Dynamic Service Provision in Hybrid Clouds.
Yongquan Fu, Yijie Wang, Ernst Biersack
DOI: https://doi.org/10.1016/j.future.2012.11.001
IF: 7.307
2012-01-01
Future Generation Computer Systems
Abstract: Hybrid cloud computing combines private clouds with geographically distributed resources from public clouds, desktop grids, or in-house gateways to offer the flexibility of each kind of cloud platform. Service provisioning for wide-area applications such as cloud backup or cloud network games is sensitive to wide-area network metrics such as round-trip time, bandwidth, or loss rate. To optimize the quality of service provision in hybrid clouds, it is highly valuable to collect detailed network metrics between the participating nodes. However, because the nodes can be large-scale and dynamic, and the relevant network metrics may differ across cloud services, it is challenging to make the measurement process general, scalable, accurate, and robust. We propose a novel distributed level monitoring method, HPM (Hierarchical Performance Measurement), that satisfies these requirements. For each kind of network metric, HPM represents the degree of pairwise closeness with discrete level values, inspired by the hierarchical clustering tree. HPM maps probed metrics to discrete levels using an existing distributed K-means clustering method that helps maximize the similarity of the metric values within the same level, which in turn improves the match between pairwise levels and real-world pairwise proximity. Furthermore, for scalability, HPM computes the pairwise levels from decentralized coordinates. Each node independently maintains its low-dimensional coordinate using a novel decentralized implementation of the Maximum Margin Matrix Factorization method, which optimizes the mapping between the network metrics and the level values. Simulation results for the round-trip time, bandwidth, loss, and hop-count metrics confirm that HPM converges fast, is robust to parameter settings, scales well with an increasing number of levels or system size, and adapts well to diverse metrics.
A prototype deployment on the PlanetLab platform shows that HPM not only converges fast but also incurs modest bandwidth costs. Finally, applying HPM to optimize service provision in hybrid clouds shows that HPM can achieve close-to-optimal solutions.
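
As a rough illustration of the level-mapping idea described in the abstract, the following Python sketch clusters sample round-trip times with a plain, centralized one-dimensional K-means and then assigns each measurement the index of its nearest centroid as its level. This is only a sketch under stated assumptions: it is not the distributed K-means method the paper relies on, and it does not cover the coordinate-based level prediction. The function names `kmeans_1d` and `to_level`, the initialization scheme, and the sample values are hypothetical and chosen for illustration.

```python
def kmeans_1d(values, k, iters=50):
    """Centralized Lloyd's k-means on scalars (a stand-in, by assumption,
    for a distributed K-means). Returns the centroids in ascending order."""
    vs = sorted(values)
    # Deterministic initialization: pick k points spread across the sorted sample.
    centroids = [vs[(2 * i + 1) * len(vs) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vs:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Recompute each centroid as its cluster mean; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def to_level(value, centroids):
    """Map a metric value to a discrete level: the index of the nearest
    centroid, where level 0 is the smallest (closest) cluster."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


# Hypothetical round-trip times in milliseconds, forming three rough groups.
rtts_ms = [5, 7, 6, 48, 52, 50, 198, 202, 200]
centroids = kmeans_1d(rtts_ms, k=3)
levels = [to_level(r, centroids) for r in (8, 60, 150)]
print(centroids, levels)
```

Sorting the centroids makes the level index monotone in the metric value, so a lower level means closer nodes for a metric such as round-trip time; for a metric where larger is better, such as bandwidth, the ordering would presumably be reversed.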