Abstract: Cloud systems, which are typical cyber–physical systems, consist of physical nodes and virtualized facilities that collaborate to deliver cloud computing services. Virtualization technology enables resource sharing and service parallelism in cloud services, which introduces new challenges for system modeling. In this study, we construct a systematic model that jointly evaluates system reliability, performance, and power consumption (PC) while capturing cloud service disruptions caused by random hardware and software failures. First, we describe system states using a birth–death process that accommodates resource sharing and service parallelism. Because service durations are relatively short and failure distributions are regular, we use transient-state transition probabilities instead of steady-state analysis. The birth–death process links system reliability, performance, and PC through service durations, which are governed by service assignment decisions and by the failure and repair distributions. Next, we devise a multistage sample path randomization method to estimate system metrics and other factors related to service availability. The findings show that, under reliability guarantees, the trade-off between performance and PC hinges on the balance between service duration and unit power. We then formulate optimization models for service assignment and compare the optimal decisions under varying availability scenarios, workload levels, and service attributes. Numerical results indicate that service parallelism can improve performance and save energy when the workload is moderate. As the workload increases, however, the performance loss induced by resource sharing becomes more pronounced because of resource capacity limits. When system availability is constrained, resource sharing should be used cautiously so that deadline requirements are still met. This study theoretically analyzes the interrelations among system reliability, performance, and PC, offering insights for making informed decisions in cloud service assignment.
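To make the idea of transient (rather than steady-state) analysis of a birth–death process concrete, the following is a minimal sketch using standard uniformization, which is also known as randomization. It is not the paper's multistage sample path randomization method, which the abstract does not specify. The sketch assumes a finite chain with constant exponential rates, and the function names and rate values are hypothetical illustrations.

```python
import math

def birth_death_generator(births, deaths):
    """Build the generator matrix of a finite birth-death chain.

    births[i] is the rate from state i to i+1, deaths[i] the rate from
    state i+1 to i. Constant exponential rates are an assumption here.
    """
    n = len(births) + 1
    Q = [[0.0] * n for _ in range(n)]
    for i, b in enumerate(births):
        Q[i][i + 1] = b
    for i, d in enumerate(deaths):
        Q[i + 1][i] = d
    for i in range(n):
        Q[i][i] = -sum(Q[i])  # diagonal is still 0 here, so this sums off-diagonals
    return Q

def transient_distribution(Q, p0, t, tol=1e-12, max_terms=10_000):
    """State probabilities at time t via uniformization (randomization)."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))  # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # Transition matrix of the uniformized discrete-time chain: P = I + Q / rate
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson(rate * t) probability of 0 jumps
    v = list(p0)                  # distribution after k jumps of the chain
    result = [weight * x for x in v]
    cumulative = weight
    for k in range(1, max_terms):
        if 1.0 - cumulative <= tol:
            break
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k    # Poisson probability of exactly k jumps
        cumulative += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result

# Hypothetical two-state example: state 0 = up, state 1 = failed,
# failure rate 1.0 and repair rate 2.0 per unit time.
Q = birth_death_generator(births=[1.0], deaths=[2.0])
p = transient_distribution(Q, p0=[1.0, 0.0], t=0.5)
```

For short horizons, `p[0]` is the probability of being in the up state at time `t`, which can differ noticeably from the long-run value that a steady-state analysis would report. For large values of `rate * t` the initial Poisson weight underflows, so a production implementation would need a more careful summation.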

Quantitative Analysis Research on Survivability of Information Systems in Cloud

A Framework of Quantitative Analysis for Information System Survivability

Survivability Analysis For Information Systems

A framework for quantifying information system survivability

Survivability computation of networked information systems

Layered Computation for Information System Survivability

Reliability-Based Design Optimization for Cloud Migration

Qos-Aware Indiscriminate Volume Storage Cloud

Improving Failure Tolerance in Large-Scale Cloud Computing Systems

Reliability Analysis of Distributed Storage Systems Considering Data Loss and Theft

Service Oriented Resilience Strategy for Cloud Data Center

An Approach for Resiliency Quantification of Large Scale Systems

Metrics based Workload Analysis Technique for IaaS Cloud

SSUR: An Approach to Optimizing Virtual Machine Allocation Strategy Based on User Requirements for Cloud Data Center

Analyzing Survivability of Distributed Information System using SPN

Cloud-integrated cyber–physical systems: Reliability, performance and power consumption with shared-servers and parallelized services

Designing an Accounting Information Management System Using Big Data and Cloud Technology

A Solution for A Disaster Recovery Service System in Multi-cloud Environment

Performance, Fault-Tolerance and Scalability Analysis of Virtual Infrastructure Management System

A Secure And Reliable Hybrid Model For Cloud-Of-Clouds Storage Systems

How to Shutdown a Cloud: a DDoS Attack in a Private Infrastructure-As-a-service Cloud.