Resilience in computer systems and networks

Kishor S. Trivedi, Dong Seong Kim, Rahul Ghosh
DOI: https://doi.org/10.1145/1687399.1687415
2009-01-01
Abstract: The term resilience is used differently by different communities. In general engineering systems, fast recovery from a degraded system state is often termed resilience. The computer networking community defines it as the combination of trustworthiness (dependability, security, performability) and tolerance (survivability, disruption tolerance, and traffic tolerance). The dependable computing community defines resilience as the persistence of service delivery that can justifiably be trusted when facing changes. In this paper, resilience definitions for systems and networks are presented. Metrics for resilience are compared with dependability metrics such as availability, performance, and performability. Simple examples are used to show how resilience can be quantified via probabilistic analytic models.
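
As a rough illustration of what quantifying resilience with a probabilistic analytic model can look like, the sketch below uses a two-state (up/down) continuous-time Markov chain. This model, the function names, and all rate values are assumptions chosen for illustration and are not taken from the paper. The system starts at the steady-state availability of an old failure rate, the failure rate then changes, and the transient availability shows how the system moves toward its new steady state.

```python
import math


def steady_state_availability(failure_rate, repair_rate):
    """Long-run probability that a two-state (up/down) system is up."""
    return repair_rate / (failure_rate + repair_rate)


def transient_availability(t, initial_availability, failure_rate, repair_rate):
    """Probability of being up at time t, starting from initial_availability.

    Closed-form solution of the two-state Markov chain:
    A(t) = A_inf + (A(0) - A_inf) * exp(-(failure_rate + repair_rate) * t)
    """
    a_inf = steady_state_availability(failure_rate, repair_rate)
    decay = math.exp(-(failure_rate + repair_rate) * t)
    return a_inf + (initial_availability - a_inf) * decay


def settling_time(initial_availability, failure_rate, repair_rate, tolerance):
    """Time until A(t) is within `tolerance` of the new steady state."""
    a_inf = steady_state_availability(failure_rate, repair_rate)
    gap = abs(initial_availability - a_inf)
    if gap <= tolerance:
        return 0.0
    return math.log(gap / tolerance) / (failure_rate + repair_rate)


# Hypothetical rates (per hour), chosen only for illustration.
REPAIR_RATE = 1.0
OLD_FAILURE_RATE = 0.001
NEW_FAILURE_RATE = 0.01  # the "change" the system has to face

a_old = steady_state_availability(OLD_FAILURE_RATE, REPAIR_RATE)
a_new = steady_state_availability(NEW_FAILURE_RATE, REPAIR_RATE)
t_settle = settling_time(a_old, NEW_FAILURE_RATE, REPAIR_RATE, tolerance=1e-4)

for t in (0.0, 1.0, 2.0, t_settle):
    a_t = transient_availability(t, a_old, NEW_FAILURE_RATE, REPAIR_RATE)
    print(f"t = {t:.3f} h  availability = {a_t:.6f}")
```

In this sketch the steady-state availabilities before and after the change are ordinary dependability metrics, while the transient curve and the settling time describe the behavior in response to the change, which is one possible way to read the contrast the abstract draws between resilience and dependability metrics.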