Effect of Failure Propagation on Cold vs. Hot Standby Tradeoff in Heterogeneous 1-out-of-N:G Systems

Gregory Levitin, Liudong Xing, Hanoch Ben-Haim, Yuanshun Dai
DOI: https://doi.org/10.1109/tr.2014.2355514
IF: 5.883
2014-01-01
IEEE Transactions on Reliability
Abstract: This paper considers 1-out-of-N:G heterogeneous fault-tolerant systems that are designed with a mix of hot and cold standby redundancies to achieve a tradeoff between the restoration and operation costs of standby elements. In such systems, the distribution of elements between the hot and cold standby groups and the initiation sequence of the cold standby elements can greatly affect the system reliability and mission cost. It is therefore important to solve the optimal standby element distributing and sequencing problem (SE-DSP). A failure that occurs in a system element can propagate, causing the outage of other system elements, which complicates the solution of the SE-DSP. In this paper, we first propose a numerical method for evaluating the reliability and expected mission cost of 1-out-of-N:G systems with mixed hot and cold redundancy types and propagated failures. Two failure propagation modes are considered: an element failure causing the outage of all the system elements, and an element failure causing the outage of only the working and hot standby elements but not the cold standby elements. A genetic algorithm is used as an optimization tool for solving the formulated SE-DSP, leading to a solution that minimizes the expected mission cost of the system while providing a desired level of system reliability. The effects of the failure propagation probability on the system reliability, the expected mission cost, and the optimization results are investigated. The suggested methodology can facilitate a reliability-cost tradeoff study of the considered systems, thus assisting in optimal decision making regarding the system's standby policy. Examples are provided to illustrate the considered problem and the proposed solution methodology.
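The two propagation modes described in the abstract can be illustrated with a rough Monte Carlo sketch. This is not the numerical method proposed in the paper; it is a simplified illustration that assumes exponential element lifetimes, instantaneous and perfect switching to the next cold standby element, no failures of cold elements before activation, and no cost model. All names (`simulate_mission`, `estimate_reliability`, `hot_rates`, `cold_rates`, `p_prop`, `mode`) are hypothetical.

```python
import random


def simulate_mission(hot_rates, cold_rates, p_prop, mission_time, mode, rng):
    """Simulate one mission; return True if the system survives.

    Assumptions (not from the paper): exponential lifetimes, perfect
    instantaneous switching, cold elements cannot fail before activation.
    mode == "all":         a propagated failure takes out every element.
    mode == "active_only": a propagated failure takes out only the working
                           and hot standby elements; cold ones are unaffected.
    """
    # Hot elements all run from time 0; process their failures in time order.
    hot_lifetimes = sorted(rng.expovariate(rate) for rate in hot_rates)
    t = 0.0
    for life in hot_lifetimes:
        if life >= mission_time:
            return True  # at least one hot element outlives the mission
        t = life  # a hot element fails at this time
        if rng.random() < p_prop:
            if mode == "all":
                return False  # propagated failure kills the whole system
            break  # remaining hot elements are lost; fall back to cold ones

    # Cold elements are activated one at a time, in the given sequence.
    for rate in cold_rates:
        t += rng.expovariate(rate)
        if t >= mission_time:
            return True
        if mode == "all" and rng.random() < p_prop:
            return False  # propagated failure also takes out cold standbys
    return False  # every element has failed before the mission ended


def estimate_reliability(hot_rates, cold_rates, p_prop, mission_time,
                         mode="all", trials=20000, seed=0):
    """Estimate system reliability as the fraction of surviving missions."""
    rng = random.Random(seed)
    survived = sum(
        simulate_mission(hot_rates, cold_rates, p_prop, mission_time, mode, rng)
        for _ in range(trials)
    )
    return survived / trials
```

Under these assumptions, comparing `estimate_reliability(..., mode="all")` with `mode="active_only")` for the same element split gives a rough feel for how much the surviving cold standby elements help when failures propagate. Moving an element between `hot_rates` and `cold_rates`, or reordering `cold_rates`, corresponds to a different candidate solution of the SE-DSP.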