Design and implementation of an availability assessment system for transaction processing-oriented fault tolerant computers
Yi Feng, Decheng Zuo, Zhan Zhang, Haiying Zhou, Xiaozong Yang
DOI: https://doi.org/10.3772/j.issn.1002-0470.2012.09.004
2012-01-01
Abstract: To overcome the limited number of sample systems and the limited test period available when testing the availability of a transaction processing-oriented fault-tolerant computer, an availability assessment method is proposed and a corresponding assessment system is implemented. The availability assessment system consists of a multi-level fault injection platform, an application workload simulator, and an availability assessment toolkit. The fault injection platform automatically injects various fault loads into target systems in batches. The application workload simulator generates the transactions that end users would launch and sends them to target systems as workloads. The availability assessment toolkit supports several tests: a reliability relationship test among functional subsystems, a reliability relationship test among field replaceable units (FRUs), a redundancy test of different kinds of FRUs, a mean time to recovery (MTTR) test, and an availability validation test. The results of these tests on an HP Superdome fault-tolerant server are consistent with the official documentation, which supports the effectiveness of the assessment system. This research helps computer manufacturers predict availability metrics and helps end users verify system availability.
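To make the relationship between the MTTR test, FRU redundancy, and the availability figure concrete, the following is a minimal sketch using the standard steady-state formula A = MTTF / (MTTF + MTTR) and textbook series/parallel composition. It is not the paper's toolkit: the `Unit` class, the `series` and `parallel` helpers, and all numeric values are hypothetical and chosen only for illustration, and the sketch assumes independent failures.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Unit:
    """Hypothetical FRU described by mean time to failure and mean time to recovery."""
    name: str
    mttf_hours: float
    mttr_hours: float

    def availability(self) -> float:
        # Steady-state availability: fraction of time the unit is up.
        return self.mttf_hours / (self.mttf_hours + self.mttr_hours)


def series(avails: Iterable[float]) -> float:
    """Availability when every element is required (no redundancy)."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(avails: Iterable[float]) -> float:
    """Availability when any one element suffices (full redundancy),
    assuming independent failures."""
    unavailable = 1.0
    for a in avails:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from any real server.
    psu = Unit("power_supply", mttf_hours=999.0, mttr_hours=1.0)
    cpu_board = Unit("cpu_board", mttf_hours=9999.0, mttr_hours=1.0)

    redundant_psu = parallel([psu.availability(), psu.availability()])
    system = series([redundant_psu, cpu_board.availability()])
    print(f"single PSU availability:    {psu.availability():.6f}")
    print(f"redundant PSU availability: {redundant_psu:.6f}")
    print(f"system availability:        {system:.6f}")
```

In this simplified model, a shorter measured MTTR raises each unit's availability, and redundant FRUs are combined in parallel before being placed in series with the units that have no backup.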