Fault Detection for High Availability RAID System
Zhiming Liu, Jichang Sha, Xiaohua Yang, Yaping Wan
2010-01-01
Abstract: Designing storage systems that remain highly available in the face of failures requires data protection techniques such as dual-controller RAID. A controller failure may cause data inconsistencies in the RAID storage system, so a heartbeat is used to detect whether each controller is still alive. The impact of the heartbeat cycle on the availability of a dual-controller hot-standby system has therefore become a focus of current research. To address the problem that high availability systems are currently built with a fixed heartbeat setting, an adaptive heartbeat fault detection model for dual controllers is designed; it adjusts the heartbeat cycle according to the frequency of data read-write requests in order to improve the availability of the dual-controller RAID storage system. This heartbeat mechanism can also be used for other applications in distributed settings, such as detecting node failures, performance monitoring, and query optimization. Based on this model, a stochastic Petri net model of high availability fault detection is established and used to evaluate the effect on availability. In addition, we define an AHA (Adaptive Heart Ability) parameter to measure the ability of the system's heartbeat cycle to adapt to a changing environment. The results show that, compared with a fixed configuration, the design is valid and effective and can enhance the high availability of a dual-controller RAID system.
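The idea of tying the heartbeat cycle to the read-write request frequency can be illustrated with a minimal Python sketch. The abstract does not give the adjustment rule, so everything here is an assumption made for illustration only: the linear mapping, the choice to shorten the cycle as load rises, and the names `adaptive_heartbeat_interval`, `is_peer_alive`, `min_interval`, `max_interval`, `rate_at_min`, and `missed_allowed` are hypothetical and are not taken from the paper.

```python
def adaptive_heartbeat_interval(request_rate, min_interval=0.1,
                                max_interval=2.0, rate_at_min=1000.0):
    """Return a heartbeat interval in seconds for a given I/O request rate.

    Assumption (not from the paper): the interval shrinks linearly from
    max_interval at zero load to min_interval at rate_at_min requests/s,
    so a busy controller pair is checked more often.
    """
    if request_rate < 0:
        raise ValueError("request_rate must be non-negative")
    # Normalise the load to [0, 1]; rates above rate_at_min are clamped.
    load = min(request_rate / rate_at_min, 1.0)
    return max_interval - load * (max_interval - min_interval)


def is_peer_alive(last_seen, now, interval, missed_allowed=3):
    """Treat the peer controller as alive while fewer than
    missed_allowed heartbeat intervals have elapsed since last_seen.

    The missed-beat threshold is a hypothetical parameter.
    """
    return (now - last_seen) <= interval * missed_allowed


if __name__ == "__main__":
    for rate in (0, 250, 500, 1000, 5000):
        interval = adaptive_heartbeat_interval(rate)
        print(f"{rate:>5} req/s -> heartbeat every {interval:.3f} s")
```

With a fixed configuration the interval would stay constant regardless of load; in this sketch an idle system sends fewer heartbeats, while a heavily loaded one detects a failed controller sooner.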