Fault Tolerance of Stateful Microservices for Industrial Edge Scenarios

Yuke Jia, Tiejun Wang, Tianbo Qiu, Xiaohan Zhang, Rui Wang, Tianyu Wo
DOI: https://doi.org/10.1109/jcc59055.2023.00013
2023-01-01
Abstract: With the rapid growth in Industrial Internet of Things (IIoT) devices, there is a tendency to move some microservices-based applications from the Cloud to the Edge. However, edge devices have weak reliability and are prone to node failures, which cause stateful microservices to lose their computing state, so fault tolerance for stateful microservices becomes a concern. Moreover, traditional fault tolerance mechanisms for microservices cannot meet real-time requirements. In this context, and based on the characteristics of stateful microservices, we propose a novel fault-tolerant mechanism for the IIoT Edge, which mainly consists of causal logging and a distributed checkpoint algorithm. This fault recovery mechanism uses causal logging to record the nondeterministic events of microservices, and recovers microservice state by loading a checkpoint and replaying the log records, which achieves exactly-once guarantees for distributed microservices. In addition, a set of experiments was performed to evaluate the proposed mechanism, which was integrated with Kubernetes. The results show that the proposed mechanism has less impact on service performance than other methods.
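The recovery idea summarized in the abstract (record nondeterministic events, then restore by loading a checkpoint and replaying the log) can be illustrated with a minimal single-process sketch. This is not the authors' implementation: the class `StatefulService`, its methods, and the use of a random jitter as the "nondeterministic event" are all hypothetical, the checkpoint and log are simply assumed to survive the failure, and neither the distributed checkpoint algorithm nor the propagation of causal log records between services is modeled.

```python
import copy
import random


class StatefulService:
    """Toy stateful service (hypothetical): keeps a running total."""

    def __init__(self):
        self.state = {"total": 0, "processed": 0}
        # Assumption: the checkpoint and the log live on storage that
        # survives a node failure; here they are plain attributes.
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # nondeterministic events since the last checkpoint

    def handle(self, value, rng):
        # The jitter stands in for a nondeterministic event (for example
        # a timer reading). It is logged before it is applied so that it
        # can be replayed after a failure.
        jitter = rng.randint(0, 9)
        self.log.append((value, jitter))
        self._apply(value, jitter)

    def _apply(self, value, jitter):
        # Deterministic given its inputs, which is what makes replay exact.
        self.state["total"] += value + jitter
        self.state["processed"] += 1

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # earlier events are covered by the checkpoint

    def recover(self):
        # Load the checkpoint, then replay each logged event exactly once.
        self.state = copy.deepcopy(self.checkpoint)
        for value, jitter in self.log:
            self._apply(value, jitter)


def demo():
    rng = random.Random(7)
    svc = StatefulService()
    for v in (1, 2, 3):
        svc.handle(v, rng)
    svc.take_checkpoint()
    for v in (4, 5):
        svc.handle(v, rng)
    before_crash = copy.deepcopy(svc.state)
    svc.state = None  # simulate losing in-memory state on node failure
    svc.recover()
    return before_crash, svc.state
```

Calling `demo()` returns the state just before the simulated failure and the state after recovery; the two should be equal because replay reuses the logged jitter values instead of drawing new random numbers.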