Efficiently Recovering Stateful System Components of Multi-server Microkernels

Wentai Li, Jinyu Gu, Nian Liu, Binyu Zang
DOI: https://doi.org/10.1109/icdcs51616.2021.00054
2021-01-01
Abstract: Microkernel OSes provide OS services through mutually isolated system servers running in separate user processes, which gives stronger fault isolation than monolithic OSes. However, when it comes to fault recovery for system servers, most existing microkernel OSes do no more than restart a faulty server, which causes the server to lose all its running state and may affect every application relying on it. In this paper, we present a mechanism named TxIPC that can efficiently recover stateful system servers on microkernel OSes. Since a system server provides its service through inter-process communication (IPC), TxIPC makes it fault resilient by handling each IPC in a transaction-like manner. Specifically, if a fault happens in a server (during one IPC handling procedure), TxIPC aborts all the updates made by that IPC and thus recovers the server from the fault. Evaluations show that TxIPC enables servers to recover from 99.8% of (injected) faults with 3% - 45% performance overhead on application benchmarks, which significantly outperforms existing counterparts.
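
The transaction-like handling described in the abstract can be illustrated with a toy sketch. This is not TxIPC's implementation, which the abstract does not detail; it only shows the general idea of aborting the updates made by one failed request. The names `ToyServer` and `handle_transactionally` are hypothetical, the server state is assumed to be a copyable Python dictionary, and a Python exception stands in for a server fault.

```python
import copy


class ToyServer:
    """Hypothetical stateful server used only for illustration."""

    def __init__(self):
        self.state = {}

    def handle(self, request):
        op, key, value = request
        if op == "put":
            self.state[key] = value
            return "ok"
        if op == "put_then_crash":
            # Update state first, then fail, to simulate a fault mid-request.
            self.state[key] = value
            raise RuntimeError("injected fault")
        raise ValueError(f"unknown op: {op}")


def handle_transactionally(server, request):
    """Run one request; on a fault, discard every update it made."""
    # Assumption: the whole state can be deep-copied before each request.
    snapshot = copy.deepcopy(server.state)
    try:
        return True, server.handle(request)
    except Exception as exc:
        # Abort: restore the state as it was before this request.
        server.state = snapshot
        return False, str(exc)


server = ToyServer()
first = handle_transactionally(server, ("put", "a", 1))
second = handle_transactionally(server, ("put_then_crash", "b", 2))
```

After the second call, `server.state` still holds only the entry written by the first request, so later requests see a consistent state instead of a freshly restarted, empty server. Copying the full state on every request is used here purely for simplicity and would likely be too costly for a real system server.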