Fault Tolerance: Design and Exploratory Ideas
PerDiS Deliverable PDSR-97-009

P. Ferreira, Joao C. Garcia
Abstract: In this article we describe the fault-tolerance architecture of the PerDiS platform. This architecture results from the work that was done in the first six months at INESC with strong interaction with the other partners of the project (mainly INRIA-SOR and INRIA-SIRAC). We describe the overall fault-tolerance architecture and its integration within PerDiS, the interfaces and implementation provided in the preliminary platform, and the aspects that we intend to explore in the next months so we can support them in the intermediate and advanced platforms.

1 Introduction

Cooperative engineering requires fault-tolerance software. Even in a local network, crashes and communication failures occur with non-negligible frequency. Such partial failures may cause inconsistencies in applications, with unpredictable results. Worse still, careless application of corrective measures may aggravate the inconsistencies rather than fix them. In the presence of faults, the platform mechanisms must remain safe, and long-running applications should be able to make progress.

Our fault-tolerance objectives are to: (i) reliably store persistent data on backing storage; (ii) replicate backing storage, and ensure consistency between replicas; (iii) ensure transactional properties; (iv) support tentative updates; (v) provide checkpointing.

Due to the nature of concurrent engineering work in a large-scale environment, data is heavily shared but update conflicts are relatively uncommon. Optimistic transaction models are appropriate in this environment and are also well adapted to slow and unreliable network links.

The fault-tolerance architecture described in this article takes into account the aspects mentioned above and resulted from the work done at INESC during the first six months of the PerDiS project. This design was done with strong interaction with the other partners (mainly INRIA-SOR and INRIA-SIRAC).
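To make the optimistic transaction model mentioned in the introduction more concrete, the following is a minimal Python sketch of version-based validation at commit time. It is an illustration under stated assumptions, not the PerDiS interface: the names Store, Transaction and ConflictError are hypothetical, the store is a single in-memory dictionary, and replication, persistence and checkpointing are left out entirely.

```python
class ConflictError(Exception):
    """Raised when a value read by a transaction changed before commit."""


class Store:
    """Hypothetical in-memory store: key -> (version, value)."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def commit(self, read_versions, writes):
        # Validation: every key the transaction read must still be at
        # the version it saw; otherwise another commit got in first.
        for key, seen in read_versions.items():
            current, _ = self.read(key)
            if current != seen:
                raise ConflictError(key)
        # Write phase: install the tentative updates with new versions.
        for key, value in writes.items():
            version, _ = self.read(key)
            self._data[key] = (version + 1, value)


class Transaction:
    """Buffers tentative updates locally; nothing is locked while it runs."""

    def __init__(self, store):
        self._store = store
        self._read_versions = {}
        self._writes = {}

    def get(self, key):
        # A transaction sees its own tentative writes first.
        if key in self._writes:
            return self._writes[key]
        version, value = self._store.read(key)
        # Remember the first version observed for later validation.
        self._read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        # Tentative update: invisible to others until commit succeeds.
        self._writes[key] = value

    def commit(self):
        self._store.commit(self._read_versions, self._writes)
```

In this sketch, two transactions that both read the same key and then both try to update it will see the first commit succeed and the second raise ConflictError, leaving the stored value untouched. Because no locks are held between read and commit, a transaction never waits on a slow or unreliable link while it works; the cost is paid only in the (assumed rare) case of a conflict, which matches the workload the introduction describes.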