Can Agent Intelligence be used to Achieve Fault Tolerant Parallel Computing Systems?

Blesson Varghese, Gerard McKee, Vassil Alexandrov
DOI: https://doi.org/10.1142/S012962641100028X
2013-08-13
Abstract: The work reported in this paper is motivated by the need to validate an alternative to traditional fault tolerance methods such as checkpointing, which constrain efficacious fault tolerance. Can agent intelligence be used to achieve fault-tolerant parallel computing systems? If so, three questions need to be addressed: "What agent capabilities are required for fault tolerance?", "What parallel computational tasks can benefit from such agent capabilities?" and "How can agent capabilities be implemented for fault tolerance?" Cognitive capabilities essential for achieving fault tolerance through agents are considered. Parallel reduction algorithms are identified as a class of algorithms that can benefit from cognitive agent capabilities. The Message Passing Interface is used to implement an intelligent agent-based approach. Preliminary experimental results validate the feasibility of an agent-based approach for achieving fault tolerance in parallel computing systems.
Distributed, Parallel, and Cluster Computing; Multiagent Systems