Fault-Tolerant Strategy for Topology Reconfiguration of Manycore Systems Based on Message Passing Model

Zixu Wu, Fangfa Fu, Yu Lu, Jingxiang Wang
2014-01-01
Abstract: System fault-recovery time is a key objective for fault tolerance in manycore systems. To accelerate system recovery from faults, a fast topology reconfiguration strategy is proposed for fault tolerance in manycore systems based on the message passing model. First, a mapping domain is defined for each core according to the fault condition of the physical topology, and the Hungarian algorithm is adopted to generate the initial solution quickly. Second, with twisted mappings restricted, Tabu search is employed to optimize the initial solution quickly and obtain the final reconfiguration mapping solution. Finally, the mapping table on each computational node is updated according to the reconfiguration mapping solution, which completes the topology reconfiguration and realizes core-level fault tolerance for the manycore system. The experimental results show that the proposed strategy is capable of finding an optimal topology reconfiguration solution rapidly and recovering the system successfully while maintaining low time overhead for fault tolerance.
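The first step of the strategy (a mapping domain for each core, then a Hungarian-algorithm initial solution) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 3x3 logical mesh, the 3x4 physical mesh with one spare column, the faulty core at (1, 1), the Manhattan-distance cost, the domain radius of 2, and the helper names `build_cost` and `hungarian`. The Tabu search refinement and the mapping-table update are not shown.

```python
# Illustrative sketch only: mesh sizes, fault location, cost function and
# domain radius are assumptions, not values from the paper.
BIG = 10**6  # finite penalty for pairs outside a core's mapping domain


def hungarian(cost):
    """Minimum-cost assignment of n rows to m >= n columns (Kuhn-Munkres)."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (m + 1)    # column potentials
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0 != 0:      # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def build_cost(logical, physical, faulty, radius):
    """Cost matrix; a pair is allowed only if the physical core is healthy
    and lies within `radius` hops (the assumed mapping domain)."""
    cost = []
    for lr, lc in logical:
        row = []
        for pr, pc in physical:
            d = abs(lr - pr) + abs(lc - pc)
            allowed = (pr, pc) not in faulty and d <= radius
            row.append(d if allowed else BIG)
        cost.append(row)
    return cost


logical = [(r, c) for r in range(3) for c in range(3)]   # 3x3 logical mesh
physical = [(r, c) for r in range(3) for c in range(4)]  # 3x4 with spare column
faulty = {(1, 1)}                                        # assumed fault

cost = build_cost(logical, physical, faulty, radius=2)
assignment = hungarian(cost)
mapping = {logical[i]: physical[j] for i, j in enumerate(assignment)}
total_cost = sum(cost[i][j] for i, j in enumerate(assignment))
```

In this toy setup `mapping` plays the role of the initial reconfiguration solution: each logical core is assigned a distinct healthy physical core inside its domain, and `total_cost` is the summed displacement. A finite `BIG` penalty is used instead of infinity so that the potential updates stay in ordinary integer arithmetic.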