X10-ft

Chenning Xie, Zhijun Hao, Haibo Chen
DOI: https://doi.org/10.1145/2442992.2442994
2013-01-01
Abstract: The emergence of multicore machines has made exploiting parallelism a necessity to harness the abundant computing resources in both single machines and clusters. This, however, may hinder programming productivity, as threaded and distributed programming is hard to use correctly and concurrency and distributed bugs are hard to spot. The asynchronous partitioned global address space (APGAS) model is a programming model that aims to unify programming for multicore machines and clusters while retaining good productivity. Unfortunately, the current implementation of the APGAS programming model lacks support for fault tolerance, and a single transient failure may render hours to months of computation useless.