Exploiting Data Dependency to Mitigate Stragglers in Distributed Spatial Simulation.
Eman Bin Khunayn, Shanika Karunasekera, Hairuo Xie, Kotagiri Ramamohanarao
DOI: https://doi.org/10.1145/3139958.3140018
2017-01-01
Abstract: Distributed spatial simulations commonly employ an implementation of the Bulk Synchronous Parallel (BSP) model. However, BSP implementations usually suffer from the straggler problem, where a delay in any worker slows down the entire system. Random stragglers commonly occur for many reasons, including imbalanced workload, operating system scheduling, and communication delays. The straggler problem is further exacerbated by increasing parallelism. To reduce the straggler problem while preserving the simplicity and scalability advantages of the BSP model, we propose a new parallel model, which we call the Priority Asynchronous Parallel (PAP) model. PAP exploits the data dependencies between parallel processes, computing and synchronizing data according to its priority to the other workers. For further computational improvement, we develop a load balancing and partitioning method, called GridGraph, that uses the spatial and connectivity properties of the simulation space to reduce the size of exchanged data while balancing the workload among workers. The proposed schemes are implemented and evaluated in a microscopic traffic simulator. Running traffic simulations for Melbourne, Beijing, and New York on 80 workers, the PAP model combined with GridGraph partitioning achieves a performance speedup over the BSP model of around 47.4% for Melbourne, 52.18% for Beijing, and 65.84% for New York.
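
The straggler effect described in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' PAP implementation: it only contrasts a global BSP barrier with a scheme in which each worker waits solely for the workers it depends on, and it does not model data priorities or GridGraph partitioning. The function names, the chain-shaped dependency map, and the per-step durations are all assumptions made for illustration.

```python
def bsp_finish_time(durations):
    """Total time under a global barrier.

    durations[s][w] is the compute time of worker w in step s.
    Every step lasts as long as its slowest worker.
    """
    return float(sum(max(step) for step in durations))


def neighbor_sync_finish_time(durations, neighbors):
    """Total time when a worker waits only for the workers it depends on.

    neighbors[w] lists the workers whose previous-step data worker w needs
    (a hypothetical dependency map, e.g. adjacent spatial partitions).
    """
    n = len(durations[0])
    finish = [0.0] * n
    for step in durations:
        # A worker may start once it and all of its neighbors
        # have finished the previous step.
        start = [
            max([finish[w]] + [finish[v] for v in neighbors[w]])
            for w in range(n)
        ]
        finish = [start[w] + step[w] for w in range(n)]
    return max(finish)


# Four workers arranged in a chain; worker 1 straggles in step 0
# and worker 3 straggles in step 1.
durations = [[1, 5, 1, 1], [1, 1, 1, 5]]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(bsp_finish_time(durations))                   # 10.0
print(neighbor_sync_finish_time(durations, chain))  # 6.0
```

In this example the global barrier pays for both stragglers in full (5 + 5), whereas worker 3 does not depend on worker 1 and can begin its slow second step early, so the two delays overlap. When every worker depends on every other worker, the second function reduces to the BSP result.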