Recent Advances in Parallel Computing and Distributed Network
Zhiyang Li, Keqiu Li
DOI: https://doi.org/10.1002/cpe.3512
2015-01-01
Abstract: As applications of computing systems have permeated all aspects of daily life, the power of computing systems has become increasingly critical, which poses many challenging problems in the areas of efficiency, performance, reliability, security, and interoperability. New programming paradigms, interconnection networks, and storage systems have joined the traditional workflow and parallel computing technologies for the highest-performance systems. This special issue presents recent advances in parallel computing and distributed networks. Its papers were selected from significantly extended versions of papers accepted at the 2014 World Ubiquitous Science Congress (U-Science 2014) [1] and the 14th International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP 2014) [2], and from a large number of open submissions. The selection was very rigorous, and only the best papers were accepted. Tang et al. [3] note that failures of volunteer or desktop nodes are correlated in Desktop Grid and Volunteer Computing Systems. To achieve long-term, sustained high throughput, they present a hybrid MapReduce (HybridMR) computing environment in which cluster nodes and volunteer computing nodes are integrated. HybridMR includes two innovative solutions. The first is a hybrid distributed file system that alleviates the volatility of desktop PCs. The second is a new node priority-based fair scheduling algorithm that achieves both data storage balance and job assignment balance. They also evaluate the I/O performance, fault tolerance, and cost savings of HybridMR, showing that the new model not only achieves higher throughput and efficiency but also meets the “green computing” goal. Ji et al. [4] study efficient and scalable reverse nearest neighbor (RNN) algorithms in a distributed environment.
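To make the query type concrete: a reverse nearest neighbor (RNN) query returns the data points that have the query point as their nearest neighbor. The following is a minimal brute-force sketch of that definition only, assuming Euclidean distance and the monochromatic variant of the query. It is not the distributed, grid-based method of Ji et al., and the function name `reverse_nearest_neighbors` is illustrative.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def reverse_nearest_neighbors(query, data):
    """Return the points of `data` whose nearest neighbor is `query`.

    Brute force, O(n^2): for each point p, check that no other data
    point is strictly closer to p than the query point is.
    """
    result = []
    for i, p in enumerate(data):
        d_query = dist(p, query)
        closer_exists = any(
            dist(p, other) < d_query
            for j, other in enumerate(data)
            if j != i
        )
        if not closer_exists:
            result.append(p)
    return result


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
print(reverse_nearest_neighbors((0.4, 0.0), points))
```

Every point is compared against every other point, which is the kind of sequential, in-memory cost that motivates index structures and pruning for large datasets.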
Noting that the major downsides of existing RNN methods are their inherently sequential nature and their reliance on in-memory algorithms, they first index the data with an inverted grid rather than an R-tree or a Voronoi diagram. They prove that the grid increases opportunities for parallelism. Furthermore, two pruning strategies, Lazy Scalable Reverse Nearest Neighbor (Lazy-SRNN) and Eager-SRNN, are proposed to improve performance. Finally, they perform extensive experiments on both real and synthetic datasets, demonstrating that their methods outperform state-of-the-art algorithms for scalable RNN queries. In future extreme-scale systems, a single compute node will have multiple accelerators. Dong et al. [5] address programming for such clusters, in which each compute node has multiple Xeon Phi coprocessors. To increase efficiency, they present an offload programming approach that allows each coprocessor to run an independent sub-program, while bi-directional, asynchronous coprocessor-to-coprocessor data transfers are enabled directly by Intel's low-level Coprocessor Offload Infrastructure (COI) and Symmetric Communication Interface (SCIF) APIs. They also present a hybrid programming strategy that combines Message Passing Interface (MPI), Open Multi-Processing (OpenMP), COI, and SCIF, thus extending their work to cover clusters with multi-coprocessor nodes. Finally, they report performance results for the proposed COI-SCIF approach on Tianhe-2, in terms of both bandwidth benchmark measurements and the execution time of a real-world 3D application. Topology is usually viewed as a key issue for interconnection networks. Zhang et al. [6] study in detail one typical topology, the hyper-star graph HS(2n, n), and find some interesting and attractive properties. More specifically, they determine the surface area of HS(2n, n).
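As a small illustration of the hyper-star graph discussed here, the sketch below builds HS(2n, n) and counts, by breadth-first search, how many vertices lie at each distance from a fixed vertex. It assumes the usual definition of the graph (vertices are the binary strings of length 2n with exactly n ones, and two vertices are adjacent when one is obtained from the other by swapping the first bit with a later bit that differs from it) and takes "surface area" to mean the number of vertices at a given distance. The function names are illustrative, and the code only enumerates small cases; it does not reproduce the closed-form result of Zhang et al.

```python
from collections import Counter, deque
from itertools import combinations


def hyper_star_vertices(n):
    """All binary strings of length 2n with exactly n ones, as tuples."""
    length = 2 * n
    vertices = []
    for ones in combinations(range(length), n):
        bits = [0] * length
        for i in ones:
            bits[i] = 1
        vertices.append(tuple(bits))
    return vertices


def neighbors(v):
    """Swap the first bit with each later bit that differs from it."""
    result = []
    for i in range(1, len(v)):
        if v[i] != v[0]:
            w = list(v)
            w[0], w[i] = w[i], w[0]
            result.append(tuple(w))
    return result


def surface_areas(n):
    """Map each distance d to the number of vertices at distance d."""
    start = tuple([0] * n + [1] * n)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in neighbors(v):
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    return Counter(distance.values())


for n in (2, 3):
    print(n, len(hyper_star_vertices(n)), sorted(surface_areas(n).items()))
```

For n = 2 the graph is a cycle on six vertices, so the counts at distances 0 through 3 are 1, 2, 2, and 1. Every vertex has degree n under this definition, because exactly n of the remaining 2n - 1 bits differ from the first bit.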
Furthermore, they prove that HS(2n, n) is isomorphic to the well-known middle cube, thus linking the Hamiltonicity of HS(2n, n) to that of the middle cube. Finally, they study the embedding properties of HS(2n, n) by showing that full binary trees can be embedded into the network with dilation 1, and they give an optimal algorithm for neighborhood broadcasting on HS(2n, n). Noting that balancing performance against hardware cost is quite challenging in traditional hypercube-based topologies, Qi et al. [7] build a new interconnection topology named EFH. Unlike EH, EFH adds complementary edges that link each node to its farthest node in the hypercube. Owing to these complementary edges, the authors show that the network diameter of EFH is about half that of EH. Furthermore, they design a more efficient routing algorithm and a load balancing algorithm for the EFH structure. They also give rigorous proofs of the properties of EFH and analyze its fault-tolerance capabilities, such as fault diameter and cost-effectiveness factor. To handle distance fraud attacks and relay attacks in anonymous RFID applications, Yang et al. [8] present an improved Distance-Bounding Trust Protocol (DBTP). With the proposed DBTP, a tag can defend against distance fraud by a malicious reader through the output of trust values. In addition, they deploy a trusted-third-party architecture to provide anonymity for tags in anonymous RFID systems without requiring tag identifiers. This enables DBTP to also defend against relay attacks in anonymous RFID systems. Finally, they build a prototype of DBTP using commercial RFID readers to track off-the-shelf RFID tags, and they evaluate the performance of DBTP both by theoretical analysis and by extensive simulations. Cheng et al.
[9] note that in hybrid storage systems, the two cache levels, DRAM and SSD, typically use independent cache replacement policies, which makes cache resource management inefficient and lowers system performance. They propose a novel AMC replacement algorithm to deal with this problem. In contrast to classic multi-level exclusive caching techniques, they combine selective Promote and Demote operations to dynamically determine the cache level of each block and keep “hot” data blocks in the DRAM and SSD caches. Furthermore, they design an online method using probabilistic Promote and Demote values that are adjusted according to the usage of blocks already cached. In experiments, they show that the proposed AMC algorithm reduces average response time and increases SSD lifetime compared with the traditional multi-level cache algorithms 2C-LRU (Least Recently Used), ind-LRU, and exc-LRU. We hope that you will enjoy reading the papers in this special issue. We would like to thank the authors for contributing their papers to this issue, and all the reviewers for their time and constructive reviews. Finally, we would like to thank the editors of Concurrency and Computation: Practice and Experience for providing the opportunity to publish this special issue.