BSP-based Parallel Simplex Method

Wang Dong,Hu Baosheng,Peng Qinke,Tan Yudong

DOI: https://doi.org/10.1109/hpc.2000.843514

2000-01-01

Abstract:We introduce a BSP-based parallel simplex method algorithm and analyze its computational cost. Then we give some experimental results on a PC cluster and draw some conclusions.

Design of BSP Algorithms and Their Software on PC Clusters

彭勤科,许宏斌,谭煜东,胡保生

DOI: https://doi.org/10.3969/j.issn.1000-7180.2002.03.001

2002-01-01

Abstract:In this paper, we analyse the properties of the BSP model and of PC clusters. Principles for designing BSP algorithms and their software are given. We illustrate methods that use the cost formula to evaluate BSP algorithms and their software. The results show that the cost formula of the BSP model gives a good evaluation of a parallel algorithm, and that it is helpful for the design and optimization of parallel algorithms on PC clusters.
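As a rough illustration of the kind of cost-formula evaluation this abstract refers to, the sketch below uses the standard BSP cost model, in which one superstep costs w + h*g + l (local work, plus the h-relation times the per-word communication cost g, plus the barrier latency l). The names `Superstep` and `bsp_cost` and all numbers are hypothetical; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Superstep:
    work: float      # w: maximum local computation on any processor
    h_relation: int  # h: maximum words sent or received by any processor


def bsp_cost(supersteps: Iterable[Superstep], g: float, l: float) -> float:
    """Sum the standard BSP superstep cost w + h*g + l over all supersteps."""
    return sum(s.work + s.h_relation * g + l for s in supersteps)


# Hypothetical two-superstep program on a machine with g = 4.0 and l = 100.0.
steps = [Superstep(work=1000.0, h_relation=50),
         Superstep(work=400.0, h_relation=10)]
total = bsp_cost(steps, g=4.0, l=100.0)
print(total)  # (1000 + 200 + 100) + (400 + 40 + 100) = 1840.0
```

Comparing such a predicted total with a measured run time is one way the formula can guide algorithm design; the abstract does not say which procedure the authors used.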
Research on BSP-Based Large Scale Linear Programming Parallel Algorithm

TIAN Yuan,PENG Qin-ke

DOI: https://doi.org/10.3969/j.issn.1673-629X.2005.12.004

2005-01-01

Abstract:Many engineering projects involve large-scale linear programming problems, whose long computation times limit efficiency. In this paper, we present a BSP (bulk synchronous parallel) algorithm for large-scale linear programming that can be implemented on a PC cluster. We investigate the cost function and speedup of this algorithm. An implementation has been tested on a PC cluster developed by us, using the Oxford BSPlib. The results show that the BSP algorithm is of great value for large-scale LP problems in practice.
Application of Parallel BSP Model in Real-time Cluster Computing System

XUE Hong-ye,LI Yan-jun,DU Hong

DOI: https://doi.org/10.3969/j.issn.1000-3428.2008.04.024

2008-01-01

Abstract:This paper analyses the characteristics of applying parallel computing models to multi-source data processing, constructs a BSP model of a real-time cluster computing system, designs the granularity of multi-assignment data processing, and puts forward a new method for realizing the BSP model in a real-time cluster computing system. The application demonstrates the validity of the algorithm.
Artificial Neural Network Algorithm Based on BSP

谭煜东,彭勤科,许宏斌,胡保生

DOI: https://doi.org/10.3969/j.issn.1000-3428.2002.01.024

2002-01-01

Abstract:In this paper, we present an artificial neural network algorithm based on BSP (bulk synchronous parallel) which can be implemented on a PC cluster. We investigate the cost function and speedup of this algorithm. An implementation has been tested on a PC cluster using the Oxford BSPlib. The results show that the BSP algorithm is of great value for large-scale ANN problems in practice.
Implementation of A Parallel Graph Partition Algorithm to Speed Up BSP Computing

Shengmei Luo,Lixia Liu,Hongxu Wang,Bin Wu,Yang Liu

DOI: https://doi.org/10.1109/fskd.2014.6980928

2014-01-01

Abstract:Processing and mining information in large-scale graph data has proven to be challenging. The bulk synchronous parallel (BSP) computing model is suitable for this task. In this paper, we implement the multi-level step-wise partitioning (MSP) algorithm in the BSP programming model, replacing the original graph partition method. Results on both experimental data and real-world data show that this change achieves better data locality, reduces communication between worker nodes, and performs better than the original method.
Parallel Simulated Annealing Using Simplex Method

Ya-Zhong Luo,Guo-Jin Tang

DOI: https://doi.org/10.2514/6.2004-4584

IF: 2.624

2006-01-01

AIAA Journal

Abstract:By combining the advantages of Simulated Annealing with those of the Simplex Method, a new kind of parallel Simulated Annealing is proposed in this paper. The details of the parallel Simulated Annealing are described. Experimental results are presented for several mathematical example problems and a 10-bar truss structural optimization. Using only a very simple penalty function, the parallel Simulated Annealing obtains the best-known solutions for two nonlinear constrained optimization problems, including Golinski’s speed reducer problem, and its computational cost is the lowest. The results demonstrate that our algorithm is effective and efficient in solving both functional and structural optimization problems, and that it is superior to genetic algorithms, evolutionary programming, the chaos optimization algorithm, and sequential simulated annealing. A theoretical analysis of the speedup of the parallel Simulated Annealing is made, which shows that the speedup is much larger for more complex optimization problems. The master/client parallel computation is implemented using MPI, and the experiments confirm the theoretical result.
Research on Parallel Computing Model of BSP in the NOWS Adapt Environment

AJ Song,QK Peng,BS Hu

DOI: https://doi.org/10.1109/icnnsp.2003.1279342

2003-01-01

Abstract:In this paper, the characteristics of the BSP parallel computing model and of NOWS are analyzed. The research indicates that, for rationally designed parallel algorithms, the BSP parallel computing model fits the NOWS environment. An improved NOWS-based parallel algorithm for the simplex method of linear programming obtained the best results, which validates this conclusion.
A Parallel Method With Hybrid Algorithms For Mixed Integer Nonlinear Programming

Kai Zhou,Wei Wan,Xi Chen,Zhijiang Shao,Lorenz T. Biegler

DOI: https://doi.org/10.1016/B978-0-444-63234-0.50046-4

2013-01-01

Abstract:This study aims at improving the solution efficiency of Mixed Integer Nonlinear Programming (MINLP) through parallelism. Unlike most conventional parallel implementations of MINLP solvers, which use multiple threads to share the burden of the serial mode, the proposed method combines hybrid algorithms running on different threads. Two types of algorithms are designed in a parallel structure. One is Quesada and Grossmann's LP/NLP based branch and bound algorithm (QG); the other is Tabu Search (TS). The proposed method attempts to minimize the search space through continuous communication and exchange of intermediate results from each thread. Three kinds of information are exchanged between the two threads. First, the best solution in TS, if feasible, serves as a valid upper bound for QG. Second, new approximations which can further tighten the lower bound of QG can be generated at nodes provided by TS. Third, strong branching in QG may fix some integer variables, which can help reduce the search space of TS. Both threads can thus benefit from the exchanged information in the hybrid method. Numerical results show that solution time can be greatly reduced for the tested MINLP. In addition, complexity analysis of the parallel approach suggests that the proposed method has the potential for superlinear speedup.
A Parallel BSP Algorithm For Irregular Dynamic Programming

Malcolm Yoke Hean Low,Weiguo Liu,Bertil Schmidt

DOI: https://doi.org/10.1007/978-3-540-76837-1_19

2007-01-01

Abstract:Dynamic programming is a widely applied algorithm design technique in many areas such as computational biology and scientific computing. Typical applications using this technique are compute-intensive and suffer from long runtimes on sequential architectures. Therefore, several parallel algorithms for both fine-grained and coarse-grained architectures have been introduced. However, the commonly used data partitioning scheme cannot be efficiently applied to irregular dynamic programming algorithms, i.e. dynamic programming algorithms with an uneven load density pattern. In this paper we present a tunable Bulk Synchronous Parallel (BSP) algorithm for this kind of application. This new algorithm can balance the workload among processors using a tunable block-cyclic data partitioning method and is thus capable of achieving almost linear performance gains. We present a theoretical analysis and show experimentally that it leads to significant runtime savings for pairwise sequence alignment with general gap penalties using BSPonMPI on a PC cluster.
Massively Parallel SPMD Algorithm for Cluster Computing — Combining Genetic Algorithm with Uphill

Zhihui Du,Meng Ding,Sanli Li,Shuyou Li,Mengyue Wu,Jing Zhu

2001-01-01

Abstract:Genetic Algorithm (GA), which borrows the Darwinian principle of natural selection, is a powerful global search and optimization method. This paper presents an SPMD (Single Program Multiple Data) algorithm which combines GA with a local search algorithm, uphill. The hybrid parallel method not only improves the convergence of GA but also accelerates its convergence speed. Approximate solutions can be found quickly for complex optimization problems, and more precise solutions can also be found by employing the same algorithm to fine-tune the approximate solutions. GA is an inherently parallel algorithm. The SPMD algorithm exploits the parallelism of GA and, at the same time, overcomes the premature and poor convergence properties of GA. The algorithm is applied to typical functions with multiple local minima, the TSP problem, and an engineering computation problem, QCBED, on our self-developed cluster system THNPSC-1. Experiments show that the algorithm is robust and that it can find high-quality solutions at high speed.
Parallel L-BFGS-B Algorithm on GPU

Yun Fei,Guodong Rong,Bin Wang,Wenping Wang

DOI: https://doi.org/10.1016/j.cag.2014.01.002

IF: 1.821

2014-01-01

Computers & Graphics

Abstract:Due to the rapid advance of general-purpose graphics processing units (GPUs), improving the performance of non-linear optimization through parallel implementation on the GPU is an active research topic, as attested by the large body of research on parallel implementations of relatively simple optimization methods, such as the conjugate gradient method. We study in this context the L-BFGS-B method, or the limited memory Broyden–Fletcher–Goldfarb–Shanno with boundaries, which is a sophisticated yet efficient optimization method widely used in computer graphics as well as general scientific computation. By analyzing and resolving the inherent dependencies of some of its search steps, we propose an efficient parallel implementation of L-BFGS-B on the GPU. We justify our design decisions and demonstrate significant speed-up by our parallel implementation in solving the centroidal Voronoi tessellation (CVT) problem as well as some typical computing problems.
Computational Experience of an Interior-Point SQP Algorithm in a Parallel Branch-and-Bound Framework

Eva K. Lee,John E. Mitchell

DOI: https://doi.org/10.1007/978-1-4757-3216-0_13

2000-01-01

Abstract:An interior-point algorithm within a parallel branch-and-bound framework for solving nonlinear mixed integer programs is described. The nonlinear programming relaxations at each node are solved using an interior point SQP method. In contrast to solving the relaxation to optimality at each tree node, the relaxation is only solved to near-optimality. Analogous to employing advanced bases in simplex-based linear MIP solvers, a “dynamic” collection of warmstart vectors is kept to provide “advanced warmstarts” at each branch-and-bound node. The code has the capability to run in both shared-memory and distributed-memory parallel environments. Preliminary computational results on various classes of linear mixed integer programs and quadratic portfolio problems are presented.
Performance Analysis of Parallel Smoothed Particle Hydrodynamics on Multi-Core CPUs

Chen Wenbo,Yucheng Yao,Yang Zhang

DOI: https://doi.org/10.1109/cciot.2014.7062511

2014-01-01

Abstract:This paper presents a parallel SPH implementation on multi-core CPUs. The implementation uses a hash table to store particle data and divides the program code into two parts for parallelization. The first part has no data race, but the second part does. The paper then compares the running time and parallel speedup of each part to find the bottleneck of the parallel SPH program. The results show that the program can achieve linear speedup with only the first part parallelized when the search radius is large, and that the second part becomes a performance bottleneck only when the search radius is small enough that each cell contains only one or two particles on average. We present a method to parallelize the second part without affecting the performance of the first part. The results show that our method can ease the performance bottleneck when the search radius is small.
Parallel Cluster-BFS and Applications to Shortest Paths

Letong Wang,Guy Blelloch,Yan Gu,Yihan Sun

2024-10-27

Abstract:Breadth-first Search (BFS) is one of the most important graph processing subroutines, especially for computing the unweighted distance. Many applications may require running BFS from multiple sources. Sequentially, when running BFS on a cluster of nearby vertices, a known optimization is using bit-parallelism. Given a subset of vertices with size $k$ and the distance between any pair of them is no more than $d$, BFS can be applied to all of them in total work $O(dm(k/w+1))$, where $w$ is the length of a word in bits and $m$ is the number of edges. We will refer to this approach as cluster-BFS (C-BFS). Such an approach has been studied and shown effective both in theory and in practice in the sequential setting. However, it remains unknown how this can be combined with thread-level parallelism. In this paper, we focus on designing efficient parallel C-BFS based on BFS to answer unweighted distance queries. Our solution combines the strengths of bit-level parallelism and thread-level parallelism, and achieves significant speedup over the plain sequential solution. We also apply our algorithm to real-world applications. In particular, we identified another application (landmark-labeling for the approximate distance oracle) that can take advantage of parallel C-BFS. Under the same memory budget, our new solution improves accuracy and/or time on all the 18 tested graphs.

Data Structures and Algorithms,Distributed, Parallel, and Cluster Computing
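The bit-parallelism idea mentioned in this abstract can be illustrated with a much simplified sequential multi-source BFS, in which bit i of a per-vertex mask records whether source i has reached that vertex. This is only a sketch of the general idea, not the authors' C-BFS or its parallel version; the function name `bit_parallel_bfs` is hypothetical, and Python integers stand in for machine words.

```python
from typing import List


def bit_parallel_bfs(adj: List[List[int]], sources: List[int]) -> List[List[int]]:
    """Return dist[i][v], the hop distance from sources[i] to v (-1 if unreachable)."""
    n, k = len(adj), len(sources)
    seen = [0] * n       # bit i set: source i has already reached this vertex
    frontier = [0] * n   # bit i set: source i reached this vertex in the last round
    dist = [[-1] * n for _ in range(k)]
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        frontier[s] |= 1 << i
        dist[i][s] = 0
    level = 0
    while any(frontier):
        level += 1
        nxt = [0] * n
        # One OR per edge advances all sources at once.
        for u in range(n):
            if frontier[u]:
                for v in adj[u]:
                    nxt[v] |= frontier[u]
        for v in range(n):
            new = nxt[v] & ~seen[v]  # sources reaching v for the first time
            nxt[v] = new
            seen[v] |= new
            bits, i = new, 0
            while bits:
                if bits & 1:
                    dist[i][v] = level
                bits >>= 1
                i += 1
        frontier = nxt
    return dist


# Path graph 0 - 1 - 2 - 3 with sources 0 and 1.
path = [[1], [0, 2], [1, 3], [2]]
print(bit_parallel_bfs(path, [0, 1]))  # [[0, 1, 2, 3], [1, 0, 1, 2]]
```

Each edge is scanned once per round for all sources together, which is where the saving over running k separate searches comes from.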
Hybrid SPMD Simulated Annealing Algorithm and Its Applications

都志辉,李三立,吴梦月,李树有,朱静

DOI: https://doi.org/10.3321/j.issn:0254-4164.2001.01.012

2001-01-01

Abstract:Simulated Annealing (SA) is a frequently used stochastic algorithm for combinatorial optimization problems, and it converges with probability infinitely close to 1. However, this parameter-sensitive algorithm has long execution times, which prevent it from being accepted for many real applications. The serial SA method has been discussed not only in pure algorithm research but also in many applications. This paper is concerned with how to parallelize the SA algorithm and how to improve its performance. In research on complex parallel applications, we find that the features of different calculation phases are quite different, and one algorithm suits only one particular phase. So if only one pure algorithm is used in these applications, the performance is not high. The paper presents a hybrid SPMD (Single Program Multiple Data) algorithm which combines SA with a local search algorithm, Downhill. The hybrid method not only keeps the convergence of SA but also improves its convergence speed. Approximate solutions can be found quickly for complex optimization problems, and more precise solutions can also be found by employing the same algorithm to fine-tune the approximate solutions. SA is an essentially serial algorithm, but the SPMD algorithm breaks up the serial bottleneck of SA, and within some range its performance scales up as the number of processors increases. At the same time, the SPMD algorithm does not require careful choice of control parameters. Cluster computing is a new kind of parallel computing mode and has been used in many fields. It is chosen as our experimental environment not only because it is a typical parallel environment but also because it is easily available. The algorithm has been implemented on the cluster system THNPSC-1. Five typical functions with multiple maxima are used to test the SPMD algorithm. The results show that the algorithm can always find the best values. The application to a quantitative electron crystallography problem shows that the algorithm is robust and that it can find high-quality solutions at high speed. The conclusion is that SA can be parallelized with high performance, and that for complex optimization problems different methods can be combined, with different methods used in different phases to speed up the optimization procedure.
Massively parallel simulated annealing embedded with downhill-a SPMD algorithm for cluster computing

Zhihui Du,Sanli Li,Shuyou Li,Mengyue Wu,Jing Zhu

DOI: https://doi.org/10.1109/IWCC.1999.810899

1999-01-01

Abstract:Simulated Annealing (SA) is a frequently used stochastic algorithm for combinatorial optimization problems, and it converges with probability infinitely close to 1. SA is an NP algorithm, and its long execution time prevents it from being accepted for many real-time applications. This paper presents an SPMD (Single Program Multiple Data) algorithm which combines SA with a local search algorithm, downhill. The hybrid method not only keeps the convergence of SA but also improves its convergence speed. Approximate solutions can be found quickly for complex optimization problems, and more precise solutions can also be found by employing the same algorithm to fine-tune the approximate solutions. SA is an essentially serial algorithm, but the SPMD algorithm breaks up the serial bottleneck of SA, and its performance scales up linearly as the number of processors increases. At the same time, the SPMD algorithm does not require careful choice of control parameters. Application cases show that the algorithm is robust and that it can find high-quality solutions at high speed.
BSPADMM: Block Splitting Proximal ADMM for Sparse Representation with Strong Scalability

Yidong Chen,Jingshan Pan,Zidong Han,Yonghong Hu,Meng Guo,Zhonghua Lu

DOI: https://doi.org/10.1007/s42514-023-00164-w

2024-01-01

CCF Transactions on High Performance Computing

Abstract:Sparse representation (SR) is a fundamental component of linear representation techniques and plays a crucial role in signal processing, machine learning, and computer vision. Most parallel methods for solving sparse representations rely on the alternating direction method of multipliers (ADMM). However, the classical 2-block ADMM or N-block ADMM often suffer from three problems: (1) solving the subproblem requires solving a linear system, (2) unsuitable sparse data structure for parallelization, and (3) unsatisfactory parallel efficiency and scalability performance. In this paper, we propose a parallel ADMM-based algorithm called block splitting proximal ADMM (BSPADMM). First, BSPADMM organizes the sparse signals in the compressed sparse columns (CSC) format, and each processor deals with them independently. Second, BSPADMM designs the proximal term so that it avoids solving a linear system in the subproblem during iterations. Its advantage is that BSPADMM computes the subproblem using sparse matrix–vector multiplication, without communication between processors. Third, each processor updates the step size asynchronously, which eliminates the synchronization effort of adjusting the step size between processes. Thus, the communication overhead can be naturally reduced. Our experimental results on three datasets of varying scales show that BSPADMM outperforms state-of-the-art ADMM techniques, including the adaptive relaxed ADMM (ARADMM) and N-block ADMM, in terms of computing time and parallel efficiency. BSPADMM runs 1.64 times faster than the N-block ADMM, and the ratio grows to 8.27 times as the dataset size doubles. More importantly, the parallel efficiency of BSPADMM remains above 70% as the number of processors grows to 10,000, demonstrating strong scalability.
Research on Parallel Implementing of the MD Algorithm in Cluster

徐伟,李玉忱,王丽

DOI: https://doi.org/10.3969/j.issn.1000-3428.2002.03.041

2002-01-01

Abstract:In this paper, the parallel implementation of the molecular dynamics algorithm on a cluster is discussed. By analyzing the principles of molecular dynamics, we propose a parallel molecular dynamics algorithm based on the BSP model and analyze its performance.
The IBiCGStab method on bulk synchronous parallel architectures

Laurence Tianruo Yang,Ruth E. Shaw

DOI: https://doi.org/10.1109/HPCSA.2002.1019147

2002-01-01

Abstract:In this paper, an improved version of the BiCGStab method for the solution of large and sparse linear systems of equations with unsymmetric coefficient matrices is proposed. The method combines elements of numerical stability and parallel algorithm design without increasing the computational costs. The algorithm is derived such that all inner products of a single iteration step are independent and the communication time required for the inner products can be overlapped efficiently with the computation time of the vector updates. Therefore, the cost of global communication can be significantly reduced. The bulk synchronous parallel (BSP) model is used to design an efficient, scalable and portable parallel version of the proposed algorithm and to provide accurate performance prediction of the algorithm for a wide range of architectures, including the Cray T3D, the Parsytec, and a cluster of workstations connected by an Ethernet. This performance model gives useful insight into the time complexity of the method using only a few system-dependent parameters, based on simple and accurate cost modelling. The theoretical performance predictions are compared with some preliminary measured timing results of a numerical application from ocean flow simulation.
