Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

David Dinh, Harsha Vardhan Simhadri, Yuan Tang
DOI: https://doi.org/10.1145/2935764.2935797
2016-01-01
Abstract: The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs that comprise the nested parallel model, i.e., "||" (parallel) and ";" (serial), are insufficient for expressing "partial dependencies" in a program. We propose a new dataflow composition construct "↝" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms, ranging from dense linear algebra to dynamic programming, in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e., Parallel Memory Hierarchies) and prove guarantees on their ability to balance nodes across processors and preserve locality. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is $O\left(\left(\sum_{i=0}^{h-1} Q^*(t; \sigma \cdot M_i) \cdot C_i\right) / p\right)$ on a $p$-processor machine, where $Q^*$ is the parallel cache complexity of task $t$, $C_i$ is the cost of a cache miss at the level-$i$ cache, which has size $M_i$, and $\sigma \in (0,1)$ is a constant.
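The running-time bound at the end of the abstract can be evaluated numerically once its ingredients are known. The following is only a minimal sketch of that arithmetic, not code from the paper: the function name `nd_running_time_bound`, the callable `q_star`, and all numeric values are hypothetical, the constant hidden by the O-notation is ignored, and `h` is taken to be the number of cache levels (the length of the input lists), which the abstract does not state explicitly.

```python
from typing import Callable, Sequence


def nd_running_time_bound(
    q_star: Callable[[float], float],
    cache_sizes: Sequence[float],
    miss_costs: Sequence[float],
    p: int,
    sigma: float = 0.5,
) -> float:
    """Evaluate (sum_{i=0}^{h-1} Q*(t; sigma * M_i) * C_i) / p.

    q_star      -- hypothetical stand-in for Q*(t; .): maps a cache size
                   to the parallel cache complexity of a fixed task t
    cache_sizes -- M_0 .. M_{h-1}, one size per cache level
    miss_costs  -- C_0 .. C_{h-1}, cost of a miss at each level
    p           -- number of processors
    sigma       -- constant in (0, 1), as in the abstract
    """
    if not 0.0 < sigma < 1.0:
        raise ValueError("sigma must lie strictly between 0 and 1")
    if len(cache_sizes) != len(miss_costs):
        raise ValueError("need one miss cost per cache level")
    if p <= 0:
        raise ValueError("p must be a positive processor count")

    # Sum the miss-cost-weighted cache complexity over all levels.
    total = sum(
        q_star(sigma * m_i) * c_i
        for m_i, c_i in zip(cache_sizes, miss_costs)
    )
    # The bound divides the total work-like term evenly over p processors.
    return total / p


# Toy, made-up cache complexity: misses shrink inversely with cache size.
def toy_q_star(m: float) -> float:
    return 100.0 / m


example_bound = nd_running_time_bound(
    toy_q_star, cache_sizes=[10.0, 100.0], miss_costs=[1.0, 10.0], p=2
)
```

With these toy inputs each of the two levels contributes the same weighted miss cost, and doubling `p` halves the result, which is the linear-speedup shape the bound expresses.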