Abstract: Striking a space-time balance is important for a real-world algorithm to achieve high performance on modern shared-memory multicore and many-core systems. However, a large class of dynamic programs with more than O(1) dependency achieves optimality in either space or time, but not both. In the literature, this problem is known as the fundamental space-time tradeoff. We propose the notion of "Processor-Adaptiveness". In contrast to the prior "Processor-Awareness", our approach does not statically partition the problem space onto the processor grid, but uses the processor count P only to upper-bound the space and cache requirements in a cache-oblivious fashion. At the same time, our processor-adaptive algorithms enjoy the full benefits of "dynamic load balance", which is key to achieving satisfactory speedup on a shared-memory system, especially when the problem dimension n is reasonably larger than P. By utilizing the "busy-leaves" property of the runtime scheduler and a program-managed memory pool that combines the advantages of stack and heap, we show that our STAR (Space-Time Adaptive and Reductive) technique can help these dynamic programs achieve sublinear time bounds while remaining asymptotically work-, space-, and cache-optimal. The key achievement of this paper is the first GAP algorithm with sublinear O(n^{3/4} log n) time and optimal O(n^3) work. If we further bound the space and cache requirements of the algorithm to be asymptotically optimal, the time bound increases by a factor of P without sacrificing the work bound. If P = o(n^{1/4} / log n), the time bound stays sublinear and may be a better tradeoff between time and space requirements in practice.
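
To make "more than O(1) dependency" concrete, the following is a minimal serial sketch of a GAP-style recurrence, assuming the standard textbook formulation in which each cell depends on its whole row prefix and column prefix. It is only an O(n^3)-work baseline, not the STAR algorithm described in the abstract. The function name `gap_serial` and the cost callbacks `sub`, `gap_row`, and `gap_col` are hypothetical names chosen for illustration.

```python
from typing import Callable, List


def gap_serial(
    n: int,
    sub: Callable[[int, int], float],
    gap_row: Callable[[int, int], float],
    gap_col: Callable[[int, int], float],
) -> List[List[float]]:
    """Serial GAP-style dynamic program on an (n+1) x (n+1) table.

    Assumed recurrence (standard formulation, not taken from the abstract):
      G[i][j] = min( G[i-1][j-1] + sub(i, j),
                     min over q < j of G[i][q] + gap_row(q, j),
                     min over p < i of G[p][j] + gap_col(p, i) )
    Each cell reads O(n) earlier cells, so total work is O(n^3).
    """
    inf = float("inf")
    G = [[inf] * (n + 1) for _ in range(n + 1)]
    G[0][0] = 0.0
    for i in range(n + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            best = inf
            # Diagonal dependency: a single cell.
            if i > 0 and j > 0:
                best = G[i - 1][j - 1] + sub(i, j)
            # Row dependency: every earlier cell in row i.
            for q in range(j):
                best = min(best, G[i][q] + gap_row(q, j))
            # Column dependency: every earlier cell in column j.
            for p in range(i):
                best = min(best, G[p][j] + gap_col(p, i))
            G[i][j] = best
    return G


if __name__ == "__main__":
    x, y = "abcd", "abed"
    table = gap_serial(
        len(x),
        sub=lambda i, j: 0 if x[i - 1] == y[j - 1] else 1,
        gap_row=lambda q, j: j - q,  # linear gap cost (illustrative choice)
        gap_col=lambda p, i: i - p,
    )
    print(table[len(x)][len(y)])
```

With linear gap costs as in the usage example, the table reduces to ordinary edit distance, which makes the sketch easy to sanity-check. The row and column scans are what give each cell its O(n) dependency.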

A parallel O(n 2^{7n/8}) time-memory-processor tradeoff for Knapsack-like problems

Parallel O(n 2^{7n/8}) time-memory-processor tradeoff for knapsack-like problems

A Parallel Time-Memory-Processor Tradeoff O(2^{5n/6}) for Knapsack-Like NP-Complete Problems

A Parallel Time-Memory-Processor Tradeoff O(2^{5n/6}) for Knapsack-Like NP-Complete Problems

Adaptive and Cost-Optimal Parallel Algorithm for the 0-1 Knapsack Problem

Improved Parallel Three-List Algorithm for the Knapsack Problem without Memory Conflicts

Optimal Parallel Algorithm for the Knapsack Problem Without Memory Conflicts

An Optimal Parallel Algorithm for the Knapsack Problem Based on EREW

On the Parallel Computation for the Knapsack Problem

A Cost-Optimal Parallel Algorithm for the 0-1 Knapsack Problem and Its Performance on Multicore CPU and GPU Implementations

An Adaptive Algorithm for the Knapsack Problem

A Parallel Algorithm by Sampling for the Knapsack Problem Based on MIMD Parallel Computers

A Parallel Algorithm For Solving The Knapsack Problem On The Cluster

Research on the Parallel Algorithm for the Knapsack Problem Based on Sampling and MIMD

Parallel Stateful Logic in RRAM: Theoretical Analysis and Arithmetic Design

Brief Announcement: Star (Space-Time Adaptive And Reductive) Algorithms For Dynamic Programming Recurrences With More Than O(1) Dependency

Balanced Partitioning of Several Cache-Oblivious Algorithms

In-Place Parallel-Partition Algorithms using Exclusive-Read-and-Write Memory

An Efficient Parallel Algorithm for Rectangular Packing Based on Bintree Expression

Processor-Aware Cache-Oblivious Algorithms

Rethinking Parallel Memory Access Pattern in Number Theoretic Transform Design