Streaming Task Graph Scheduling for Dataflow Architectures

Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler
2023-06-05
Abstract: Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains an open problem. The programmer can explicitly implement both temporal and spatial parallelism, and pipelining across multiple processing elements can be crucial to take advantage of the fast on-chip interconnect, enabling the concurrent execution of different program components. This paper introduces canonical task graphs, a model that enables streaming scheduling of task graphs over dataflow architectures. We show how a task graph can be statically analyzed to understand its steady-state behavior, and we use this information to partition it into temporally multiplexed components of spatially executed tasks. Results on synthetic and realistic workloads show how streaming scheduling can increase speedup and device utilization over a traditional scheduling approach.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
The paper addresses the problem of scheduling task graphs on dataflow architectures. The authors propose a new model, canonical task graphs, to describe applications, and they statically analyze a graph to understand its steady-state behavior. Using this analysis, they partition the task graph into components: the tasks within a component execute spatially (concurrently, on different processing elements), while the components themselves are temporally multiplexed on the device. Results on synthetic and realistic workloads show that this streaming scheduling can increase speedup and device utilization over a traditional scheduling approach. The main contributions of the paper include:
1. Proposing the canonical task graph model to facilitate modeling and analysis of applications executed on abstract dataflow architectures;
2. Proposing a task scheduling algorithm that considers spatial and temporal multiplexing;
3. Deriving bounds on the parallel execution time of task graphs;
4. Providing an algorithm that ensures deadlock-free execution in the presence of pipelined tasks.
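To make the idea of temporally multiplexed components of spatially executed tasks more concrete, here is a minimal Python sketch. It is not the paper's algorithm: it simply cuts a topological order of the task graph into blocks of at most `num_pes` tasks, and it ignores data volumes, the steady-state analysis, and buffer sizing for deadlock freedom. The names `topological_order`, `partition_into_blocks`, and `num_pes` are hypothetical and chosen only for illustration.

```python
from collections import deque


def topological_order(num_tasks, edges):
    """Return the tasks 0..num_tasks-1 in a topological order (Kahn's algorithm)."""
    indegree = [0] * num_tasks
    successors = [[] for _ in range(num_tasks)]
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(t for t in range(num_tasks) if indegree[t] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in successors[task]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    if len(order) != num_tasks:
        raise ValueError("task graph contains a cycle")
    return order


def partition_into_blocks(num_tasks, edges, num_pes):
    """Naive partition (an assumption, not the paper's method).

    Each block holds at most num_pes tasks that could be mapped to distinct
    processing elements and run together; blocks run one after another.
    Cutting a topological order guarantees that no edge points from a later
    block back to an earlier one.
    """
    order = topological_order(num_tasks, edges)
    return [order[i:i + num_pes] for i in range(0, len(order), num_pes)]


# A small diamond-shaped graph followed by one final task.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
blocks = partition_into_blocks(5, edges, num_pes=2)
print(blocks)
```

In this sketch, tasks in the same block that are connected by an edge could stream data to each other, while edges that cross blocks would need their data buffered between the two execution phases. A real scheduler would also have to decide where to cut based on how much data each task produces and consumes.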