EinDecomp: Decomposition of Declaratively-Specified Machine Learning and Numerical Computations for Parallel Execution

Daniel Bourgeois,Zhimin Ding,Dimitrije Jankov,Jiehui Li,Mahmoud Sleem,Yuxin Tang,Jiawen Yao,Xinyu Yao,Chris Jermaine
2024-10-04
Abstract:We consider the problem of automatically decomposing operations over tensors or arrays so that they can be executed in parallel on multiple devices. We address two, closely-linked questions. First, what programming abstraction should systems for tensor-based computing offer to enable such decompositions? Second, given that abstraction, how should such systems automatically decompose a tensor-based computation? We assert that tensor-based systems should offer a programming abstraction based on an extended Einstein summation notation, which is a fully declarative, mathematical specification for tensor computations. We show that any computation specified in the Einstein summation notation can be re-written into an equivalent tensor-relational computation, and this re-write generalizes existing notations of tensor parallelism such as "data parallel'' and "model parallel.'' We consider the algorithmic problem of optimally computing a tensor-relational decomposition of a graph of operations specified in our extended Einstein summation notation, and we experimentally show the value of the algorithm that we develop.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?