Python Implementation of the Dynamic Distributed Dimensional Data Model
Hayden Jananthan, Lauren Milechin, Michael Jones, William Arcand, William Bergeron, David Bestor, Chansup Byun, Michael Houle, Matthew Hubbell, Vijay Gadepally, Anna Klein, Peter Michaleas, Guillermo Morales, Julie Mullen, Andrew Prout, Albert Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner
DOI: https://doi.org/10.1109/HPEC55821.2022.9926316
2022-11-23
Abstract: Python has become a standard scientific computing language, with fast-growing support for machine learning and data analysis modules and increasing use for big data. The Dynamic Distributed Dimensional Data Model (D4M) offers a highly composable, unified data model with strong performance, built to handle big data quickly and efficiently. In this work we present an implementation of D4M in Python. D4M.py implements all foundational functionality of D4M and includes Accumulo and SQL database support via Graphulo. We describe the mathematical background and motivation, explain the approaches taken for its fundamental functions and building blocks, and present performance results comparing D4M.py with D4M-MATLAB and D4M.jl.
Distributed, Parallel, and Cluster Computing