Comprehensive techniques for multi-tenant deep learning framework on a Hadoop YARN cluster

Seoungbeom Heo, Dae-Cheol Kang, Hyeounji Jang, Hyeock-Jin Lee, Minkyoung Cho, Jik-Soo Kim
DOI: https://doi.org/10.1007/s10586-022-03799-6
2022-11-17
Cluster Computing
Abstract: We have designed and implemented a new data processing framework called “MeLoN” (Multi-tenant dEep Learning framework On yarN), which aims to effectively support distributed deep learning applications, another type of data-intensive workload, in the YARN-based Hadoop ecosystem. MeLoN is developed as a Hadoop YARN application so that it can transparently co-host existing deep learning applications with other data processing workflows. In this paper, we present comprehensive techniques that effectively support multiple deep learning applications in a Hadoop YARN cluster by leveraging a fine-grained GPU over-provisioning policy and a high-performance parallel file system for data staging, which can improve overall system throughput. Through extensive experiments based on representative deep learning workloads, we demonstrate that MeLoN can effectively bring together deep learning and the big data platform Hadoop by employing YARN-based resource allocation and execution mechanisms for running distributed deep learning applications. We believe that MeLoN raises many additional interesting research issues, including profiling the expected GPU memory usage of deep learning applications and supporting more complicated deep learning related jobs based on queuing systems, which can ultimately contribute to a new data processing framework in the YARN-based Hadoop ecosystem.
computer science, information systems, theory & methods