Abstract: In recent years, constant and rapid growth of information has characterized digital applications in the majority of real-life scenarios. As a result, a new information asset, namely Big Data, has been defined and has led to several challenges, mainly related to data storage, management, and analysis. Focusing on the last challenge, several Big Data analytics techniques have been developed, based on Machine Learning and Deep Learning paradigms. When dealing with Big Data, traditional approaches often take a long time to produce even a single predictive model because of their extremely high demand for computational resources. Approaches designed specifically for Big Data are required to overcome these computational issues. Most solutions rely on deploying Big Data analytics infrastructures on a cluster of machines and/or on parallelization techniques. When deployment and parallelization are applied to Machine Learning and Deep Learning, we refer to Distributed Machine Learning and Distributed Deep Learning, respectively. Here we discuss the main principles and features of Distributed Machine Learning and Distributed Deep Learning frameworks. The main contribution of this work is a survey of solutions proposed in the literature, conducted by investigating selected features and capabilities. In particular, the survey provides a comparative analysis according to the following classification criteria: implemented parallelization technique, supporting device, supported architecture, implemented communication mode, working mode, and class of algorithms. The paper also gives an overview of the most commonly used criteria and metrics for evaluating the performance of the analyzed frameworks; finally, some emerging and promising optimization techniques are reviewed separately from our classification.

BigDL: A Distributed Deep Learning Framework for Big Data

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

XDL: an industrial deep learning framework for high-dimensional sparse data

Bigflow: A General Optimization Layer for Distributed Computing Frameworks

BDAP: A Big Data Analysis Platform Based on Spark

A Distributed Deep Representation Learning Model for Big Image Data Classification

HPDL: Towards a General Framework for High-performance Distributed Deep Learning

Deep Learning on Operational Facility Data Related to Large-Scale Distributed Area Scientific Workflows

Deep Learning Model And Its Application In Big Data

A Distributed and Scalable Machine Learning Approach for Big Data

AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost

Distributed Analytics For Big Data: A Survey

On Distributed Deep Network for Processing Large-Scale Sets of Complex Data

BDAP: A Data Mining Platform Based on Spark

Deep learning with big data: state of the art and development

Mobile Big Data Analytics Using Deep Learning and Apache Spark

ElasticDL: A Kubernetes-native Deep Learning Framework with Fault-tolerance and Elastic Scheduling

Accelerating Deep Learning Systems Via Critical Set Identification and Model Compression

Distributed High-Performance Computing Methods for Accelerating Deep Learning Training

A Survey on Deep Learning for Big Data

Design and implementation of DeepDSL: A DSL for deep learning