Abstract: The explosion of data volumes generated by an increasing number of applications is strongly impacting the evolution of distributed digital infrastructures for data analytics and machine learning (ML). While data analytics used to be mainly performed on cloud infrastructures, the rapid development of IoT infrastructures and the requirements for low-latency, secure processing have motivated the development of edge analytics. Today, to balance various trade-offs, ML-based analytics increasingly leverages an interconnected ecosystem that allows complex applications to be executed on hybrid infrastructures where IoT Edge devices are interconnected with Cloud/HPC systems in what is called the Computing Continuum, the Digital Continuum, or the Transcontinuum. Enabling learning-based analytics on such complex infrastructures is challenging. The large-scale, optimized deployment of learning-based workflows across the Edge-to-Cloud Continuum requires extensive and reproducible experimental analysis of the application execution on representative testbeds. This is necessary to help understand the performance trade-offs that result from combining a variety of learning paradigms and supportive frameworks. A thorough experimental analysis requires assessing the impact of multiple factors, such as model accuracy, training time, network overhead, energy consumption, and processing latency. This review aims to provide a comprehensive overview of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today. It describes the main learning paradigms enabling learning-based analytics on the Edge-to-Cloud Continuum. It also surveys the main simulation, emulation, and deployment systems and testbeds available today for experimental research on the Edge-to-Cloud Continuum. Furthermore, we analyze how the selected systems provide support for experiment reproducibility.
We conclude our review with a detailed discussion of relevant open research challenges and future directions in this domain, such as holistic understanding of performance, performance optimization of applications, efficient deployment of Artificial Intelligence (AI) workflows on highly heterogeneous infrastructures, and reproducible analysis of experiments on the Computing Continuum.

The orchestration of Machine Learning frameworks with data streams and GPU acceleration in Kafka-ML: A deep-learning performance comparative

Kafka-ML: Connecting the data stream with ML/AI frameworks

Online learning and continuous model upgrading with data streams through the Kafka-ML framework

Managing and Deploying Distributed and Deep Neural Models Through Kafka-ML in the Cloud-to-Things Continuum

An open source framework based on Kafka-ML for Distributed DNN inference over the Cloud-to-Things continuum

A Scalable Framework for Multilevel Streaming Data Analytics using Deep Learning

A new Apache Spark-based framework for big data streaming forecasting in IoT networks

A Data-Centric Optimization Framework for Machine Learning

On combining system and machine learning performance tuning for distributed data stream applications

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics

STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison

Exploring Real-Time Data Processing Using Big Data Frameworks

MLOps: Automatic, Zero-Touch and Reusable Machine Learning Training and Serving Pipelines

Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review

On the Cost of Model-Serving Frameworks: An Experimental Evaluation

Reasonable Scale Machine Learning with Open-Source Metaflow

Machine Learning Pipelines with Modern Big Data Tools for High Energy Physics

Modular approach to data preprocessing in ALOHA and application to a smart industry use case

Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing

Model-driven development of data intensive applications over cloud resources

KFIML: Kubernetes-Based Fog Computing IoT Platform for Online Machine Learning