The orchestration of Machine Learning frameworks with data streams and GPU acceleration in Kafka‐ML: A deep‐learning performance comparative

Antonio Jesús Chaves, Cristian Martín, Manuel Díaz
DOI: https://doi.org/10.1111/exsy.13287
IF: 3.3
2023-03-17
Expert Systems
Abstract: Machine Learning (ML) applications need large volumes of data to train their models so that they can make high‐quality predictions. Given digital revolution enablers such as the Internet of Things (IoT) and Industry 4.0, this information is generated in large quantities as continuous data streams rather than as the static datasets that most Artificial Intelligence (AI) frameworks work with. Kafka‐ML is a novel open‐source framework that allows the complete management of ML/AI pipelines through data streams. In this article, we present new features for the Kafka‐ML framework, such as support for the well‐known ML/AI framework PyTorch, as well as for GPU acceleration at different points along the pipeline. This pipeline is described using a real Industry 4.0 use case in the petrochemical industry. Finally, a comprehensive evaluation with state‐of‐the‐art deep learning models is carried out to demonstrate the feasibility of the platform.
computer science, artificial intelligence, theory & methods