nuts-flow/ml: data pre-processing for deep learning

S. Maetschke, R. Tennakoon, C. Vecchiola, R. Garnavi
DOI: https://doi.org/10.48550/arXiv.1708.06046
2018-01-10
Abstract: Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nuts-flow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning.
Machine Learning, Software Engineering
What problem does this paper attempt to address?
The paper addresses the limited data preprocessing support in current deep learning frameworks. These frameworks excel at defining and training artificial neural networks but offer comparatively little help with preparing the data. Preprocessing is nonetheless a fundamental part of any machine learning task and is often the most time-consuming one, especially during development and performance tuning. The paper points out that existing preprocessing tools typically support only basic transformation and augmentation operations, while more complex use cases, such as image patch generation or synchronized random augmentation and transformation of multiple images, have no direct support. Extending existing frameworks to add the missing functionality is often difficult, and the resulting pipelines tend to be hard to read. To address this, the paper introduces a software framework, nuts-flow/ml, that encapsulates common preprocessing operations as components which can be flexibly combined to quickly build efficient preprocessing pipelines for deep learning. The framework aims to simplify the implementation of preprocessing steps and to improve the readability and maintainability of the resulting code.