Automated Image Data Preprocessing with Deep Reinforcement Learning

Tran Ngoc Minh, Mathieu Sinn, Hoang Thanh Lam, Martin Wistuba
DOI: https://doi.org/10.48550/arXiv.1806.05886
2021-04-30
Abstract: Data preparation, i.e. the process of transforming raw data into a format that can be used for training effective machine learning models, is a tedious and time-consuming task. For image data, preprocessing typically involves a sequence of basic transformations such as cropping, filtering, rotating or flipping images. Currently, data scientists decide manually based on their experience which transformations to apply in which particular order to a given image data set. Besides constituting a bottleneck in real-world data science projects, manual image data preprocessing may yield suboptimal results as data scientists need to rely on intuition or trial-and-error approaches when exploring the space of possible image transformations and thus might not be able to discover the most effective ones. To mitigate the inefficiency and potential ineffectiveness of manual data preprocessing, this paper proposes a deep reinforcement learning framework to automatically discover the optimal data preprocessing steps for training an image classifier. The framework takes as input sets of labeled images and predefined preprocessing transformations. It jointly learns the classifier and the optimal preprocessing transformations for individual images. Experimental results show that the proposed approach not only improves the accuracy of image classifiers, but also makes them substantially more robust to noisy inputs at test time.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the inefficiency and potential ineffectiveness of manual data preprocessing, particularly for image data. Today, data scientists decide by hand, based on experience, which transformations to apply to a given image data set and in which order. This creates a bottleneck in real-world data science projects, and it can also lead to suboptimal results, because exploring the space of possible image transformations by intuition or trial and error may miss the most effective ones. To address this, the paper proposes a deep reinforcement learning framework that automatically discovers the optimal preprocessing steps for training an image classifier. The framework takes as input a set of labeled images and a set of predefined preprocessing transformations, and it jointly learns the classifier and the optimal preprocessing transformations for individual images. Experimental results show that the approach not only improves the accuracy of image classifiers, but also makes them substantially more robust to noisy inputs at test time.
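To make the idea of learning a per-image sequence of transformations concrete, the following is a minimal sketch and not the paper's implementation. It makes several simplifying assumptions: tabular Q-learning stands in for the deep network, the classifier is a fixed toy function rather than being learned jointly, and the reward is taken to be the change in the classifier's confidence in the true label. All names (`toy_confidence`, `train`, `preprocess`, the three transformations) are hypothetical.

```python
import random

# Hypothetical predefined transformations on a tiny image (tuple of row tuples).
def stop(img):
    return img  # no-op; choosing it ends the episode

def flip_horizontal(img):
    return tuple(tuple(reversed(row)) for row in img)

def rotate_90(img):
    return tuple(zip(*img[::-1]))  # clockwise rotation

ACTIONS = [stop, flip_horizontal, rotate_90]

def toy_confidence(img, label):
    """Stand-in for a classifier's probability of the true label (assumption):
    class 1 is 'detected' from the mean brightness of the left column."""
    p1 = sum(row[0] for row in img) / len(img)
    return p1 if label == 1 else 1.0 - p1

def best_action(q, img):
    # Greedy action; ties resolve to the lowest index, i.e. `stop`.
    return max(range(len(ACTIONS)), key=lambda i: q.get((img, i), 0.0))

def train(dataset, episodes=500, max_steps=3, alpha=0.5, gamma=0.9,
          eps=0.2, seed=0):
    """Tabular Q-learning over (image, action) pairs with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        img, label = rng.choice(dataset)
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS)) if rng.random() < eps else best_action(q, img)
            nxt = ACTIONS[a](img)
            # Assumed reward: gain in confidence for the true label.
            reward = toy_confidence(nxt, label) - toy_confidence(img, label)
            done = ACTIONS[a] is stop
            future = 0.0 if done else q.get((nxt, best_action(q, nxt)), 0.0)
            old = q.get((img, a), 0.0)
            q[(img, a)] = old + alpha * (reward + gamma * future - old)
            if done:
                break
            img = nxt
    return q

def preprocess(q, img, max_steps=3):
    """Apply the learned greedy policy to one image; the label is not needed."""
    applied = []
    for _ in range(max_steps):
        a = best_action(q, img)
        applied.append(ACTIONS[a].__name__)
        if ACTIONS[a] is stop:
            break
        img = ACTIONS[a](img)
    return img, applied
```

For example, calling `train` on a single class-1 image whose bright column is on the right, such as `((0, 1), (0, 1))`, and then calling `preprocess` on that image should yield a transformed image that the toy classifier scores with full confidence. In the framework the paper describes, the classifier is learned jointly with the transformation policy, which this sketch deliberately leaves out.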