LM-IGTD: a 2D image generator for low-dimensional and mixed-type tabular data to leverage the potential of convolutional neural networks

Vanesa Gómez-Martínez, Francisco J. Lara-Abelenda, Pablo Peiro-Corbacho, David Chushig-Muzo, Conceicao Granja, Cristina Soguero-Ruiz
2024-04-26
Abstract: Tabular data have been extensively used in different knowledge domains. Convolutional neural networks (CNNs) have been successfully used in many applications where important information about data is embedded in the order of features (images), outperforming the predictive results of traditional models. Recently, several researchers have proposed transforming tabular data into images to leverage the potential of CNNs and obtain strong results in predictive tasks such as classification and regression. In this paper, we present a novel and effective approach for transforming tabular data into images, addressing the inherent limitations associated with low-dimensional and mixed-type datasets. Our method, named Low Mixed-Image Generator for Tabular Data (LM-IGTD), integrates a stochastic feature generation process and a modified version of the IGTD. We introduce an automatic and interpretable end-to-end pipeline, enabling the creation of images from tabular data. A mapping between original features and the generated images is established, and post hoc interpretability methods are employed to identify crucial areas of these images, enhancing interpretability for predictive tasks. We extensively evaluate the proposed tabular-to-image generation approach on 12 low-dimensional and mixed-type datasets, including binary and multi-class classification scenarios. In particular, our method outperformed all traditional ML models trained on tabular data in five out of twelve datasets when using images generated with LM-IGTD and CNN. In the remaining datasets, LM-IGTD images and CNN consistently surpassed three out of four traditional ML models, achieving similar results to the fourth model.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper addresses is how to convert low-dimensional and mixed-type tabular data (containing both numerical and categorical features) into images, so that the potential of convolutional neural networks (CNNs) can be applied to them. Specifically, the authors propose a new method, the Low Mixed-Image Generator for Tabular Data (LM-IGTD), which aims to overcome the limitations of existing tabular-to-image conversion methods when dealing with low-dimensional and mixed-type data.

### Main Problem Summary:

1. **Limitations of Tabular-to-Image Conversion**:
   - Existing tabular-to-image conversion methods are mainly applicable to high-dimensional data and perform poorly on low-dimensional data.
   - Mixed-type data (i.e., data containing both numerical and categorical features) pose challenges during conversion, such as missing values, data sparsity, and the coexistence of different feature types.
2. **Taking Advantage of CNNs**:
   - CNNs perform well in image classification tasks and can capture spatial relationships in data, which can yield better prediction performance.
   - By converting tabular data into images, the strengths of CNNs can be used to improve prediction tasks such as classification and regression.
3. **Enhancing Interpretability**:
   - The proposed method aims to improve not only prediction performance but also model interpretability. By establishing a mapping between the original features and the generated images and applying post hoc interpretability methods (such as Grad-CAM), key regions in the images can be identified, which helps explain the prediction results.

### Solutions:

- **LM-IGTD Method**: Combines a stochastic feature generation process with a modified version of IGTD to generate 2D images suitable as CNN input.
- **Automatic and Interpretable End-to-End Pipeline**: Generates images from tabular data and identifies important regions in the images through interpretability techniques (such as Grad-CAM).
- **Evaluation and Validation**: Extensive experiments were carried out on 12 low-dimensional and mixed-type datasets, covering binary and multi-class classification. CNNs trained on LM-IGTD images outperformed all traditional machine-learning models in five of the twelve datasets; in the remaining datasets they surpassed three of the four traditional models and achieved results similar to the fourth.

Through these contributions, the paper offers a new perspective on processing low-dimensional and mixed-type tabular data, showing how deep-learning techniques can be used to improve both prediction performance and model interpretability.
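
To make the idea of feature generation plus a feature-to-pixel mapping more concrete, the following is a minimal sketch and not the authors' LM-IGTD algorithm. It assumes that extra features are created by adding Gaussian noise to copies of the original columns, and it lays pixels out in simple row-major order rather than applying the IGTD rearrangement. The function names `expand_features` and `to_images`, the `noise_scale` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def expand_features(X, n_pixels, noise_scale=0.05, seed=0):
    """Pad a (samples, features) matrix to n_pixels columns with noisy copies.

    Assumption for illustration: extra columns are Gaussian-noised copies of
    the original features. Returns the expanded matrix and, for every column,
    the index of the original feature it derives from.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    if n_pixels < n_features:
        raise ValueError("n_pixels must be >= number of features")
    # Cycle through the original features to pick a source for each extra column.
    extra_sources = np.arange(n_pixels - n_features) % n_features
    noise = rng.normal(0.0, noise_scale, size=(n_samples, extra_sources.size))
    extras = X[:, extra_sources] + noise
    expanded = np.hstack([X, extras])
    source = np.concatenate([np.arange(n_features), extra_sources])
    return expanded, source

def to_images(expanded, side):
    """Min-max scale each column to [0, 1] and reshape rows into side x side images."""
    lo = expanded.min(axis=0)
    span = expanded.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    scaled = (expanded - lo) / span
    # Naive row-major layout; an IGTD-style step would reorder the pixels instead.
    return scaled.reshape(-1, side, side)

# Toy data: 8 samples with 5 numerical features, expanded to 4 x 4 images.
rng = np.random.default_rng(1)
X = rng.random((8, 5))
expanded, source = expand_features(X, n_pixels=16)
images = to_images(expanded, side=4)
# pixel_to_feature[r, c] is the original feature behind pixel (r, c).
pixel_to_feature = source.reshape(4, 4)
```

Keeping `pixel_to_feature` alongside the images is what would let a saliency map from a method such as Grad-CAM be traced back to the original tabular columns, which is the role of the feature-to-image mapping described above.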