An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification

Ibrahim Al-Hurani, Abedalrhman Alkhateeb, Salama Ikki

2024-05-16

Abstract: In the ongoing effort to enhance medical diagnostics, the integration of state-of-the-art machine learning methodologies has emerged as a promising research area. In molecular biology, there has been an explosion of data generated from multi-omics sequencing. Modern sequencing equipment can provide a large number of complex measurements per experiment, so traditional statistical methods struggle with such high-dimensional data. However, most of the information contained in these datasets is redundant or irrelevant and can be reduced to significantly fewer variables without losing much information. Dimensionality reduction techniques are mathematical procedures that allow for this reduction; they have largely been developed in the statistics and machine learning disciplines. The other challenge in medical datasets is an imbalanced number of samples across classes, which leads to biased results in machine learning models. This study focuses on tackling these challenges with a neural network that incorporates an autoencoder to extract a latent space of the features and a Generative Adversarial Network (GAN) to generate synthetic samples. The latent space is the reduced-dimensional space that captures the meaningful features of the original data. Our model starts with feature selection to pick the discriminative features before feeding them to the neural network. Then, the model predicts the cancer outcome for different datasets. The proposed model outperformed other existing models, scoring an accuracy of 95.09% on the bladder cancer dataset and 88.82% on the breast cancer dataset.

Machine Learning, Neural and Evolutionary Computing, Genomics

What problem does this paper attempt to address?

The paper addresses two major issues that multi-omics data poses for cancer prediction:

1. **Dimensionality Reduction of High-Dimensional Data**: With the development of next-generation sequencing technology, multi-omics data has grown explosively, making it difficult for traditional statistical methods to handle such high-dimensional data. The paper proposes using an autoencoder for feature extraction, transforming the raw data into a low-dimensional latent space that retains key information while discarding redundant features.
2. **Class Imbalance Problem**: Medical datasets often have an uneven class distribution, where majority-class samples greatly outnumber minority-class samples. This biases machine learning models towards the majority class and makes their outcomes unreliable. To address this, the paper uses Generative Adversarial Networks (GANs) to generate synthetic samples, increasing the number of minority-class samples and balancing the dataset.

Combining these methods, the authors built a model that handles multi-omics data and improves the accuracy of cancer classification. The model was validated on bladder cancer (BLCA) and breast cancer (BRCA) datasets, where it reached accuracies of 95.09% and 88.82% respectively, outperforming existing models.
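To make the dimensionality-reduction idea concrete, here is a minimal sketch of an autoencoder that compresses a feature matrix into a small latent space. It is not the authors' architecture: the single tanh hidden layer, the latent size of 2, the learning rate, and the synthetic toy data are all assumptions chosen for illustration, and the helper names `train_autoencoder` and `encode` are hypothetical.

```python
import numpy as np


def train_autoencoder(X, latent_dim=2, lr=0.02, epochs=2000, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent.

    Returns the learned parameters and the reconstruction loss per epoch.
    All hyperparameters here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Small random weights: the encoder maps features -> latent,
    # the decoder maps latent -> features.
    W_enc = rng.normal(0.0, 0.1, size=(n_features, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(0.0, 0.1, size=(latent_dim, n_features))
    b_dec = np.zeros(n_features)

    losses = []
    for _ in range(epochs):
        # Forward pass: tanh encoder, linear decoder.
        Z = np.tanh(X @ W_enc + b_enc)
        X_hat = Z @ W_dec + b_dec
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))

        # Backward pass for the mean-squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = Z.T @ grad_out
        grad_b_dec = grad_out.sum(axis=0)
        grad_pre = (grad_out @ W_dec.T) * (1.0 - Z ** 2)  # tanh derivative
        grad_W_enc = X.T @ grad_pre
        grad_b_enc = grad_pre.sum(axis=0)

        # Gradient descent update.
        W_enc -= lr * grad_W_enc
        b_enc -= lr * grad_b_enc
        W_dec -= lr * grad_W_dec
        b_dec -= lr * grad_b_dec

    params = {"W_enc": W_enc, "b_enc": b_enc, "W_dec": W_dec, "b_dec": b_dec}
    return params, losses


def encode(X, params):
    """Project samples into the learned latent space."""
    return np.tanh(X @ params["W_enc"] + params["b_enc"])


# Toy stand-in for omics data (an assumption, not real measurements):
# 60 samples with 20 redundant features driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 2))
mixing = rng.normal(size=(2, 20))
X = factors @ mixing + 0.1 * rng.normal(size=(60, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

params, losses = train_autoencoder(X)
latent = encode(X, params)
print("latent shape:", latent.shape)
print("reconstruction loss: first epoch", losses[0], "last epoch", losses[-1])
```

In a pipeline like the one described, a downstream classifier would be trained on `latent` rather than on the raw features; the synthetic minority-class samples would come from a separately trained GAN, which this sketch does not cover.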

