Enhancing Gene Expression Representation and Drug Response Prediction with Data Augmentation and Gene Emphasis

Diyuan Lu, Daksh P.S. Pamar, Alexander J. Ohnmacht, Ginte Kutkaite, Michael P. Menden
DOI: https://doi.org/10.1101/2024.05.15.592959
2024-05-18
Abstract: Representation learning for tumor gene expression (GEx) data with deep neural networks is limited by the large gene feature space and the scarcity of available clinical and preclinical data. The translation of the learned representation between these data sources is further hindered by inherent molecular differences. To address these challenges, we propose GExMix (Gene Expression Mixup), a data augmentation method that extends the Mixup concept to generate training samples while accounting for the imbalance in both data classes and data sources. We leverage the GExMix-augmented training set in encoder-decoder models to learn a GEx latent representation. Subsequently, we combine the learned representation with drug chemical features in a dual-objective, gene-centric drug response prediction model whose two objectives are reconstruction of the GEx latent embeddings and drug response classification. This dual-objective design prioritizes gene-centric information to enhance the final drug response prediction. We demonstrate that augmenting training samples improves the GEx representation, which in turn benefits the gene-centric drug response prediction model. Our findings underscore the effectiveness of GExMix in enriching GEx data for deep neural networks. Moreover, the proposed gene-centricity further improves drug response prediction when translating from preclinical to clinical datasets. This highlights the potential of the proposed framework for GEx data analysis, paving the way toward precision medicine.
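For readers unfamiliar with the Mixup concept that GExMix extends, the following is a minimal sketch of standard Mixup applied to expression vectors. It is not the authors' GExMix implementation: the function name `mixup`, the `alpha` default, and the random pairing of samples are illustrative assumptions, and the class- and source-aware sampling described in the abstract is not shown.

```python
import random


def mixup(samples, labels, alpha=0.2, seed=None):
    """Standard Mixup (not GExMix): blend each sample with a random partner.

    samples: list of equal-length expression vectors (lists of floats)
    labels:  list of one-hot label vectors, one per sample
    alpha:   Beta(alpha, alpha) shape parameter; an assumed default
    """
    rng = random.Random(seed)
    # One mixing coefficient in [0, 1] is drawn for the whole batch.
    lam = rng.betavariate(alpha, alpha)
    # Pair every sample with a partner chosen by shuffling the indices.
    partners = list(range(len(samples)))
    rng.shuffle(partners)

    mixed_x, mixed_y = [], []
    for i, j in enumerate(partners):
        # Convex combination of the two expression vectors.
        mixed_x.append([lam * a + (1 - lam) * b
                        for a, b in zip(samples[i], samples[j])])
        # The same combination applied to the labels gives soft targets.
        mixed_y.append([lam * a + (1 - lam) * b
                        for a, b in zip(labels[i], labels[j])])
    return mixed_x, mixed_y, lam
```

Because each augmented sample is a convex combination of two real ones, its values stay within the range of the original data and its soft label still sums to one. A balancing scheme such as the one the abstract describes would presumably change how partners are chosen, which this sketch leaves uniform.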
Bioinformatics
What problem does this paper attempt to address?