Transcriptome Complexity Disentangled: A Regulatory Molecules Approach

Amir Asiaee,Zachary B. Abrams,Heather H. Pua,Kevin R. Coombes
DOI: https://doi.org/10.1101/2023.04.17.537241
2024-05-09
Abstract:Transcription factors (TFs) and microRNAs (miR-NAs) are fundamental regulators of gene expression, cell state, and biological processes. This study investigated whether a small subset of TFs and miRNAs could accurately predict genome-wide gene expression. We analyzed 8895 samples across 31 cancer types from The Cancer Genome Atlas and identified 28 miRNA and 28 TF clusters using unsupervised learning. Medoids of these clusters could differentiate tissues of origin with 92.8% accuracy, demonstrating their biological relevance. We developed Tissue-Agnostic and Tissue-Aware models to predict 20,000 gene expressions using the 56 selected medoid miR-NAs and TFs. The Tissue-Aware model attained an of 0.70 by incorporating tissue-specific information. Despite measuring only 1/400th of the transcriptome, the prediction accuracy was comparable to that achieved by the 1000 landmark genes. This suggests the transcriptome has an intrinsically low-dimensional structure that can be captured by a few regulatory molecules. Our approach could enable cheaper transcriptome assays and analysis of low-quality samples. It also provides insights into genes that are heavily regulated by miRNAs/TFs versus alternative mechanisms. However, model transportability was impacted by dataset discrepancies, especially in miRNA distribution. Overall, this study demonstrates the potential of a biology-guided approach for robust transcriptome representation.
Bioinformatics
What problem does this paper attempt to address?