Genes in Humans and Mice: Insights from Deep learning of 777K Bulk Transcriptomes

Zheng Su, Mingyan Fang, Andrei Smolnikov, Fatemeh Vafaee, Marcel E Dinger, Emily Oates
DOI: https://doi.org/10.1101/2024.04.01.587517
2024-07-19
Abstract: Mice are widely used as animal models in biomedical research, favored for their small size, ease of breeding, and anatomical and physiological similarities to humans. However, discrepancies between mouse gene experimental results and the actual behavior of human genes are not uncommon, despite their shared DNA sequence similarity. This suggests that DNA sequence similarity does not always reliably predict functional similarity. On the other hand, RNA-level gene expression could offer additional information about gene function. In this study, we undertook characterization and inter-species comparison of human and mouse genes by applying innovative deep learning methodologies to a large dataset of 410K human and 366K mouse bulk RNA-seq samples. This was achieved by using gene representations from our Transformer-based GeneRAIN model. These gene representations aggregate information from large gene expression datasets, and provide insights beyond DNA sequence similarity. We identified 2,407 human-mouse homologous genes with high DNA similarity but distinct RNA characteristics, and showed that these genes are more likely to have differing disease/phenotype associations between the two species. Additionally, we found 3,070 homologous genes with low similarity at both the DNA and RNA levels, suggesting the highest risk of discrepancies in study results between the two species. We propose that this approach will support future decision making around whether the mouse will be an appropriate model for studying specific human genes, and whether the results of specific mouse gene studies are likely to be recapitulated in humans. Our methodological innovations offer valuable lessons for future deep learning applications in cross-species omics data. The interspecies gene relationship findings from our study also contribute valuable insights into the gene biology and evolution of the two species.
Bioinformatics