Unsupervised machine learning reveals temporal components of gene expression in HeLa cells following release from cell cycle arrest

Tom Maimon, Yaron Trink, Jacob Goldberger, Tomer Kalisky
DOI: https://doi.org/10.1101/2022.07.13.499875
2024-09-21
Abstract: Gene expression measurements of tissues, tumors, or cell lines taken over multiple time points are valuable for describing dynamic biological phenomena such as the response to growth factors. However, such phenomena typically involve multiple biological processes occurring in parallel, making it difficult to identify and discern their respective contributions at any time point. Here, we demonstrate the use of unsupervised machine learning to deconvolve a series of time-dependent gene expression measurements into its underlying temporal components. We first downloaded publicly available RNAseq data obtained from synchronized HeLa cells at consecutive time points following release from cell cycle arrest. Then, we used Fourier analysis and topic modeling to reveal three underlying components and their relative contributions at each time point. We identified two temporal components with oscillatory behavior, corresponding to the G1-S and G2-M phases of the cell cycle, and a third component with a transient expression pattern, associated with the immediate-early response gene program, regulation of cell proliferation, and cervical cancer. This study demonstrates the use of unsupervised machine learning to identify hidden temporal components in biological systems, with potential applications to early detection and monitoring of diseases and recovery processes.
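To illustrate the kind of deconvolution the abstract describes, the following is a minimal sketch on synthetic data, not the authors' pipeline. It assumes a time-by-gene expression matrix built from three made-up temporal profiles (two anti-phase oscillations and one transient decay), uses a discrete Fourier transform to find a gene's dominant period, and uses non-negative matrix factorization (NMF) with multiplicative updates as a simple stand-in for topic modeling. The sampling interval, the 16-hour period, the gene count, and all function names (`dominant_period`, `nmf`, `component_fractions`) are illustrative assumptions.

```python
import numpy as np


def dominant_period(signal, dt):
    """Period of the strongest non-zero frequency in a 1-D time series."""
    centered = signal - signal.mean()
    power = np.abs(np.fft.rfft(centered)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    k = 1 + np.argmax(power[1:])  # skip the zero-frequency bin
    return 1.0 / freqs[k]


def nmf(X, k, n_iter=2000, seed=0, eps=1e-9):
    """Factor non-negative X (time x genes) as W @ H via multiplicative updates.

    W (time x k) holds temporal profiles, H (k x genes) holds gene loadings.
    NMF is used here as a stand-in for topic modeling (an assumption).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def component_fractions(W, H):
    """Relative contribution of each component at each time point."""
    weights = W * H.sum(axis=1)  # total expression attributed to each component
    return weights / weights.sum(axis=1, keepdims=True)


# Synthetic example: 16 time points sampled every 2 hours (hypothetical values).
dt = 2.0
t = np.arange(16) * dt
period = 16.0
profiles = np.column_stack([
    1.0 + np.cos(2 * np.pi * t / period),  # oscillatory component
    1.0 - np.cos(2 * np.pi * t / period),  # oscillatory, anti-phase
    np.exp(-t / 4.0),                      # transient component
])
rng = np.random.default_rng(1)
loadings = rng.random((3, 60))             # 60 synthetic genes
X = profiles @ loadings                    # time x genes, non-negative

est_period = dominant_period(profiles[:, 0], dt)
W, H = nmf(X, k=3)
fractions = component_fractions(W, H)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

Each row of `fractions` gives the estimated share of the three components at one time point. NMF components are recovered only up to ordering and scaling, so matching them to biological programs would require inspecting the top-loaded genes in `H`.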
Genomics