Cell reprogramming design by transfer learning of functional transcriptional networks

Thomas P. Wytock, Adilson E. Motter
DOI: https://doi.org/10.1073/pnas.2312942121
2024-03-08
Abstract: Recent developments in synthetic biology, next-generation sequencing, and machine learning provide an unprecedented opportunity to rationally design new disease treatments based on measured responses to gene perturbations and drugs to reprogram cells. The main challenges to seizing this opportunity are the incomplete knowledge of the cellular network and the combinatorial explosion of possible interventions, both of which are insurmountable by experiments. To address these challenges, we develop a transfer learning approach to control cell behavior that is pre-trained on transcriptomic data associated with human cell fates, thereby generating a model of the network dynamics that can be transferred to specific reprogramming goals. The approach combines transcriptional responses to gene perturbations to minimize the difference between a given pair of initial and target transcriptional states. We demonstrate our approach's versatility by applying it to a microarray dataset comprising >9,000 microarrays across 54 cell types and 227 unique perturbations, and an RNASeq dataset consisting of >10,000 sequencing runs across 36 cell types and 138 perturbations. Our approach reproduces known reprogramming protocols with an AUROC of 0.91 while innovating over existing methods by pre-training an adaptable model that can be tailored to specific reprogramming transitions. We show that the number of gene perturbations required to steer from one fate to another increases with decreasing developmental relatedness and that fewer genes are needed to progress along developmental paths than to regress. These findings establish a proof-of-concept for our approach to computationally design control strategies and provide insights into how gene regulatory networks govern phenotype.
Molecular Networks, Disordered Systems and Neural Networks, Machine Learning, Genomics
What problem does this paper attempt to address?
The paper addresses two major challenges in cell reprogramming design: incomplete knowledge of the cellular network and the combinatorial explosion of possible interventions. The authors propose a transfer learning approach that controls cell behavior through a pre-trained model of the functional transcriptional network. Drawing on a large body of gene perturbation data, the approach predicts combinatorial interventions that minimize the transcriptional difference between the initial and target states, thereby reprogramming the cell.

### Main Problems

1. **Incompleteness of Cell Network Knowledge**: Gene regulatory networks within cells are still only partly understood, which limits the ability to design effective interventions.
2. **Combinatorial Explosion of Possible Interventions**: The large number of genes in a cell yields a vast number of possible gene-combination interventions, so validating every combination experimentally is impractical.

### Solution

The authors developed a transfer learning approach that includes the following steps:

1. **Pre-training**: Pre-train a machine learning model that maps transcriptional states to cell types, using large-scale transcriptomic data (such as microarray expression profiles and RNA sequencing data).
2. **Functional Network Dynamic Modeling**: Compute the gene-gene correlation matrix, decompose it into feature genes (eigengenes), and select those that best distinguish the cell types.
3. **Combinatorial Intervention Optimization**: Find the interventions that best steer the system to the target state by forming a linear combination of transcriptional responses to gene perturbations.
4. **Application-Specific Data Integration**: Integrate application-specific data (such as gene perturbation experiments) to tailor the model to a particular reprogramming goal.
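Steps 2 and 3 above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`eigengene_basis`, `fit_perturbation_weights`), the synthetic data, and the choice of unconstrained least squares are all assumptions made for illustration. The published method is likely to differ in its objective and constraints. The sketch only shows the general idea of projecting onto eigenvectors of the gene-gene correlation matrix and then choosing weights for perturbation responses that close the gap between an initial and a target state.

```python
import numpy as np

def eigengene_basis(X, k):
    """Top-k eigenvectors of the gene-gene correlation matrix.

    X has shape (n_samples, n_genes). Hypothetical helper for illustration.
    """
    C = np.corrcoef(X, rowvar=False)           # (n_genes, n_genes)
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvecs[:, order]                   # (n_genes, k)

def fit_perturbation_weights(x_init, x_target, responses, basis):
    """Least-squares weights u such that x_init + u @ responses is close
    to x_target after projection onto the basis.

    responses has shape (n_perturbations, n_genes). Unconstrained least
    squares is an assumption of this sketch, not the paper's optimizer.
    """
    A = (responses @ basis).T                  # (k, n_perturbations)
    b = (x_target - x_init) @ basis            # (k,)
    u, *_ = np.linalg.lstsq(A, b, rcond=None)
    return u

# Synthetic stand-in data (not from the paper).
rng = np.random.default_rng(0)
n_samples, n_genes, n_pert, k = 200, 30, 5, 10
X = rng.normal(size=(n_samples, n_genes))          # expression profiles
responses = rng.normal(size=(n_pert, n_genes))     # response per perturbation
basis = eigengene_basis(X, k)

# Build a target that is reachable by a known combination of perturbations.
x_init = X[0]
true_u = np.array([1.0, 0.0, -0.5, 0.0, 2.0])
x_target = x_init + true_u @ responses

u = fit_perturbation_weights(x_init, x_target, responses, basis)
residual = np.linalg.norm((x_init + u @ responses - x_target) @ basis)
print("recovered weights:", np.round(u, 3))
print("projected residual:", residual)
```

Because the synthetic target is constructed from a known weight vector, the fit should recover it and leave a near-zero residual. With real data the target is generally not exactly reachable, and a practical design would also need to restrict the weights (for example, to a small number of perturbations), which this sketch does not attempt.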
### Experimental Validation

The authors validated the approach on two large datasets: a microarray dataset (>9,000 microarrays across 54 cell types and 227 unique perturbations) and an RNASeq dataset (>10,000 sequencing runs across 36 cell types and 138 perturbations). The method reproduces known reprogramming protocols with an AUROC of 0.91, and the pre-trained model can be tailored to predict strategies for specific reprogramming transitions. It also offers insight into gene regulatory network dynamics: the number of perturbations needed grows as developmental relatedness decreases, and fewer genes are needed to progress along a developmental path than to regress.

### Significance

The study provides a proof-of-concept for computationally designing cell reprogramming strategies. This is expected to accelerate the discovery of therapeutic targets for complex diseases and to deepen understanding of how gene regulatory networks govern phenotype.