Analysis of cardiac single-cell RNA-sequencing data can be improved by the use of artificial-intelligence-based tools

Thanh Nguyen,Yuhua Wei,Yuji Nakada,Jake Y Chen,Yang Zhou,Gregory Walcott,Jianyi Zhang
DOI: https://doi.org/10.1038/s41598-023-32293-1
2023-04-26
Abstract:Single-cell RNA sequencing (scRNAseq) enables researchers to identify and characterize populations and subpopulations of different cell types in hearts recovering from myocardial infarction (MI) by characterizing the transcriptomes in thousands of individual cells. However, the effectiveness of the currently available tools for processing and interpreting these immense datasets is limited. We incorporated three Artificial Intelligence (AI) techniques into a toolkit for evaluating scRNAseq data: AI Autoencoding separates data from different cell types and subpopulations of cell types (cluster analysis); AI Sparse Modeling identifies genes and signaling mechanisms that are differentially activated between subpopulations (pathway/gene set enrichment analysis), and AI Semisupervised Learning tracks the transformation of cells from one subpopulation into another (trajectory analysis). Autoencoding was often used in data denoising; yet, in our pipeline, Autoencoding was exclusively used for cell embedding and clustering. The performance of our AI scRNAseq toolkit and other highly cited non-AI tools was evaluated with three scRNAseq datasets obtained from the Gene Expression Omnibus database. Autoencoder was the only tool to identify differences between the cardiomyocyte subpopulations found in mice that underwent MI or sham-MI surgery on postnatal day (P) 1. Statistically significant differences between cardiomyocytes from P1-MI mice and mice that underwent MI on P8 were identified for six cell-cycle phases and five signaling pathways when the data were analyzed via Sparse Modeling, compared to just one cell-cycle phase and one pathway when the data were analyzed with non-AI techniques. Only Semisupervised Learning detected trajectories between the predominant cardiomyocyte clusters in hearts collected on P28 from pigs that underwent apical resection (AR) on P1, and on P30 from pigs that underwent AR on P1 and MI on P28. In another dataset, the pig scRNAseq data were collected after the injection of CCND2-overexpression Human-induced Pluripotent Stem Cell-derived cardiomyocytes (CCND2hiPSC) into injured P28 pig heart; only the AI-based technique could demonstrate that the host cardiomyocytes increase proliferating by through the HIPPO/YAP and MAPK signaling pathways. For the cluster, pathway/gene set enrichment, and trajectory analysis of scRNAseq datasets generated from studies of myocardial regeneration in mice and pigs, our AI-based toolkit identified results that non-AI techniques did not discover. These different results were validated and were important in explaining myocardial regeneration.
What problem does this paper attempt to address?