Combining machine learning algorithms and single-cell data to study the pathogenesis of Alzheimer’s disease

Wei Cui, Liang Zhang, Fang-Rui Zheng, Xi Huang Li, Gui-Lin Xie
DOI: https://doi.org/10.1101/2024.01.26.577320
2024-01-29
Abstract: Extracting valuable insights from high-throughput biological data on Alzheimer’s disease (AD) is becoming increasingly important for understanding its pathogenesis. We collected and assessed the Alzheimer’s microarray datasets GSE5281 and GSE122063 and the single-cell dataset GSE157827 from the NCBI GEO database. Differentially expressed genes were screened using stringent criteria: a P-value of less than 0.05 and an absolute log fold change (|logFC|) greater than 1. We then used machine learning algorithms to identify characteristic genes efficiently. This was followed by immune cell infiltration analysis of these genes, gene set enrichment analysis (GSEA) to elucidate differential pathways, and exploration of regulatory networks. Subsequently, we applied the Connectivity Map (cMap) approach for drug prediction and performed single-cell expression analysis. The top four characteristic genes, selected based on their accuracy, correlated strongly with the AD group in terms of immune infiltration levels and pathways. These genes were also significantly associated with multiple AD-related genes, and regulatory network analysis and single-cell expression profiling pointed to potential pathogenic mechanisms. We identified three astrocyte subpopulations in a prefrontal cortex dataset from late-stage AD and found dysregulated expression of the AD-related MAF/NRF2 pathway in these subpopulations. Ultimately, we identified a potential therapeutic drug from its cMap score, offering promising avenues for future Alzheimer’s disease treatment strategies.
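The differential-expression filter described above reduces to two thresholds applied to each gene. The Python sketch below applies them to per-gene statistics. It assumes the logFC and P-values have already been produced by a differential expression test (for example a limma-style analysis; the tool the authors used is not stated here), and it treats logFC as log2 fold change, which is also an assumption. Whether raw or adjusted P-values were used is not stated, so the sketch simply uses whatever P-value it is given. The `GeneStat` type, `filter_degs` function, gene names and values are hypothetical toy data, not results from GSE5281 or GSE122063.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneStat:
    """Per-gene summary statistics from a differential expression test (hypothetical type)."""
    gene: str
    log_fc: float   # assumed log2 fold change, AD vs. control
    p_value: float  # P-value reported by the test


def filter_degs(stats, p_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene-name lists passing P < p_cutoff and |logFC| > lfc_cutoff."""
    up, down = [], []
    for s in stats:
        # Both thresholds are strict, matching "P < 0.05" and "|logFC| > 1".
        if s.p_value < p_cutoff and abs(s.log_fc) > lfc_cutoff:
            (up if s.log_fc > 0 else down).append(s.gene)
    return up, down


# Toy values for illustration only.
toy = [
    GeneStat("GENE_A", 1.8, 0.001),   # kept: up-regulated
    GeneStat("GENE_B", -1.4, 0.02),   # kept: down-regulated
    GeneStat("GENE_C", 0.6, 0.0001),  # dropped: |logFC| too small
    GeneStat("GENE_D", 2.3, 0.08),    # dropped: P-value too large
]
up, down = filter_degs(toy)
print(up, down)  # ['GENE_A'] ['GENE_B']
```

Splitting the survivors into up- and down-regulated lists is a convenience for downstream steps; the screening itself is just the two inequalities.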
Bioinformatics
What problem does this paper attempt to address?
This paper investigates the pathogenesis of Alzheimer's disease (AD) by combining machine learning algorithms with single-cell data, with the aim of identifying potential therapeutic targets. The researchers obtained microarray datasets related to AD (including GSE5281 and GSE122063) and single-cell data (GSE157827) from the NCBI GEO database. Applying strict thresholds (P-value < 0.05 and absolute log fold change |logFC| > 1), they identified differentially expressed genes, then used machine learning algorithms (such as LASSO regression and support vector machine (SVM) algorithms) to select key genes with high accuracy. The results revealed a close relationship between four key genes (CDC37, LOC100272216, MAFF, and MYL5) and both the immune infiltration levels and the signaling pathways of AD. Gene set enrichment analysis (GSEA) further showed that these genes were significantly enriched in various signaling pathways, providing new insights into the molecular mechanisms of AD. Finally, the researchers validated the expression levels of the key genes by Western blotting and predicted potential therapeutic drugs using the Connectivity Map (cMap) method, offering new ideas for precision medicine in AD.
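The LASSO step mentioned above shrinks most gene coefficients to exactly zero, and the genes that keep non-zero coefficients are retained as candidates. The following is a minimal pure-Python sketch of that idea using cyclic coordinate descent. It is a simplification under stated assumptions: it fits squared-error LASSO to 0/1 labels rather than the logistic (binomial) LASSO that such studies often fit with packages like glmnet, it uses a fixed, hypothetical penalty `lam` instead of one chosen by cross-validation, and it omits the SVM step entirely. The function names (`standardize`, `soft_threshold`, `lasso_cd`) and the toy data are hypothetical.

```python
import math


def standardize(columns):
    """Center each gene's expression values and scale them to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return out


def soft_threshold(a, lam):
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_cd(columns, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    `columns` holds one standardized list per gene; y is centered inside this function.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y_c - Xb, with b = 0 at the start
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        max_change = 0.0
        for j, x in enumerate(columns):
            z = sum(v * v for v in x) / n
            if z == 0:
                continue
            # Correlation of gene j with the residual that excludes gene j's own contribution.
            rho = sum(x[i] * (resid[i] + x[i] * beta[j]) for i in range(n)) / n
            delta = soft_threshold(rho, lam) / z - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= x[i] * delta
                beta[j] += delta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy expression matrix: one list per gene, 8 samples; values are invented for illustration.
genes = ["GENE_A", "GENE_B", "GENE_C"]
raw = [
    [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],  # tracks the AD label
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # unrelated to the label
    [1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0],  # unrelated to the label
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = AD
beta = lasso_cd(standardize(raw), labels, lam=0.1)
print([g for g, b in zip(genes, beta) if b != 0.0])  # ['GENE_A']
```

With `lam=0.1`, only GENE_A, whose expression separates the two groups, keeps a non-zero coefficient. Raising `lam` far enough (for example to 1.0 on this toy data) shrinks every coefficient to zero, which is why the penalty is normally tuned by cross-validation rather than fixed by hand.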