Research and Implementation of Gene Chip Data Analysis and Visualization Based on Machine Learning
Jiajun Li, Libo Huai, Zhenhua Lin, Minghua Cui
DOI: https://doi.org/10.1109/CISP-BMEI48845.2019.8965830
2019-01-01
Abstract: To extract patterns and mine knowledge from the data accumulated in large numbers of biological experiments, this paper proposes a genetic data visualization method based on the R language and machine learning algorithms. First, the raw chip expression data, gene annotation information, chip grouping information, and clinical data are collected. Expression values are calculated through background correction, normalization, and log2 transformation, and missing values are imputed with the KNN algorithm; this preprocessing yields a gene expression matrix. Second, a hierarchical clustering algorithm is proposed that clusters the gene expression matrix step by step to obtain the differential gene results, addressing the problem that biomedical experimental results cannot be read directly from the data. Finally, taking psoriasis as an example, an experiment was carried out in R: the gene expression differences between different psoriasis groups were obtained and visualized as heat maps and volcano plots. The results show that the proposed visualization method expresses the number of differential genes in gene chip data more clearly and makes it more convenient to screen out the differential genes in different groups. This provides clues and directions for subsequent bioinformatics analyses of clinical pathological data, such as correlation analysis, protein interaction analysis, and signal pathway analysis.
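The preprocessing steps named in the abstract (log2 transformation, KNN imputation of missing values, and a per-gene comparison between groups) can be illustrated with a minimal sketch. It is written in Python rather than the R used by the authors, and it is not their implementation: the function names, the distance measure, the value of `k`, the toy intensities, and the fold-change summary are all assumptions made for illustration. Background correction, normalization, and hierarchical clustering are omitted.

```python
import math

def log2_transform(matrix):
    """Apply log2 to every observed intensity; None marks a missing value."""
    return [[None if v is None else math.log2(v) for v in row] for row in matrix]

def _distance(a, b):
    """Root-mean-square difference over the samples observed in both genes."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

def knn_impute(matrix, k=2):
    """Fill each missing entry with the mean of that sample over the k nearest genes."""
    filled = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other genes for which sample j is observed.
            candidates = sorted(
                (_distance(row, other), idx)
                for idx, other in enumerate(matrix)
                if idx != i and other[j] is not None
            )
            nearest = [idx for d, idx in candidates[:k] if d != math.inf]
            if nearest:
                filled[i][j] = sum(matrix[idx][j] for idx in nearest) / len(nearest)
    return filled

def log2_fold_change(matrix, group_a, group_b):
    """Per gene: mean log2 expression in group_b minus mean in group_a."""
    def mean(row, cols):
        return sum(row[c] for c in cols) / len(cols)
    return [mean(row, group_b) - mean(row, group_a) for row in matrix]

# Toy data (hypothetical): rows are genes, columns are samples.
# Columns 0-1 stand for one group, columns 2-3 for another.
raw = [
    [8, 8, 64, 64],
    [8, None, 64, 64],   # one missing intensity
    [32, 32, 32, 32],
    [64, 64, 8, 8],
]
expr = knn_impute(log2_transform(raw), k=1)
lfc = log2_fold_change(expr, group_a=[0, 1], group_b=[2, 3])
# Illustrative cutoff (an assumption): keep genes with |log2 fold change| >= 1.
differential = [i for i, v in enumerate(lfc) if abs(v) >= 1]
```

On this toy input the missing value is filled from the single nearest gene, and `differential` holds the indices of the genes whose group means differ by at least one log2 unit. A volcano plot would additionally need a per-gene significance value, which this sketch does not compute.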