Gene selection for single cell RNA-seq data via fuzzy rough iterative computation model

Zhaowen Li, Jie Zhang, Y. Wang, Fang Liu, Ching‐Feng Wen
DOI: https://doi.org/10.21203/rs.3.rs-2897501/v1
2023-01-01
Abstract: A single cell gene decision space (scgd-space) is a real-valued decision information system in which the samples, features and information values are cells, genes and gene expression values, respectively, and the gene expression data (ge-data) are single cell RNA-seq data (scrs-data). Because scrs-data are characterized by small samples, high dimensionality and noise, existing gene selection methods based on an equivalence relation are often ineffective for scrs-data owing to the strictness of requiring equality between gene expression values. This study explores fuzzy rough iterative gene selection in an scgd-space. To overcome this strictness, equality between gene expression values is replaced by the distance between them, and a fuzzy symmetric relation on the cell set of an scgd-space is first established with the help of the idea that “the relationship between gene expression values is fed back to the gene set”. Two variable parameters are introduced in this fuzzy symmetric relation: one controls the similarity between cells, and the other governs the distance between gene expression values. Then, fuzzy rough approximations in an scgd-space are introduced. Moreover, some evaluation functions, such as fuzzy rough approximations and dependency functions, are presented. Next, a fuzzy rough iterative computation model (FRIC-model) is given, and a gene selection algorithm based on this model is designed. The model applies an iterative computation strategy to define evaluation functions such as the fuzzy positive region and fuzzy dependency. Finally, the designed algorithm is tested on several publicly available scrs-data sets to evaluate its performance. The experimental results show that the designed algorithm is more effective than some existing algorithms, runs fast and does not occupy much memory.
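The abstract does not state the formulas, so the following Python sketch only illustrates the general idea under explicit assumptions and should not be read as the authors' FRIC-model. It assumes expression values scaled to [0, 1], a relation in which the similarity of two cells is the fraction of selected genes whose values differ by at most `delta`, a cutoff `lam` below which similarity is set to 0, the standard fuzzy rough lower approximation for crisp class labels, and a plain greedy forward search in place of the paper's iterative strategy. All function and parameter names (`fuzzy_relation`, `fuzzy_dependency`, `greedy_select`, `delta`, `lam`) are hypothetical.

```python
import numpy as np

def fuzzy_relation(X, genes, delta, lam):
    """Assumed fuzzy symmetric relation on cells (not taken from the paper).

    R[i, j] is the fraction of the chosen genes whose expression values in
    cells i and j differ by at most delta; values below lam are set to 0.
    Builds an (n_cells, n_cells, n_genes) array, so it suits small data only.
    """
    sub = X[:, genes]
    close = np.abs(sub[:, None, :] - sub[None, :, :]) <= delta
    R = close.mean(axis=2)
    R[R < lam] = 0.0
    return R

def fuzzy_dependency(X, y, genes, delta=0.1, lam=0.5):
    """Mean membership of the cells in the fuzzy positive region."""
    R = fuzzy_relation(X, genes, delta, lam)
    different = y[:, None] != y[None, :]
    # Lower approximation of a cell's own class: the minimum of 1 - R
    # over all cells that carry a different class label.
    pos = np.where(different, 1.0 - R, 1.0).min(axis=1)
    return pos.mean()

def greedy_select(X, y, delta=0.1, lam=0.5, tol=1e-9):
    """Greedy forward search: add the gene that most increases dependency."""
    selected, best = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = [fuzzy_dependency(X, y, selected + [g], delta, lam)
                  for g in remaining]
        i = int(np.argmax(scores))
        if scores[i] <= best + tol:
            break  # no remaining gene improves the dependency
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, best

# Toy data: gene 0 separates the two classes, gene 1 is constant.
X = np.array([[0.00, 0.5], [0.05, 0.5], [1.00, 0.5], [0.95, 0.5]])
y = np.array([0, 0, 1, 1])
selected, dependency = greedy_select(X, y)
```

On the toy data, the search keeps only gene 0, because adding the constant gene 1 makes cells of different classes look more alike and lowers the dependency. Replacing strict equality by the `delta` tolerance is what lets the two slightly different values in each class count as matching.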