A brand-new clustering method and analysis system designed for revealing the truth of the high-dimension large data deciphered the complex composition structure of human brain endothelial cells from single-cell RNA sequence data

Boyong Wei
DOI: https://doi.org/10.1101/2023.06.14.544789
2023-06-15
Abstract: Clustering is key to the analysis of large, high-dimensional data, especially single-cell NGS data in the biological and biomedical sciences. Such data require a hierarchical clustering method to reveal important biological features, including differentiation patterns, stem cell identification, and cell sub-type discovery. Traditional hierarchical clustering is difficult to apply to large, high-dimensional data. Several recently proposed approaches try to fill this gap. However, these approaches either operate on low-dimensional or down-sampled data after dimension reduction (Anibal et al., 2022) by methods such as PCA, or consume an enormous amount of computing resources to produce a massive number of layers that carry very little interpretable information. To provide a practical solution, I developed an entirely new hierarchical clustering method, called the BW method, which can be applied directly to large, high-dimensional data without dimension reduction or massive computing resources. I applied BW clustering to six single-cell RNA sequencing samples. It provided deep insight into these samples, including sub-types, differentiation branches, cell state changes (development and the aging process), and gene expression instability. The layers generated by BW clustering were very concise: for almost nineteen thousand cells, BW clustering yielded only 9 layers. An analysis system built on the BW clustering method can display the true form of the high-dimensional data space in an unprecedented way. The resources BW requires are also very low: all the work in this paper was done on a laptop with 16 GB of memory, which makes the method easily accessible to researchers with limited computing resources. Overall, the BW clustering method represents a major advance in the analysis of large, high-dimensional data for biological and biomedical applications.