MVCLST: A Spatial Transcriptome Data Analysis Pipeline for Cell Type Classification Based on Multi-View Comparative Learning

Wei Peng, Zhihao Zhang, Wei Dai, Zhihao Ping, Xiaodong Fu, Li Liu, Lijun Liu, Ning Yu
DOI: https://doi.org/10.1016/j.ymeth.2024.11.001
2024-01-01
Abstract: Recent advancements in spatial transcriptomics sequencing technologies can not only provide gene expression within individual cells or cell clusters (spots) in a tissue but also pinpoint the exact location of this expression and generate detailed images of stained tissue sections, which offers invaluable insights into cell type identification and cell function exploration. However, effectively integrating the gene expression data, spatial location information, and tissue images from spatial transcriptomics data presents a significant challenge for computational methods in cell classification. In this work, we propose MVCLST, a multi-view comparative learning method to analyze spatial transcriptomics data for accurate cell type classification. MVCLST constructs two views based on gene expression profiles, cell coordinates and image features. The multi-view method we proposed can significantly enhance the effectiveness of feature extraction while avoiding the impact of erroneous information in organizing image or gene expression data. The model employs four separate encoders to capture shared and unique features within each view. To ensure consistency and facilitate information exchange between the two views, MVCLST incorporates a contrastive learning loss function. The extracted shared and private features from both views are fused using corresponding decoders. Finally, the model utilizes the Leiden algorithm to cluster the learned features for cell type identification. Additionally, we establish a framework called MVCLST-CCFS for spatial transcriptomics data analysis based on MVCLST and consistent clustering. Our method achieves excellent results in clustering on human dorsolateral prefrontal cortex data and the mouse brain tissue data. It also outperforms state-of-the-art techniques in the subsequent search for highly variable genes across cell types on the mouse olfactory bulb data.
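The abstract does not give the form of the contrastive loss, so the following is only an illustrative sketch of how a cross-view consistency term of this kind is commonly written. It assumes a symmetric InfoNCE-style loss in which the shared embedding of spot i in one view should be most similar to the shared embedding of the same spot in the other view. The function names, the temperature value, and the toy data are all hypothetical and are not taken from MVCLST.

```python
import math
import random


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct 'class' for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(view_a, view_b, temperature=0.5):
    """Symmetric InfoNCE-style loss (an assumed form, not the paper's definition).

    view_a[i] and view_b[i] are embeddings of the same spot in the two views.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    # Pairwise cosine similarities between every spot in view A and view B.
    logits = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


# Toy data: 8 spots with 16-dimensional shared embeddings (hypothetical sizes).
random.seed(0)
spots = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
noisy = [[x + random.gauss(0, 0.01) for x in v] for v in spots]  # consistent second view
rolled = spots[1:] + spots[:1]  # second view with every spot mismatched

aligned_loss = contrastive_loss(spots, noisy)
misaligned_loss = contrastive_loss(spots, rolled)
```

In this toy setting the loss is lower when the two views agree spot by spot than when the correspondence is broken, which is the behaviour a consistency term between views is meant to reward.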