CoT: a transformer-based method for inferring tumor clonal copy number substructure from scDNA-seq data

Furui Liu, Fangyuan Shi, Fang Du, Xiangmei Cao, Zhenhua Yu
DOI: https://doi.org/10.1093/bib/bbae187
IF: 9.5
2024-04-27
Briefings in Bioinformatics
Abstract: Single-cell DNA sequencing (scDNA-seq) has been an effective means to unscramble intra-tumor heterogeneity, while joint inference of tumor clones and their respective copy number profiles remains a challenging task due to the noisy nature of scDNA-seq data. We introduce a new bioinformatics method called CoT for deciphering clonal copy number substructure. The backbone of CoT is a Copy number Transformer autoencoder that leverages a multi-head attention mechanism to explore correlations between different genomic regions, and thus captures global features to create latent embeddings for the cells. CoT makes it convenient to first infer cell subpopulations based on the learned embeddings, and then estimate single-cell copy numbers through joint analysis of read count data for the cells belonging to the same cluster. This exploitation of clonal substructure information in copy number analysis helps to alleviate the effect of read count non-uniformity, and yields robust estimations of the tumor copy numbers. Performance evaluation on synthetic and real datasets showcases that CoT outperforms state-of-the-art methods, and is highly useful for deciphering clonal copy number substructure.
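
For reference, the multi-head attention named in the abstract is usually written as below. This is the standard Transformer formulation, not a statement of CoT's exact architecture. Reading the rows of $X$ as embeddings of the genomic regions of one cell, and the symbols $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, $W^{O}$, $d_k$ and $h$, are assumptions introduced here for illustration.

```latex
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
\[
\mathrm{head}_i = \mathrm{Attention}\!\left(X W_i^{Q},\; X W_i^{K},\; X W_i^{V}\right),
\qquad
\mathrm{MultiHead}(X) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O}
\]
```

Because every row of $Q K^{\top}$ compares one region against all others, each output row can draw on distant regions, which is one way to read the abstract's "global features".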
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses the problem of inferring tumor clonal copy number substructure from single-cell DNA sequencing (scDNA-seq) data. Specifically:

- **Research Background**: Although scDNA-seq can reveal intra-tumor heterogeneity, jointly inferring tumor clones and their respective copy number profiles remains challenging because scDNA-seq data are noisy.
- **Proposed Method**: The paper introduces CoT (Copy number Transformer), a new Transformer-based method designed to infer tumor clonal copy number substructure from scDNA-seq data.
- **Advantages of the Method**:
  - **Global Representation Learning**: CoT uses a multi-head attention mechanism to explore correlations between different genomic regions, thereby capturing global features and creating latent embeddings of the cells.
  - **Joint Analysis**: CoT first infers cell subpopulations from the learned embeddings, then estimates single-cell copy numbers by jointly analyzing the read count data of cells within the same subpopulation, which improves the robustness of copy number estimation.

### Main Contributions

- **Addressing Technical Challenges**: By using a Transformer autoencoder, the method mitigates technical issues present in scDNA-seq data, such as amplification bias and low sequencing coverage.
- **Performance Validation**: CoT is validated on both synthetic and real datasets, demonstrating better performance than existing methods on complex datasets.

In summary, the paper aims to develop a new deep learning method that accurately infers tumor clonal copy number substructure, thereby enhancing the understanding of tumor evolution and heterogeneity.
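
The joint-analysis idea summarized above (pool the read counts of cells in the same subpopulation before calling copy numbers) can be illustrated with a minimal sketch. This is not the estimation procedure used by CoT, which this summary does not specify. The function `pooled_copy_numbers`, the `ploidy` parameter, the mean-scaling-and-rounding rule, and the toy read counts are all hypothetical, and the cluster labels are taken as given rather than derived from learned embeddings.

```python
def pooled_copy_numbers(read_counts, labels, ploidy=2):
    """Estimate one integer copy number profile per cluster.

    read_counts: per-cell lists of read counts over the same genomic bins.
    labels: one cluster label per cell (assumed to come from clustering
            the learned cell embeddings; that step is not shown here).
    ploidy: assumed average copy number of every clone (an illustrative
            simplification, not taken from the paper).
    """
    # Sum read counts bin by bin over all cells that share a label.
    pooled = {}
    for counts, label in zip(read_counts, labels):
        total = pooled.setdefault(label, [0] * len(counts))
        for i, c in enumerate(counts):
            total[i] += c

    # Scale each pooled profile so its mean equals the assumed ploidy,
    # then round to the nearest integer copy number.
    profiles = {}
    for label, total in pooled.items():
        mean = sum(total) / len(total)
        profiles[label] = [round(ploidy * t / mean) for t in total]
    return profiles


# Toy data: clone "A" is diploid; clone "B" has a loss in bin 0 and a gain in bin 1.
read_counts = [
    [20, 18, 22, 20],
    [19, 23, 17, 21],
    [16, 27, 20, 17],
    [6, 33, 20, 21],
]
labels = ["A", "A", "B", "B"]

print(pooled_copy_numbers(read_counts, labels))
# {'A': [2, 2, 2, 2], 'B': [1, 3, 2, 2]}

# The third cell analysed on its own misses the loss in bin 0.
print(pooled_copy_numbers([read_counts[2]], ["cell_3"]))
# {'cell_3': [2, 3, 2, 2]}
```

In this toy case the noisy third cell is called incorrectly in isolation, while pooling it with the other cell of its cluster recovers the loss. This mirrors the claim that exploiting clonal substructure reduces the effect of read count non-uniformity.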