An Algorithm with Base-Pair Resolution for Identifying Cancer Heterogeneity by Estimating Multiple Clonal Haplotypes

耿彧,赵仲孟,刘建业,许静,崔代兵,萧笑,王嘉寅
DOI: https://doi.org/10.7652/xjtuxb201706015
2017-01-01
Abstract: An algorithm is proposed for identifying haplotype heterogeneity in cancer genomes, taking into account the somatic mutational events carried by multiple sub-clones. The algorithm works on genomic sequencing data from multiple libraries of tumor tissue and extracts features from both the multi-library design and the constraints imposed by paired-end reads. A prior estimate of the number of sub-clones is obtained by clustering the variant allele frequencies of the somatic loci. A contig-and-extension algorithm is then designed, in which the haplotype sequences are assembled by traversing the reads that map to those loci, so the resulting contigs give an identification resolution at the base-pair level. The number and proportions of the sub-clones, and the evolutionary relationships among them, are further estimated by maximizing the likelihood of the posterior probabilities. Simulation results show that the algorithm reaches 99% accuracy once the sequencing libraries meet a coverage requirement. The proposed algorithm outperforms the existing two-stage pipeline that is currently in wide use for this kind of data analysis.
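The first step of the abstract, the rough prior estimate of the sub-clone count from allele frequencies, can be illustrated with a minimal sketch. The abstract does not say which clustering method the authors use, so the gap-based one-dimensional clustering below is a stand-in, and the function names, the `max_gap` threshold, and the sample frequencies are all hypothetical.

```python
from typing import List


def cluster_vafs(vafs: List[float], max_gap: float = 0.08) -> List[List[float]]:
    """Group variant allele frequencies into 1-D clusters.

    Assumption (not from the paper): values are sorted and a new cluster
    starts whenever two neighbouring values differ by more than max_gap.
    """
    clusters: List[List[float]] = []
    for v in sorted(vafs):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)  # close enough: same cluster
        else:
            clusters.append([v])    # large gap (or first value): new cluster
    return clusters


def estimate_subclone_count(vafs: List[float], max_gap: float = 0.08) -> int:
    """Rough prior estimate of the sub-clone count: one per VAF cluster."""
    return len(cluster_vafs(vafs, max_gap))


if __name__ == "__main__":
    # Made-up frequencies for illustration only.
    sample = [0.10, 0.12, 0.11, 0.30, 0.31, 0.29, 0.48, 0.50]
    for group in cluster_vafs(sample):
        print(len(group), "loci, mean VAF", round(sum(group) / len(group), 3))
    print("estimated sub-clones:", estimate_subclone_count(sample))
```

A fixed gap threshold keeps the sketch deterministic and dependency-free; a mixture model or k-means with model selection would be a more realistic choice on noisy data, and in the described algorithm this count is only a starting point that the later likelihood step refines.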